Re: Persistent databases of a different sort...

I am at a point now where my daily workload consists of accessing a half-dozen different formats of CSV files (comma-separated text as from Excel)... Accessing individual columns or isolated fields is becoming a bit tedious, as I have to remember the peculiarities of each kind of CSV file. It occurred to me that some sort of database that could hold metadata for each file would be useful, allowing a more uniform approach to the parsing of these CSV files.

For example, one type of CSV file is stored in reverse chronological order with dates specified as YYYY-MM-DD. Another type applies only to today and uses time-stamps that are granular at the 1-minute level, while another data field, called Volume, is a strictly monotonically increasing function of time, but also in reverse chronological order. And yet a third uses chronological ordering by day, but with 2 lines of header information, and dates stored as DD-MM-YY.

Persistent databases such as Elephant, Rucksack, PLOB, and others seem to be addressing a different sort of problem -- namely the persistent storage of Lisp objects -- almost universally without support for saving / reloading functions and closures. My data changes daily, in the form of these CSV files, and so persistent objects holding the information would not be very useful. But a database that held metadata and some closures for reading these different formats would be very useful.
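A minimal sketch of what such per-format metadata might look like, written in Python purely for illustration. The names `CsvFormat`, `FORMATS`, `read_rows`, and the two format keys are hypothetical, and the fields cover only the peculiarities mentioned above (header lines, date format, row order); the minute-level time-stamp format is left out.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CsvFormat:
    """Hypothetical descriptor for one kind of CSV file."""
    header_lines: int     # number of leading lines to skip
    date_format: str      # strptime pattern for the date column
    newest_first: bool    # True if rows are in reverse chronological order
    date_column: int = 0  # index of the date column (assumed to be first)


# Assumed registry: one entry per file type, keyed by an invented name.
FORMATS = {
    "daily_iso": CsvFormat(header_lines=0, date_format="%Y-%m-%d", newest_first=True),
    "daily_dmy": CsvFormat(header_lines=2, date_format="%d-%m-%y", newest_first=False),
}


def read_rows(text, fmt):
    """Return rows oldest-first, with the date column parsed to a date."""
    rows = list(csv.reader(io.StringIO(text)))[fmt.header_lines:]
    parsed = []
    for row in rows:
        if not row:  # skip blank lines
            continue
        row = list(row)
        raw = row[fmt.date_column]
        row[fmt.date_column] = datetime.strptime(raw, fmt.date_format).date()
        parsed.append(row)
    if fmt.newest_first:
        parsed.reverse()  # normalize to chronological order
    return parsed
```

With something like this, callers would pick a descriptor by name and get rows back in one uniform shape, whatever the original ordering or date style of the file.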

I ran across a paper at MIT detailing something called DSPACE, from 1975, wherein the author was doing pretty much what I think I need -- namely storing bits of code along with data in a database, so that a uniform high-level access protocol could be used. I haven't seen anything else along these lines. It also seems to me that this approach would only be usable in a purely interpreted environment.

http://dspace.mit.edu/bitstream/1721.1/6233/2/AIM-332.pdf
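On the worry about needing a purely interpreted environment: one possible workaround, sketched here in Python as an assumption rather than anything taken from the paper, is to persist the reader code as source text and compile it when the record is loaded. The helpers `save_reader`, `load_reader`, and `demo`, and the JSON record layout, are all invented for this example.

```python
import json
import os
import tempfile

# Source text of a tiny reader, stored as data rather than as a live function.
SPLIT_SOURCE = '''
def split_fields(line):
    return line.strip().split(",")
'''


def save_reader(path, name, source):
    """Persist a reader's name and source text as a JSON record."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"name": name, "source": source}, f)


def load_reader(path):
    """Load a record, compile its source, and return the named function."""
    with open(path, encoding="utf-8") as f:
        record = json.load(f)
    namespace = {}
    # Executing stored code means trusting whatever wrote the record.
    exec(compile(record["source"], path, "exec"), namespace)
    return namespace[record["name"]]


def demo():
    """Round-trip the reader through a temporary file and call it."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "reader.json")
        save_reader(path, "split_fields", SPLIT_SOURCE)
        reader = load_reader(path)
        return reader("a,b,c\n")
```

The trade-off is that only source text survives the round trip: any state a closure had captured would have to be stored separately as ordinary data.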

Does anyone have any experience to share in this regard?

David McClain
Chief Technical Officer
Refined Audiometrics Laboratory
4391 N. Camino Ferreo
Tucson, AZ 85750

email: dbm@refined-audiometrics.com
phone: 1.520.390.3995
web: http://www.refined-audiometrics.com
Skype: dbmcclain

Re: Persistent databases of a different sort...

Hi David,

It sounds as if you have an interesting problem to solve... One "solution" that probably completely ignores your requirements >smile< would be converting the CSV to RDF and then using a triple-store to manage it. Thanks for the link to the paper, though it seems to talk about DABA, not DSpace. (FWIW, I thought that DSpace was related, idea-wise, to Linda, tuple-spaces, Java Spaces and, perhaps, GBB, and so not relevant to the problem you're trying to solve.)

On Jun 25, 2007, at 8:29 PM, David McClain wrote:

I am at a point now where my daily workload consists of accessing a half-dozen different formats of CSV files (comma-separated text as from Excel)... Accessing individual columns or isolated fields is becoming a bit tedious, as I have to remember the peculiarities of each kind of CSV file. It occurred to me that some sort of database that could hold metadata for each file would be useful, allowing a more uniform approach to the parsing of these CSV files.

For example, one type of CSV file is stored in reverse chronological order with dates specified as YYYY-MM-DD. Another type applies only to today and uses time-stamps that are granular at the 1-minute level, while another data field, called Volume, is a strictly monotonically increasing function of time, but also in reverse chronological order. And yet a third uses chronological ordering by day, but with 2 lines of header information, and dates stored as DD-MM-YY.

Persistent databases such as Elephant, Rucksack, PLOB, and others seem to be addressing a different sort of problem - namely the persistent storage of Lisp objects -- almost universally without support for saving / reloading functions and closures. My data changes daily, in the form of these CSV files, and so persistent objects holding the information would not be very useful. But a database that held metadata and some closures for reading these different formats would be very useful.

I ran across a paper at MIT detailing something called DSPACE, from 1975, wherein the author was doing pretty much what I think I need -- namely storing bits of code along with data in a database, so that a uniform high-level access protocol could be used. I haven't seen anything else along these lines. It also seems to me that this approach would only be usable in a purely interpreted environment.

http://dspace.mit.edu/bitstream/1721.1/6233/2/AIM-332.pdf

Does anyone have any experience to share in this regard?


Gary Warren King, metabang.com
Cell: (413) 885 9127
Fax: (206) 338-4052
gwkkwg on Skype * garethsan on AIM