[okfn-discuss] freetable.org: Expressions of interest sought
gordonipub2 at gordoni.com
Tue Jan 12 05:00:52 GMT 2010
Thanks for your detailed and thoughtful reply.
I think it is worth distinguishing between the centralization or
decentralization of organizations and that of what they build or
create. The two are reasonably independent. I will limit myself to
just what is created.

FreeTable takes a centralized approach to data storage. This is a
matter of expediency. A decentralized approach has the advantage of
preventing a single point of control. Weighing against this, though,
are difficult problems in ensuring data integrity, preventing data
loss (what happened to the climate data from 1902?), ensuring system
responsiveness, and preventing spam. Most of these are research
problems. A centralized approach, on the other hand, can be built
relatively easily using well-known techniques.

You are right about the importance of having well-defined goals. Let
me try to nail a few things down...

A database in the cloud: FreeTable envisions working with datasets
too large and too dynamic to be downloaded and then queried. Instead,
the query is sent to FreeTable and FreeTable returns the result.

A programmatic interface: FreeTable seeks to provide a programmatic
interface to data, allowing programmers to build user interfaces to
the data, and perhaps the next eBay, Facebook, or Craigslist. (Based
on feedback, this goal may be inadequate, and FreeTable may need to
provide a nice user interface for entering data, but this interface
won't be able to compete with the domain-specific interfaces others
build.)

Developer community: FreeTable's community will be software
developers seeking to share data. (Again, based on feedback, this
community might have to be expanded to include people with data to
share.)

All but the largest datasets: The notion of what is data spans a huge
range, from small classified ad listings to large genomic datasets.
FreeTable would like to focus on the lower end of this range.
FreeTable's practical limits are probably datasets of less than 1 GB
in size, or receiving fewer than 1000 simple queries per second. This
probably covers the majority of datasets.

The Open Database License looks like a good step forward. I have one
concern, though, in the context of FreeTable: I don't think it is
strong enough. Suppose FreeTable hosts a database of classified ads.
A site that displays classified ads could use the FreeTable database
and also contribute the ads it receives from users back to FreeTable.
Another site, though, could use the FreeTable database but keep any
ads contributed by its users to itself. To the user, the second site
is always better since it has more ads, and the first site is left
with little incentive to contribute ads back to FreeTable, since doing
so only helps its competition. In the end, the public commons withers
away. I would like to see a license that says if you use this
dataset, then you must contribute back any similar data you gather. I
don't know how to word that legally.