[odc-discuss] Data, Facts, Databases and Licensing (was: Re: Open Database Licence)

Rufus Pollock rufus.pollock at okfn.org
Thu Mar 5 14:47:06 GMT 2009

2009/3/4 Jonathan Rochkind <rochkind at jhu.edu>:
> Oops, no I did not mean for that to be off list. I'll forward it, and this
> reply to the list.
> I'll read your background info, thanks.
> One note of concern is your emphasis on the difference between data and
> databases. I understand the difference in law (in some jurisdictions), but
> with the kinds of data I deal with in the library world, the distinction in
> law is hard to carry forward in practice. Most of our databases consist of

Right, but that's true of pretty much everything and is not specific
to data. When does a piece of software code get big enough to attract
copyright? If I copy the structure of your code but not the exact
text, do I infringe? And so on.

> multiple subsets of data that were themselves harvested from other databases
> -- and will then themselves have subsets of their data extracted into yet
> other databases.  And these transactions aren't necessarily even one way --
> database B could get some data from database A, that data could be
> changed/enhanced/improved while in database B, and then database A could
> later get it back again.

I think there is some confusion. When I say 'data' here I mean the
data qua fact itself. E.g. the fact that water boils at 100 degrees,
set out as:

Substance : Boiling Point
Water     : 100

would be considered data. No one, however, can claim a monopoly right
in this! But when such material is aggregated, the collection may
attract such a monopoly right and therefore need licensing.

This is similar to the fact that one does not have a monopoly right in
each word or even each paragraph of text in this email but as a whole
you might be able to claim copyright ...

> This makes a distinction between 'data' and 'databases' pretty unworkable in
> practice in  my opinion, even if it exists in the law.

It depends on what you mean by unworkable. The simple answer if you're
risk-averse is to assume that any collection of data you are getting
from elsewhere that goes beyond the barest 'fact' is a database (and
not just 'data') and you'll need to sort out rights (either by having
a license or whatever). Let's also be clear: providing open licenses
does not create rights: they exist already (which is why you should be
applying a suitable dedication/license be it the PDDL or something

> Now, if there are some databases that are more coherent wholes, I guess I
> can understand wanting to take advantage of that to make use of the rights
> given (in some jurisdictions; not so much in the US as I understand it) to
> 'databases'.  But I'd caution people to be careful with that, I'm not sure
> it's a good idea. What seems to you to be a coherent whole 'database' is
> quite possibly destined for slice-and-dice remixing in the contemporary
> internet world, it's what the internet wants to do with your data. You could
> try to license _against_ this, saying your 'database' needs to be used in
> toto, and not combined with other data -- assuming you can actually cobble
> together a legal right to enforce such a license restriction (which in the
> US is going to be difficult). But I think you'd seriously limit the utility
> of your data by doing that.   Personally, I think that applies to the open
> street map stuff, but I respect that the open street map community thinks
> different.

Wherever you are you are going to have to apply a license (or
dedication/license like the PDDL) if you want there to be clarity.
Please see the links I sent in the earlier email especially:


> But, okay, if some people think they can get away with enforcing licensing
> restrictions based on the 'database', and there is reasonable legal counsel
> that this is indeed something that can be enforced, then I suppose it makes
> sense to provide for those needs.  But I think it's just asking for trouble,
> and likely to cause trouble down the line -- just on the grounds of your
> legal rights across jurisdictions to enforce such restrictions.  And legal
> confusion can seriously inhibit the willingness of people to use the stuff.

I think the crucial point here is to distinguish between 2 kinds of
uncertainty:

1. Uncertainty for licensor
2. Uncertainty for user

For the user, even with doubts over exactly where a database right
begins, it is easy to get certainty: comply with the license! So I
don't really see the issue. For the licensor matters may be slightly
different, and it is important to emphasize that in certain
jurisdictions, such as the US, there may be situations when SA clauses
are not enforceable (e.g. you're making available a telephone
directory!). In such a situation you could try going for a clickwrap
approach -- but this still has major problems and I would strongly
recommend against it on the grounds of being so burdensome.

> Rufus suggests that no major player is going to try and violate a license
> even if its legal enforceability is unclear. (I'm not sure that's so -- it
> depends on how valuable the data and its intended possibly-violating use
> is).  I'd suggest there's another side of that coin though:  Especially when
> it comes to slicing and dicing and mixing together data from various
> sources, no major player is going to be _willing_ to use data if its right
> to do so is unclear.

Then we are agreed, but again I don't see why data is different from,
e.g., code here (there's a reason why it is said that Microsoft won't
go within a mile of a GPL'd project). Again I'd be interested in your
comments once you've read those links, especially:


Plus appendices:



