[okfn-discuss] Re: [SPARC-OpenData] data sharing
rufus.pollock at okfn.org
Fri Dec 22 13:26:13 GMT 2006
John Wilbanks wrote:
> Hi all, chiming in here...just joined the list.
> Rufus Pollock wrote:
>> "There will of course be some adaptation needed but the basic principles
>> that must be addressed by an 'open' licence, be it for content or for
>> scientific data, are essentially the same:
>> 1. Freedom to use, reuse and redistribute
>> 2. attribution (or not)
>> 3. sharealike (or not)
>> Of course I agree that there are some aspects of the CC licences that
>> strike one as rather content oriented (e.g. talk of 'public
>> performance') but these would in no way invalidate the licence. Furthermore
>> CC licences are already being used to licence various sets of geodata
>> and we at the OKF have been using them for datasets we've produced. I
>> also note that, for example, the Dutch creative commons team explicitly
>> wrote in provisions to deal with database rights in the Dutch CC licences."
> The lack of international consensus on data makes use of CC licenses for
> data problematic. The EU directive doesn't exist elsewhere, and is not
Yes, the IPR status of collections of data varies greatly by jurisdiction.
Some have no protection at all, others have common-law copyright (e.g.
Australia, and US pace Feist), the EU has copyright + a sui generis
> written into the majority of even the EU licenses. This makes
Yes, though perhaps that can be fixed via a suggestion to the various
local drafting teams ...
> interoperation along the CCi model (where I can upload a file in the US,
> and you can download it in Brazil) much harder to achieve, as the IPR
> does not exist in the US.
(a) I think a lot of the jurisdictions have some kind of IPR that can be
used. Furthermore these kinds of 'interoperability' issues already exist
with the CC licences for content. A lot of people in England will just
point to the standard US CC licences rather than the E&W licence even
though it is not precisely customized to the national law. What one
really wants is a clause in any national licence saying: when you
licence under this licence you licence under the equivalent licence in
all other jurisdictions. Such a provision would also work for CC
licences attached to data.
(b) CC licences aren't just legal documents; they are also a way of
encoding the 'social contract'. Thus even if it turns out a licence is
not perfectly enforceable, attaching it to a work provides a useful
signal to others of what the creator (or owner) wishes to permit.
Particularly in the academic community such 'intentions' will carry
strong weight since violation can be sanctioned in all kinds of
> SC is examining the idea that data be simply tagged as public domain,
But many may not want their data to be public domain. Look at Gracenote
and freedb: the archetypal example of an appropriation of the commons.
Many people want a sharealike provision in their data licences. Of
course, in jurisdictions where there are no underlying legal rights this
is meaningless, but in many jurisdictions it will not be.
> data, though it would let CC licenses be used, could also result in the
> automatic assignment of copyrights to all data sets - which means that
> if sharing licenses were *not* attached we would likely see a vast space
> of orphan data with all rights reserved. It seems to be a feature of
> copyrighted content.
I am not sure I understand this. Either rights already exist in such
datasets (though they may not be exercised) or they do not. If they do
not then attaching CC licences won't suddenly create these rights. If
they do, then attaching CC licences has just made the situation clearer.
I do appreciate that in reality there is, of course, quite a lot of
greyness over what one is allowed to do and this can be beneficial
because it allows people informally to do stuff they might not be
formally allowed to (the classic case of this is provided by the data in
Walsh, Cohen and Cho who show that one reason patents in biotech have
not had much impact on researchers is that the patents are routinely
ignored out of ignorance, see ). That said, I think that in the long run
it is better to be explicit (more discussion on a similar theme
can be found in ).
> definition allow the use, reuse, and distribution of data, without the
> need for a binding intellectual property license. In some cases, using
> intellectual property - which is a blunt instrument - can have dreadful
> unintended consequences.
It can, though as I just said I'm not sure how attaching a licence would
create such IPRs. Rather it might make people *aware* that such IPR
exists -- in which case we are back to the previous point.
> Also, our research is unclear as to what "attribution" and "share alike"
> mean in the context of data. What if I run a query across 10,000 gene
> expression data sets? If I access only one record per data set?
> Attribution and derivative works are terms built for copyrights, and the
> legal implications might mean you have to attribute 10,000 people every
> time you generate a data set. The normative values of each field of
> science work pretty well for this already...
These are hard questions, but again if the IPRs already exist these
are questions that will have to be faced whether there is a licence or
not. Furthermore, the courts have already been struggling (perhaps
rather unsuccessfully) here in the EU and elsewhere to define these
kinds of things. For example, the EU DB directive talks about the
right 'to prevent extraction and/or reutilization of the whole or of a
substantial part, evaluated qualitatively and/or quantitatively, of the
contents of that database.' For more on this see, e.g.:
> Paul Uhlir made a very important point to me in person at the CODATA
> meetings in Beijing. A "commons" isn't just a place where "some rights"
> are reserved. It's a place where "some rights, or no rights" are
> reserved. Data may well fall into the latter category.
Absolutely, though, as mentioned above, I would not underestimate the
attractiveness of 'share-alike' provisions. In my own experience so far
with licensing discussions there has been strong support for these
kinds of provisions -- and we should also note the prevalence of the GPL
in the F/OSS community.
> However, as I said, we're examining the idea, and welcome the discussion.
> Now, if you have a database, we have created a FAQ for owners, and
> uniprot.org (the world's largest database of biological protein
> information) uses the CC license in the following manner:
> "We have chosen to apply the Creative Commons Attribution-NoDerivs
> License to all copyrightable parts of our databases. This means that you
> are free to copy, distribute, display and make commercial use of these
> databases, provided you give us credit. However, if you intend to
> distribute a modified version of one of our databases, you must ask us
> for permission first." (http://www.pir.uniprot.org/terms.shtml)
To my mind this would mean that the database was *not* open/free in the
sense of the open knowledge/data definition:
What is their motivation for doing this? I assume it is an integrity
concern, e.g. they don't want different versions of the database floating
around the net, all slightly different. However, why couldn't this be
addressed by a standard provision of the Perl type: 'if you modify this
database you must *not* distribute it under the same name and must
clearly identify that it has been modified'?
Open Knowledge Foundation