[open-bibliography] Discussing the openbiblio principles
adrian.pohl at okfn.org
Mon Nov 1 10:52:09 GMT 2010
On October 15th we published a draft for "Principles on Open
Bibliographic Data" on openbiblio.net, asking for comments. I am
myself not yet satisfied with the principles and will point out some
basic problems which - in my opinion - have to be resolved before
publishing the principles in a first "stable" version. But first, I
will lay out the current structure and content of the principles.
== The principles' structure ==
The principles are divided into four parts with part three being quite
short. These are the parts:
Part 1: Some kind of preamble.
Part 2: A definition of bibliographic data as used in the principles.
In short it says: A bibliographic record is an aggregation of
statements about a bibliographic resource which sufficiently
identifies (and optionally addresses) a bibliographic resource. A
bibliographic resource can be an article, a monograph, a series etc.
Also, an enumeration of most bibliographic data is attempted.
Part 3: A statement that neither a bibliographic record nor any of its
previously defined parts meets the threshold of originality and that
they are thus automatically in the public domain. It is further noted
that there might only be rights (e.g. the European sui generis database
right) on collections of bibliographic data.
Part 4: A set of five principles for dealing with bibliographic data
which build on each other:
1. The first principle makes clear that in publishing bibliographic
data - whether single records or collections - you should make a license
statement specifying which uses of the data you want to allow.
2. The second principle makes clear that software licenses or content
licenses like most Creative Commons licenses aren't appropriate for
data and thus shouldn't be used.
3. The third principle asks for using _open_ licenses and points out
that non-commercial licenses in particular aren't open.
4. The fourth principle asks to choose a _specific_ open license for
bibliographic data: a public domain license like CC0 or PDDL.
5. The fifth and last principle now lists parts of bibliographic data
which haven't been mentioned in the second (definition) part because
they might fall under copyright: abstracts, keywords, reviews.
Creators of bibliographic data are asked to also license this -
potentially copyrighted - bibliographic data with an open license or
dedicate it to the public domain.
== Proposals for improvement ==
Here are my observations and proposals for improvement. The most
fundamental and important problem is outlined in the beginning (1.).
1. I was quite confused by Peter's post on the principles. Peter writes:
"Note that I am NOT talking about COLLECTIONS of bibliographic records
(which may be copyrightable in some jurisdictions) but individual
records which are uncopyrightable."
Though it might be good for didactic reasons to start with the
public domain status of single bibliographic records, I think we SHOULD
explicitly and first and foremost talk about COLLECTIONS of data. At
least, I am talking about collections of bibliographic records, and the
principles draft - like the Panton Principles - explicitly refers
to collections in at least two places.
I think that not much would be won if publishers put a public domain
mark or something similar on single bibliographic records. Even now
nobody objects if you take a bibliographic record, e.g. for reference
management with Zotero; many websites even support this through
microformats etc. Thus, single bibliographic records are already
treated as part of the public domain. But if you crawl a website and
take a substantial part of all bibliographic records you might get
into legal problems - and the principles wouldn't try to change this
if they only addressed single bibliographic records. The underlying
legal fact is that you can have copyright or database right on a
collection of data whose parts themselves are explicitly in the
public domain or - in other words: If all parts of a collection are
for themselves in the public domain it doesn't necessarily follow that
you are legally allowed to take, reuse and pass on the whole
collection or substantial parts of it.
2. Although the first point seems to me the most pressing problem we
should try to solve, I also have some other observations. A related basic
structural problem with the principles' _definition_ of bibliographic
data in part two is that it doesn't cover some parts of bibliographic
data for legal reasons: Abstracts, keywords and reviews aren't
contained in the definition because they might be copyrightable. I
think this should be fixed: a definition of bibliographic data should
not leave out parts of bibliographic data because of their legal
status. I propose the distinction between potentially copyrightable
and non-copyrightable bibliographic data to be made in the definition
part (part 2) and thus to include possibly copyrighted parts of
bibliographic data in this definition.
3. A problem with the definition of bibliographic resources might be -
although I think it is no problem to be vague here - that it leaves
unclear whether DVDs, computer games, paintings, records, sculptures
etc. are covered.
4. Another problem I have with the draft principles from a library
perspective: authority files and authority records are included but in
the background although they may form the most important and reusable
parts of library data. (I think this is at least the case for name
5. A small problem that is the easiest to fix: The principles sometimes
talk about "bibliographic metadata" and sometimes about "bibliographic
data". This should be harmonized and - if "bibliographic metadata" is
chosen - it should be made clear at one point that metadata legally falls
under the concept of data. Laura James addressed this problem in a comment on
I would be very happy to get these questions answered and to resolve
the discrepancies so that we can eventually publish a sensible and
consistent principle text.