Peter Murray-Rust pm286 at cam.ac.uk
Thu Jun 24 18:32:18 BST 2010

This is very useful to me; I'd value more opinions.

On Thu, Jun 24, 2010 at 5:57 PM, Tim Spalding <tim at librarything.com> wrote:

> A few comments.
> If I might restate, the problem is severe here in that the proposal
> uses both "bibliography" and "bibliographic". Unfortunately, the normal
> uses of these terms aren't adjective and noun pointers to the same
> concept.

I'm happy to be enlightened. From the BL I find:
Bibliographic Services

The British Library provides a range of services for people requiring
bibliographic information. These services include the national
bibliography <http://www.bl.uk/bibliographic/natbib.html> of the UK and
publication of the catalogues of the British Library. All
products and services are designed to be compliant with

 So here bibliographic services provide bibliographies and catalogues.

> I am, however, interested in *just what data is produced, or opened*
> in both extent and detail. If this project produces a linked-data
> representation of the BL's library records, with all MARC data
> preserved in some way—great.

I'll let Rufus, Ben (O'Steen) and Ben (White, BL) answer that. My
understanding is that we shall have access to the MARC records.

> There's a lot that can be done with that.
> If it adds data to that, from another source or through internal
> analysis, double great. If it produces some lossy representation,
> either in extent or detail, which can't be used for cataloging or to
> add information to traditional catalogs, that becomes a lot less
> interesting to me. I'm simply unclear which is happening here.

It depends completely on what is available. Here is the HTML metadata
provided for a typical paper in Acta Cryst. E by the editorial staff:

  <meta content="urn:issn:1600-5368" name="DC.source" />
  <meta content="http://creativecommons.org/licenses/by/2.0/uk" name="DC.rights" />
  <meta content="Xue, L.-W." name="DC.creator" />
  <meta content="Li, X.-W." name="DC.creator" />
  <meta content="Zhao, G.-Q." name="DC.creator" />
  <meta content="Peng, Q.-L." name="DC.creator" />
  <meta content="2009-09-01" name="DC.date" />
  <meta content="doi:10.1107/S1600536809037520" name="DC.identifier" />
  <meta content="International Union of Crystallography" name="DC.publisher" />
  <meta content="http://scripts.iucr.org/cgi-bin/paper?is2450" name="DC.link" />
  <meta content="en" name="DC.language" />
  <meta content="text" name="DC.type" />
  <meta content="..." name="DC.title" />
  <meta content="In the title compound, [Cu(C13H9NO3)(C5H5N)], the CuII atom
is coordinated in a distorted square-pyramidal geometry, with two N and two
O atoms in the basal positions and one O atom in the apical position. The
apical Cu-O bond [2.3520 (16) Å] is much longer than the basal Cu-O and Cu-N
bonds [1.9139 (14)-2.0136 (17) Å]. The carboxylate group bridges CuII atoms,
forming a zigzag chain along the a axis." name="DCTERMS.abstract" />
  <meta content="10" name="prism.number" />
  <meta content="65" name="prism.volume" />
  <meta content="2009-09-01" name="prism.publicationDate" />
  <meta content="Acta Crystallographica Section E: Structure Reports Online"
name="prism.publicationName" />
  <meta content="1600-5368" name="prism.issn" />
  <meta content="metal-organic compounds" name="prism.section" />
  <meta content="1237" name="prism.startingPage" />
  <meta content="med at iucr.org" name="prism.rightsAgent" />
  <meta content="1237" name="prism.endingPage" />
  <meta content="1600-5368" name="prism.eissn" />
  <meta content="" name="keywords" />
  <meta content="NOARCHIVE,NOINDEX" name="ROBOTS" />
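
As a rough illustration of how tags like these could be harvested automatically, here is a minimal sketch using Python's standard-library html.parser. It assumes the metadata arrives as <meta name=... content=...> tags like those above, and it keeps only the DC., DCTERMS. and prism. vocabularies. MetaTagParser and extract_metadata are hypothetical names, not part of any existing OKF or IUCr tool.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs whose name has a chosen prefix.

    Repeated names (e.g. several DC.creator tags) are kept as lists, in document order.
    """

    def __init__(self, prefixes=("DC.", "DCTERMS.", "prism.")):
        super().__init__()
        self.prefixes = prefixes
        self.fields = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        # Self-closing XHTML tags (<meta ... />) arrive here too, because the
        # default handle_startendtag() calls handle_starttag().
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        content = attrs.get("content")
        # Skip tags with no name, empty content, or a name outside the chosen vocabularies.
        if name and content and name.startswith(self.prefixes):
            self.fields[name].append(content)


def extract_metadata(html_text):
    """Return a dict mapping each metadata name to the list of its values."""
    parser = MetaTagParser()
    parser.feed(html_text)
    parser.close()
    return dict(parser.fields)


sample = """
<meta content="Xue, L.-W." name="DC.creator" />
<meta content="Li, X.-W." name="DC.creator" />
<meta content="doi:10.1107/S1600536809037520" name="DC.identifier" />
<meta content="65" name="prism.volume" />
<meta content="NOARCHIVE,NOINDEX" name="ROBOTS" />
"""

record = extract_metadata(sample)
print(record["DC.creator"])    # ['Xue, L.-W.', 'Li, X.-W.']
print(record["prism.volume"])  # ['65']
print("ROBOTS" in record)      # False
```

Keeping every value as a list means the author order in DC.creator survives, which matters for citation.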

We can also add automatically derived metadata, such as:
* the number of graphics in the article
* the number of HTML-normalized words in the article
* the number of tables in the article
* the unique ID of the co-editor of the article
* Received 20 May 2010
* Accepted 31 May 2010
* Online 5 June 2010
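
A minimal sketch of how the first three counts might be computed from the article HTML, again with the standard-library html.parser. What counts as an "HTML-normalized word" is not defined here; this sketch assumes it means whitespace-separated tokens of text outside <script> and <style>. ArticleStats and article_stats are hypothetical names. The co-editor ID and the dates would need journal-specific rules and are not attempted.

```python
from html.parser import HTMLParser


class ArticleStats(HTMLParser):
    """Count <img> and <table> elements and words of text outside <script>/<style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.images = 0
        self.tables = 0
        self.words = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        elif tag == "table":
            self.tables += 1
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Assumption: a "word" is any whitespace-separated token of visible text.
        if not self._skip_depth:
            self.words += len(data.split())


def article_stats(html_text):
    parser = ArticleStats()
    parser.feed(html_text)
    parser.close()
    return {"graphics": parser.images, "tables": parser.tables, "words": parser.words}


sample = """<html><head><style>p {color: red}</style></head>
<body><p>The title compound forms a zigzag chain.</p>
<img src="fig1.png" /><table><tr><td>Cu1 O1</td></tr></table></body></html>"""

print(article_stats(sample))  # {'graphics': 1, 'tables': 1, 'words': 9}
```

This counts every <img>, including logos and icons, so a real pipeline would probably restrict the count to images inside the article body.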

To me that looks like high-quality bibliographic metadata. There is also some
journal-generic information we can fold in, such as:

format: "-//W3C//DTD XHTML 1.0 Frameset//EN"
encoding: UTF-8

and metadata about the publisher:

Address: International Union of Crystallography, 5 Abbey Square, Chester
CH1 2HU, England
Telephone: 44 1244 342878
Fax: 44 1244 314888
Managing Editor: Peter Strickland (med at iucr.org)
*Acta Crystallographica Section E*: Gillian Holmes (gh at iucr.org)

There may, however, be metadata that we cannot extract automatically, which
would require the participation of the authors or publisher; we do not
intend to extract that.

> > We have two very different sets of use cases. Rufus and Ben will be
> > working with key library catalogues (BL and Cambridge). Here we can
> > expect some records to be very complex and the expected usage to be
> > complex and varied.
> So, is BL data going to be opened up?

Some is.

> ...

> > "Data licensing" will depend on what the data are. The OKF has tools for
> > all sorts of "data".
> This is a two sided question. I'm interested in what OKF
> discovers/decides about the legal state of book-and-article records.

We have a lot of experience here. We should be able to address questions
such as: is a catalogue a document (CC-BY), a data set (PDDL/CC0) or a
database (ODbL), or some mixture of these?

> And I'm interested in whether LT should release its data to OKF, and
> if we did, what licenses could be applied to it. For example, we
> offered to release our Common Knowledge data to Open Library, but they
> refused the CC-Attribution license we proposed.

OKF generally holds metadata rather than primary data, so it depends. Just
mail Rufus or Jonathan.


Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
