[open-bibliography] Place of Publication data from the BL dataset
kcoyle at kcoyle.net
Sat Nov 27 13:59:23 GMT 2010
Quoting Tom Morris <tfmorris at gmail.com>:
> That's assuming that the place of publication is stored on a
> per-publisher basis, not a per-book basis in the British Library
> database. That type of knowledge about the schema used in the
> internal database will be key to making informed decisions about the
> data. Where is that schema documented?
Assuming that BL data follows the MARC format, the schema is "documented" at:
All of the fields that begin with "00" (zero zero) are coded data with
fixed value vocabularies.
If you look at this page, along the left-hand side you will see a
heading for MARC Code lists, with some lists, including geographic
If you want to grab the lists of MARC fields, subfields, and codes,
this page (about 2/3 of the way down) has information about and
links to two CSV files that contain the MARC data elements:
There is not an official machine-usable schema for the format, unfortunately.
>> The conversion currently takes the place of publication, distribution,
>> etc. from the 260$a. We're considering including the 008/15-17 in future
> What does that mean in English?
:-) that they would include the 2-3 letter geographic code from field
008 positions 15-17. Those are the codes listed at:
The Library of Congress, the owner of these vocabularies, has not
provided an RDF expression for the country codes. There is one,
It's "unofficial" but there are no other RDF options that I know of.
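To make "008/15-17" concrete, here is a minimal sketch in Python of pulling
that code out of an 008 field. It assumes the 008 is already available as a
plain 40-character string; the function name and the sample value are made up
for illustration and are not taken from the BL data.

```python
def place_code_from_008(field_008: str) -> str:
    """Return the place of publication code held in 008 positions 15-17."""
    if len(field_008) < 18:
        raise ValueError("008 field is too short to contain positions 15-17")
    # MARC character positions are zero-based, so 15-17 is the slice [15:18].
    # Two-letter codes are padded with a blank; some displays show it as '#'.
    return field_008[15:18].replace("#", " ").strip()


# Hypothetical 008 value, assembled piece by piece so the positions are visible.
sample_008 = (
    "101127"      # 00-05: date entered on file
    + "s"         # 06: type of date
    + "2010"      # 07-10: date 1
    + "    "      # 11-14: date 2 (blank)
    + "enk"       # 15-17: place of publication code
    + " " * 17    # 18-34: material-specific positions (left blank here)
    + "eng"       # 35-37: language code
    + " "         # 38: modified record
    + "d"         # 39: cataloging source
)

print(place_code_from_008(sample_008))
```

The returned value is only a code; it still has to be looked up in the MARC
country code list to get a place name.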
> Is there a listing available someplace of what fields in the dump came
> from free form text fields vs database records which guarantee that
> anything linked to that record always has the same text value? For
> example, book titles and edition statements are almost certainly
> free-form text fields, but I'd expect authors to have individual
> records where every book linked to the same record to have the same
> author name in the dump.
> Is there a comprehensive list of which fields are free-form vs
> database record backed? Knowing the internal schema would be very
> helpful in making use of the data.
> On Fri, Nov 26, 2010 at 2:45 AM, Ben O'Steen <bosteen at gmail.com> wrote:
>> (And as Karen has just pointed out, the reason why I am exploring this field
>> is to aid disambiguation of publishers. Having created the overview that I
>> know I need, I thought to share it here.)
> That makes sense, although I'd have thought that publisher data is
> noisy enough and low value enough that it'd be pretty far down on the
> priority list to clean up.
> More interesting I think is whether these text strings represent one
> author or three or ...:
> Wilson, Angus, 1913-1991
> Wilson, Angus, 1913-1991,
> Wilson, Angus, 1913-1991.
> Wilson, Angus.
> Willson, Angus.
> My gut tells me that at least the first three text strings probably
> represent a single author, but that's not what the database seems to
> p.s. Can some librarian type tell me what the trailing period (full
> stop) means? It's not used consistently, but it appears much, MUCH
> more frequently in library data than anywhere else I've seen.
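One rough way to test the hunch about the first three strings is to strip
trailing punctuation and see which headings collapse together. The sketch
below is Python, assumes the headings are plain text strings, and uses
hypothetical function names; it deliberately does nothing about spelling
variants or missing dates.

```python
from collections import defaultdict


def normalize_heading(heading: str) -> str:
    """Drop trailing full stops, commas and spaces; change nothing else."""
    return heading.rstrip(" .,")


def group_headings(headings):
    """Group the original strings under their normalized form."""
    groups = defaultdict(list)
    for heading in headings:
        groups[normalize_heading(heading)].append(heading)
    return dict(groups)


headings = [
    "Wilson, Angus, 1913-1991",
    "Wilson, Angus, 1913-1991,",
    "Wilson, Angus, 1913-1991.",
    "Wilson, Angus.",
    "Willson, Angus.",
]

groups = group_headings(headings)
for key, members in groups.items():
    print(key, len(members))
```

The first three strings fall into one group, while "Wilson, Angus" and
"Willson, Angus" stay separate, so deciding whether those are the same person
still needs something beyond punctuation cleanup.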
kcoyle at kcoyle.net http://kcoyle.net