[open-bibliography] Provenance of library metadata records
owen at ostephens.com
Tue Aug 31 09:46:19 BST 2010
I've moved this to a new thread for clarity. I think it would be useful to
explore this a bit more.
It seems to me there are two possibilities going forward: either we can
identify where (at least some) records come from, or we can't. If the
latter, then a side effect is that there is no way to assert any rights over
the record, whether valid or not (is this like an orphan work in some sense?).
If the former, then what tests can or should be applied to check provenance? I
can see two ways: either relying on the MARC standard to tell us where (if
people cared about it) provenance should be recorded, or comparing two
records and trying to work out whether they are similar enough to assert that
one is derived from the other. I can't really see how the latter would work:
establishing which way the derivation goes could be difficult, and in theory
two correctly formulated catalogue records could well end up identical even
with no relationship between them.
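To make the second approach concrete, here is a minimal Python sketch, assuming a simplified, hypothetical record structure (a dict of tag -> list of field occurrences, each a list of (subfield code, value) pairs; this is not the pymarc API). It scores two records with Jaccard similarity over normalised subfield values. Because the score is symmetric, it can say how alike two records are but not which one was derived from the other, and two independently catalogued records for the same book can score highly.

```python
def field_tokens(record):
    """Flatten a record into a set of (tag, subfield code, normalised value) tokens.

    `record` is a hypothetical simplified structure:
    {tag: [[(code, value), ...], ...]}, data fields only.
    """
    tokens = set()
    for tag, occurrences in record.items():
        for subfields in occurrences:
            for code, value in subfields:
                # Lower-case and collapse whitespace so trivial differences don't count.
                tokens.add((tag, code, " ".join(value.lower().split())))
    return tokens


def similarity(a, b):
    """Jaccard similarity |A & B| / |A | B|; similarity(a, b) == similarity(b, a)."""
    ta, tb = field_tokens(a), field_tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


# Two hypothetical records for the same book; "XXX" is a made-up agency code.
rec_a = {
    "040": [[("a", "DLC")]],
    "100": [[("a", "Austen, Jane.")]],
    "245": [[("a", "Pride and prejudice /"), ("c", "Jane Austen.")]],
}
rec_b = {
    "040": [[("a", "XXX")]],
    "100": [[("a", "Austen, Jane.")]],
    "245": [[("a", "Pride and  prejudice /"), ("c", "Jane Austen.")]],
}

print(similarity(rec_a, rec_b))  # 0.6
print(similarity(rec_b, rec_a))  # 0.6
```

The two records share three of five distinct tokens, so the score is 0.6 in both directions: nothing in the number points from one record to the other.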
PS: the OCLC library list/codes has a machine interface (SRU, I think) as well
as the web interface.
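For the machine interface, an SRU searchRetrieve request is an ordinary HTTP GET with standard query parameters. A minimal Python sketch that only builds the URL; the base URL and the CQL index name `name` are hypothetical placeholders, since the actual OCLC endpoint and its indexes are not given here.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; substitute the real registry SRU endpoint.
BASE_URL = "https://sru.example.org/registry"


def sru_search_url(base_url, cql_query, max_records=10, version="1.1"):
    """Build an SRU searchRetrieve URL using SRU 1.1 parameter names."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,  # CQL query; the index name used below is an assumption
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)


url = sru_search_url(BASE_URL, 'name = "Yale"')
print(url)
# https://sru.example.org/registry?operation=searchRetrieve&version=1.1&query=name+%3D+%22Yale%22&maximumRecords=10
```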
On Fri, Aug 27, 2010 at 4:59 PM, Karen Coyle <kcoyle at kcoyle.net> wrote:
> Quoting Owen Stephens <owen at ostephens.com>:
>> I'm not at all convinced that simply containing the OCLC number means that
>> the record comes from OCLC, but I agree it could be an indicator.
> Owen, in fact anyone can add an OCLC number or delete one from a record --
> there is no protection against that. As part of the WorldCat Local process,
> libraries send records to OCLC that do not have OCLC numbers, and OCLC
> matches them to the WC database and adds the number. That record could have
> come from anywhere, but now has an OCLC number.
>> Can any cataloguing/MARC gurus confirm that in theory the 040 and 008/39
>> identify the source of cataloguing (as far as is possible)?
> I'm not by any means a guru in this area, but I think what we will run up
> against here is inconsistent use of the 040 field in different library
> systems. OCLC seems to always mark the source of updates by adding the
> library's code to the 040 field. I doubt if local systems do the same; that
> is, if a library has updated a record in their local system, my guess is
> that the 040 field is not necessarily updated. So should that record be
> exchanged directly with another library (or sent out to the cloud), that
> update is not registered.
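When systems do maintain it, the 040 field records a short chain: $a original cataloging agency, $c transcribing agency and repeatable $d modifying agencies, while 008/39 gives a coarse cataloging-source code. A minimal Python sketch, assuming a hypothetical simplified record (control fields as strings, data fields as lists of (code, value) pairs). It only reports what is recorded; as noted above, a local edit that never touched the 040 is simply invisible to it.

```python
# MARC 21 Bibliographic 008/39 (Cataloging source) code values.
CATALOGING_SOURCE = {
    " ": "National bibliographic agency",
    "c": "Cooperative cataloging program",
    "d": "Other",
    "u": "Unknown",
    "|": "No attempt to code",
}


def provenance_chain(record):
    """Summarise the 040 agencies and 008/39 of a simplified record.

    `record` is hypothetical: {"008": str, "040": [[(code, value), ...]], ...}.
    """
    # 040 is not repeatable, so only the first occurrence is read.
    field_040 = record.get("040", [[]])[0]
    agencies = {"a": [], "c": [], "d": []}
    for code, value in field_040:
        if code in agencies:
            agencies[code].append(value)
    fixed = record.get("008", "")
    source_code = fixed[39] if len(fixed) > 39 else None
    return {
        "original": agencies["a"][0] if agencies["a"] else None,
        "transcribing": agencies["c"][0] if agencies["c"] else None,
        "modifying": agencies["d"],  # in the order they appear in the field
        # Missing 008 or an unexpected code both fall back to "Not recorded".
        "cataloging_source": CATALOGING_SOURCE.get(source_code, "Not recorded"),
    }


# Hypothetical record: "XXX" and "YYY" stand in for real agency codes.
record = {
    "008": " " * 39 + "d",
    "040": [[("a", "DLC"), ("c", "DLC"), ("d", "XXX"), ("d", "YYY")]],
}
print(provenance_chain(record))
```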
> It seems to me that there are only two things that can be known in *some*
> cases: 1) where did the record originate? (040 $a, the library that did the
> original cataloging) 2) where did the record immediately come from? (the
> 003, the sending agency). That said, I know for a fact that many libraries
> do not properly set the 001 (local system id) and 003 when they export
> records -- many send out records with the OCLC number in the 001, and their
> own system number in a 9xx field.
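Both questions can be read off the record directly, with a sanity check for the export problem just described. A minimal Python sketch, assuming a hypothetical simplified record (control fields as strings, 040 as a list of (code, value) pairs). The test for an OCLC number in the 001 uses only the "ocm"/"ocn" prefixes; a bare numeric 001 could also be an OCLC number, and this sketch cannot tell.

```python
import re


def record_sources(record):
    """Report origin (040 $a) and immediate source (003), with warnings."""
    field_040 = record.get("040", [[]])[0]
    origin = next((value for code, value in field_040 if code == "a"), None)
    sender = record.get("003")
    control_number = record.get("001", "")
    warnings = []
    if sender is None:
        warnings.append("003 missing: immediate source unknown")
    # An OCLC-style 001 with a non-OCLC 003 suggests the exporting library
    # put the OCLC number in 001 (its own id may then be in a 9xx field).
    if re.match(r"(ocm|ocn)\d+", control_number) and sender != "OCoLC":
        warnings.append("001 looks like an OCLC number but 003 is not OCoLC")
    return {"origin": origin, "immediate_source": sender, "warnings": warnings}


# Hypothetical export: "XXX" stands in for the sending library's code.
exported = {
    "001": "ocm12345678",
    "003": "XXX",
    "040": [[("a", "DLC")]],
}
print(record_sources(exported))
```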
>> What would be
>> really good is if we could start to compile a list of 'rights' or
>> across those organisations that can appear in 040? (perhaps even persuade
>> LoC and other agencies this should be added to organisational records
>> have valid MARC21 Organisation codes in
>> http://www.loc.gov/marc/organizations/org-search.php and equivalent
> Once again, OCLC practices have overridden standards. OCLC puts the OCLC
> customer ID in the 040, not the standard MARC21 organization code. You can
> look up OCLC ids on their site, but I don't see a way to download the whole list.
> I think that the upshot is: great idea, but the data may not support it.
>> I realise that to be exhaustive would be a huge amount of work, but think
>> that to cover the main sources of records in a country wouldn't be too
>> This could also help inform choices about where libraries who want to
>> publish open data get their bib data from?
>> Does this have any legs or is it pointless/too much work?
>>> Identifying WorldCat as the source of data that has been transferred or
>>> made available downstream of the initial extraction from WorldCat can
>>> sometimes be complex. A combination of the following data elements in a
>>> bibliographic record can help determine if the record was initially
>>> extracted from WorldCat:
>>> * An OCLC Control Number along with
>>> o the 001 field that includes value characters "ocm" or "ocn"
>>> o the 035 field that includes the value "(OCoLC)" and/or
>>> o the 994 field"
>>> I think all this must be read together. Even though your library is not,
>>> and has never been, an OCLC library, you may still be in possession of
>>> is defined here as "WorldCat Data" and therefore subject to this policy.
>>> This also clearly includes single records. Although I am not a lawyer,
>>> from what I read here, it seems that once something has touched OCLC in any
>>> way at all, and no matter what you have done with it, OCLC claims ownership
>>> (i.e. that it is WorldCat Data) and that it falls under this policy.
>>> How this deals with, e.g. a record created by the Library of Congress,
>>> perhaps even as CIP (i.e. public domain), then being downloaded and
>>> by another library, finally, I would take this record directly from e.g.
>>> Yale, through Z39.50 and update it myself, according to this, this record
>>> would still fall under OCLC's policy.
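Read literally, the quoted indicators amount to a few field checks, none of which look at the 040. A minimal Python sketch of one reading of them (the policy's "along with ... and/or" wording is ambiguous, so this just lists which indicators are present), again over a hypothetical simplified record. The example is a record whose 040 $a says DLC but which carries a 035 (OCoLC) number; it is still flagged, which is the scenario described above.

```python
def worldcat_indicators(record):
    """List which of the quoted WorldCat indicators a simplified record carries.

    `record` is hypothetical: control fields as strings, data fields as
    lists of occurrences, each a list of (subfield code, value) pairs.
    """
    found = []
    if record.get("001", "").startswith(("ocm", "ocn")):
        found.append("001 has ocm/ocn prefix")
    if any("(OCoLC)" in value
           for occurrence in record.get("035", [])
           for _code, value in occurrence):
        found.append("035 contains (OCoLC)")
    if record.get("994"):
        found.append("994 present")
    return found


# Hypothetical LC-originated record later copied via another library;
# the numbers are made up for illustration.
lc_record = {
    "001": "12345678",
    "035": [[("a", "(OCoLC)12345678")]],
    "040": [[("a", "DLC"), ("c", "DLC")]],
}
print(worldcat_indicators(lc_record))  # ['035 contains (OCoLC)']
```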
> Karen Coyle
> kcoyle at kcoyle.net http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet
Owen Stephens Consulting
Email: owen at ostephens.com