[open-bibliography] More verbs. Electronic 'Items' (Yes, another FRBR thread)

graham graham at theseamans.net
Mon Jul 12 17:29:27 BST 2010

On 07/12/10 16:02, Weinheimer Jim wrote:

> Also, with materials in sites such as the Internet Archive, how do we deal with them:
> http://www.archive.org/details/imitationofchris00newy
> Is this 8 manifestations? That's a lot of extra records and work. Is it best for the readers and is it worthwhile to deal with them this way? And what happens when the Archive decides to automatically create, let's say XML versions for every document. Are these all going to be different manifestations? 

As a non-librarian, I have a related problem: I'm taking machine-OCRed
text from Internet Archive sets like these and hand-proofing them. These
are then going onto a site which contains text from a range of sources,
much of it scanned from originals by volunteers. We may end up with
variant copies of the same texts, and depending on the quality of the
original OCR and the care taken in subsequent proofreading, the copies can
have varying fidelity to the originals. I would like to have some way to
identify these texts as coming from a particular original (the Internet
Archive is mostly extremely good at maintaining the camera metadata for
its scans, so there is a solid starting point), but also to identify the
series of processes each has been through after that. It seems to me
that, with the very large number of text copies of out-of-copyright
material now floating round the internet, some way of establishing this
kind of chain - and the likelihood that a particular copy is faithful to
the original - is going to get more important.
Were those rude jokes in your copy of Alice in Wonderland really put
there by Carroll? Did Mark Twain really have such strange spelling as
your copy of A Yankee at the Court of King Arthur suggests? etc. This
doesn't seem to be the kind of thing librarians in general are
interested in, but maybe there are relevant techniques from archivists
dealing with mediaeval hand-copied texts? Although these are definitely
different 'manifestations' of a text in a loose sense, they don't seem
to fit in FRBR either.

Are there any existing ways to handle this kind of thing? None of my
ideas for creating hashes of data + organization name as unique version
identifiers seem to hold up :-( I really don't believe attempting to
catalogue all the variants is either possible or desirable; surely each
version needs to be self-describing, not listed in a single place.
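
To make "self-describing" a little more concrete, here is a minimal
sketch of a record that travels with a text and lists the original it
came from plus each process applied since. It is only an illustration
under stated assumptions, not an existing scheme: the field names, the
function names and the "example-volunteer" agent are all hypothetical,
and the source identifier is simply the last part of the Internet
Archive URL quoted above.

```python
import hashlib


def content_hash(text: str) -> str:
    """SHA-256 of the UTF-8 encoded text, as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def new_record(source_id: str, text: str) -> dict:
    """Start a record for a text derived from an identified original."""
    return {"source": source_id, "steps": [], "text_sha256": content_hash(text)}


def add_step(record: dict, agent: str, process: str, new_text: str) -> dict:
    """Return a new record with one more processing step appended."""
    step = {
        "agent": agent,      # who did the work (hypothetical field)
        "process": process,  # what was done, in free text
        "input_sha256": record["text_sha256"],
        "output_sha256": content_hash(new_text),
    }
    return {
        "source": record["source"],
        "steps": record["steps"] + [step],
        "text_sha256": step["output_sha256"],
    }


def chain_is_consistent(record: dict, text: str) -> bool:
    """Check that the steps link up and that the final hash matches the text."""
    steps = record["steps"]
    for prev, cur in zip(steps, steps[1:]):
        if prev["output_sha256"] != cur["input_sha256"]:
            return False
    if steps and steps[-1]["output_sha256"] != record["text_sha256"]:
        return False
    return record["text_sha256"] == content_hash(text)


# Example: raw OCR text from a scan, then one round of hand proofing.
record = new_record("imitationofchris00newy", "raw ocr text")
record = add_step(record, "example-volunteer", "hand proofreading", "proofed text")
```

A sketch like this shows the limits as much as the idea: the hashes only
show that the record and the text in hand agree with each other, not
that the claimed steps really happened, and any trivial change (even
whitespace) produces a different hash.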

