[open-bibliography] FRBR, relationships and inference
pohl at hbz-nrw.de
Thu May 27 23:13:50 BST 2010
I just tried to read through the tons of very interesting mails you guys wrote over the last week and would now like to add to it by giving my two cents about the FRBR model and the question of which approach to take to get rich data which is easy to produce and is a fruitful basis for research. I will step back a bit first to make my point in the end.
As it seems, nobody disagrees with producing Open Bibliographic Data as Linked Data, i.e. data complying with the RDF model. Which is good, because RDF very elegantly models the basic operation we perform when we make a statement about something: reference and predication. We identify something and say something about it. This is what librarians and other data producers have done for centuries: identifying a resource and saying something about it. And RDF uses unambiguous public names (URIs) for entities and predicates, which is its advantage over past approaches.
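To make "reference and predication" concrete, here is a minimal sketch in plain Python (no RDF library). Each statement is a (subject, predicate, object) triple. The example.org URIs are made up for illustration; the two predicates are the Dublin Core terms for title and creator.

```python
# Hypothetical subject URIs, for illustration only.
BOOK = "http://example.org/manifestation/1"
AUTHOR = "http://example.org/person/goethe"

# Dublin Core predicate URIs.
TITLE = "http://purl.org/dc/terms/title"
CREATOR = "http://purl.org/dc/terms/creator"

# Each triple identifies something (the subject) and says something about it.
triples = {
    (BOOK, TITLE, "Faust"),
    (BOOK, CREATOR, AUTHOR),
}


def describe(subject, graph):
    """Return every (predicate, object) pair stated about the subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


statements = describe(BOOK, triples)
```

Because subject and predicates are public URIs, anybody else can add statements about the same book without coordinating with us first.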
I have to make clear: I'd like to have URIs for works, and we need URIs for works because they are important for literary scholars and the like. But I think we'll never agree on _one_ model for defining works, because there is no obvious way to distinguish a work entity. We'd have to prescribe the way of identifying works, which won't work. We need another approach.
So let's just identify what we can easily agree on as one entity; let's take care of the (id)entities we can easily distinguish. These entities are items and manifestations, the entities we have catalogued/described all along:
Individual books (items) are easily distinguished from each other, and as such easily identified, as long as we are in the print domain. And manifestations are quite easily identified as well, because they can be defined as a set of texts which share the same spelling.
Note: In the digital domain we can't catalog items because they only exist transiently on screens, so it only makes sense to describe manifestations. (And I would argue that a digitization of a book and the corresponding print manifestation are instantiations of the same manifestation. I don't understand why we make two records for these - at least in Germany. Following this approach, media-type would be an attribute for items... )
So we have items and manifestations, and items are linked to manifestations and inherit some of their properties (as is the case in actual catalogs). The next step is to propose and use relations between items, manifestations and the entities in FRBR groups 2 and 3 (which make sense in my opinion, and I think so far nobody has objected to them).
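As a rough sketch of what "items are linked to manifestations and inherit some of their properties" could look like, assuming a simple lookup that falls back from the item to its manifestation (the class names, URIs and property keys are all hypothetical):

```python
from dataclasses import dataclass, field


@dataclass
class Manifestation:
    uri: str
    properties: dict = field(default_factory=dict)


@dataclass
class Item:
    uri: str
    manifestation: Manifestation
    properties: dict = field(default_factory=dict)

    def get(self, key):
        """Look up a property on the item first, then on its manifestation."""
        if key in self.properties:
            return self.properties[key]
        return self.manifestation.properties.get(key)


# Hypothetical example data.
edition = Manifestation("http://example.org/m/1", {"title": "Faust", "year": 1808})
copy = Item("http://example.org/i/1", edition, {"shelfmark": "A 123", "mediaType": "print"})

inherited_title = copy.get("title")      # comes from the manifestation
own_shelfmark = copy.get("shelfmark")    # recorded on the item itself
```

Here media type sits on the item, in line with the note above; title and year are stated once on the manifestation and shared by all of its items.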
I think what is really important here are relations between manifestations like: consequentEdition and previousEdition, revisedEdition, abridgedEdition, illustratedEdition, translation, filmAdaptation, radioPlayAdaptation, parody, plagiarism, commentary, homage and so on. If we had this data, we could, for example, infer the original edition and the language of the original work; we wouldn't have to add it to a work description ourselves. And best of all: other people could infer work entities by grouping things together as they find useful. If somebody wants the movie to be part of the work: OK. If others want a translation to be a new work: Why not? It's the Semantic Web, isn't it designed to do this stuff?
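A small sketch of the two inferences mentioned here, in plain Python. It assumes each triple (s, p, o) reads "s is a <p> of o", i.e. it points from the derived manifestation to its source, and that every manifestation has at most one source. The identifiers are invented; the predicate names are taken from the list above.

```python
# Hypothetical manifestation ids; each triple reads "s is a <p> of o".
RELATIONS = [
    ("m:novel-2nd-rev", "revisedEdition", "m:novel-1st"),
    ("m:novel-en", "translation", "m:novel-2nd-rev"),
    ("m:novel-film", "filmAdaptation", "m:novel-1st"),
]
MANIFESTATIONS = ["m:novel-1st", "m:novel-2nd-rev", "m:novel-en", "m:novel-film"]


def find_original(manifestation, relations):
    """Follow derived-from links until a manifestation with no source is reached."""
    source_of = {s: o for s, _, o in relations}
    seen = set()
    current = manifestation
    while current in source_of and current not in seen:  # guard against cycles
        seen.add(current)
        current = source_of[current]
    return current


def group_works(manifestations, relations, same_work_predicates):
    """Group manifestations connected by relations the user counts as 'same work'."""
    parent = {m: m for m in manifestations}

    def find(m):
        while parent[m] != m:
            m = parent[m]
        return m

    for s, p, o in relations:
        if p in same_work_predicates:
            parent[find(s)] = find(o)

    groups = {}
    for m in manifestations:
        groups.setdefault(find(m), set()).add(m)
    return {frozenset(g) for g in groups.values()}


original = find_original("m:novel-en", RELATIONS)
broad = group_works(MANIFESTATIONS, RELATIONS,
                    {"revisedEdition", "translation", "filmAdaptation"})
narrow = group_works(MANIFESTATIONS, RELATIONS, {"revisedEdition"})
```

With the broad predicate set, the film and the translation fall into the same work as the first edition; with the narrow set, they each become a work of their own. The data stays the same, only the grouping rule changes.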
This approach doesn't exclude grouping works from the start as long as I record the relevant attributes and relationships from which other people could infer other work groupings.
I hope I didn't just repeat what was already said. What do you think about it?