[open-bibliography] Proposed definition for /book/book

Tim Spalding tim at librarything.com
Fri Jul 2 07:47:57 BST 2010

> Even outside the constraints of catalog cards, it's often considered
> a good idea to leave out "Man-woman relationships" or other subject
> descriptors that cover a large portion of a collection, in order to
> prevent overload when people go looking for books under that heading.

The rule of three came about during the catalog card era, and was
maintained after, but your argument does hold up as far as it goes.
Granting it, however, I think we can see that classification is done
for a reason, and generally in the context of a specific--and rapidly
obsolete--use case. It is very easy to slip into the notion that
classification is something more than that--something simply true and
appropriate to every context and use.

> I didn't take formal cataloger training, but my understanding is that
> traditionally, you're only supposed to add a subject heading when it's
> both a primary and a distinctive characteristic of a work, as opposed
> to something that just happens to be featured in it at some level.
> The idea is that, when you look up the heading, you should find the books
> that are most centrally and definitively about man-woman relationships
> in your collection, not all the books in your collection that happen
> to include men and women relating to each other somewhere in the text.
> You can see the usefulness of this kind of policy if you're specifically
> trying to do focused research on a previously defined topic.

Well, yes and no. You can only see that if you presume the term is
binary. Google has indexed 78 million pages with the word "CNN" in
them, but when I search for it, it knows not to give me everything, in
accession order, as most library systems would. It knows that
"CNN"-ness is a number, or even an array, not a boolean.

Similarly, something like 25,000 works on LibraryThing have been
tagged "chick lit," but the top ones are really, really good examples
of it.
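
To make the boolean-versus-number distinction concrete, here is a
minimal sketch in Python. Everything in it is an assumption made for
illustration: the Tagging record, the two listing functions, and the
sample works and counts are invented, and none of it is LibraryThing's
or Google's actual code.

```python
from typing import NamedTuple, Optional


class Tagging(NamedTuple):
    """Hypothetical record: one work and how often a given tag was applied to it."""
    title: str
    accession: int   # order in which the work entered the catalog
    tag_count: int   # number of users who applied the tag


def boolean_listing(items: list[Tagging]) -> list[str]:
    """Treat the tag as yes/no: every tagged work, in accession order."""
    tagged = [t for t in items if t.tag_count > 0]
    return [t.title for t in sorted(tagged, key=lambda t: t.accession)]


def weighted_listing(items: list[Tagging], limit: Optional[int] = None) -> list[str]:
    """Treat the tag as a number: most-tagged works first, ties by accession."""
    tagged = [t for t in items if t.tag_count > 0]
    ranked = sorted(tagged, key=lambda t: (-t.tag_count, t.accession))
    titles = [t.title for t in ranked]
    return titles if limit is None else titles[:limit]


# Invented sample data, not real counts.
sample = [
    Tagging("Work A", accession=1, tag_count=2),
    Tagging("Work B", accession=2, tag_count=950),
    Tagging("Work C", accession=3, tag_count=0),
    Tagging("Work D", accession=4, tag_count=40),
]

print(boolean_listing(sample))
print(weighted_listing(sample, limit=2))
```

The boolean listing leads with whatever was cataloged first; the
weighted one leads with the work the most people agreed on.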

For more of this sort of thinking, see my conference talk,
"What's the big deal about tagging?", on YouTube.

> "Show me the
> books that the most people have associated with man-woman relationships"
> tends to filter for popularity more than "show me the books that
> are *concerned* most with man-woman relationships".  ("Relationships" might
> not be the best example, because this topic is a bit diffuse to begin
> with; but the difference is starker when you have a topic that's
> featured briefly in many popular works, but deeply in only a few,
> more obscure works.)

There is something to what you say. Popularity will always be a
factor, but not so simply as you say. It is trivial to "factor out"
popularity--to ask not what books are most often tagged chick lit
overall, but what books are most often tagged chick lit, norming for
each book's popularity.
In the case of tags, LibraryThing doesn't factor popularity out.* But
factoring out popularity is absolutely basic to all such algorithms.
If we didn't do it, the top recommendations for almost every book
would be something by J. K. Rowling because, statistically, those are
the books with the highest overlap, even for books of a completely
different kind. The difficulty with norming is therefore a lack of
data. You can't put a book first in chick lit because a single person
said so and that is the only tag the book received (thus being "100%
about chick lit!").
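
A rough sketch of the trade-off, in Python. This is not LibraryThing's
algorithm. The function names, the additive-smoothing approach, the
prior values, and the sample numbers are all assumptions chosen only to
show how raw counts, naive norming, and norming with a guard against
sparse data can each put a different book first.

```python
from typing import Callable

# Hypothetical data: title -> (uses of the tag "chick lit", all tag uses on the book).
books = {
    "Blockbuster": (1500, 100000),   # hugely popular, rarely tagged chick lit
    "Genre staple": (900, 1200),     # moderately popular, mostly tagged chick lit
    "One-tag wonder": (1, 1),        # one person, one tag
}


def by_raw(tag_uses: int, total_uses: int) -> float:
    """Raw count: rewards sheer popularity."""
    return float(tag_uses)


def by_share(tag_uses: int, total_uses: int) -> float:
    """Naive norming: the tag's share of all tag uses on the book."""
    return tag_uses / total_uses if total_uses else 0.0


def by_smoothed_share(tag_uses: int, total_uses: int,
                      prior_share: float = 0.01,
                      prior_weight: float = 50.0) -> float:
    """Normed share, pulled toward an assumed site-wide share when data is thin.

    prior_weight acts like that many imaginary tag uses at prior_share, so a
    book with a single tag cannot score "100%".
    """
    return (tag_uses + prior_weight * prior_share) / (total_uses + prior_weight)


def rank(score: Callable[[int, int], float]) -> list[str]:
    """Titles ordered from highest to lowest score."""
    return sorted(books, key=lambda title: score(*books[title]), reverse=True)


for name, score in [("raw", by_raw), ("share", by_share),
                    ("smoothed", by_smoothed_share)]:
    print(name, rank(score))
```

With these invented numbers the raw count favors the blockbuster, the
naive share favors the book with a single tag, and only the smoothed
share puts the genre staple on top.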

> One consequence of this is that if you're trying to compose together a large
> open collection from a bunch of smaller pre-existing collections, you're
> going to have to deal sensibly with the different purposes and contexts
> where subject headings have been assigned.  And, hopefully, try to come
> up with workable strategies and policies going forward.

True. All I really want is for people to understand such decisions are
about the purposes and contexts, and that these contexts are by no
means guaranteed to be useful in the future.


*The results weren't better or more interesting; they were just
confusing to people. The normed "salience" of a tag is, however,
visible in which tags are bold in a tag cloud.

