Assigning Library of Congress Classification codes to books based only on their titles

Research output: Contribution to journal › Article › peer-review

Abstract

Many publishers follow the Library of Congress Classification (LCC) scheme and print a classification code on the first pages of their books. This is useful for many libraries worldwide because it makes it possible to search and retrieve books by content type, and the scheme has become a de facto standard. However, not every book has been pre-classified by the publisher; in particular, in many universities, new dissertations have to be classified manually. Although many systems are available for automatic text classification, all of them rely on extensive information that is not always available, such as the index, the abstract, or even the whole content of the work. In this work, we present our experiments on supervised classification of books using only their titles, which would allow massive automatic indexing. We propose a new text comparison measure that combines two well-known text classification techniques: the Lesk voting scheme and Term Frequency (TF). In addition, we experiment with different weighting schemes as well as logical-combinatorial methods such as ALVOT in order to determine the contribution of the title to correct classification. We found this contribution to be approximately one third: we correctly classified 36% (averaged over branches) of 122,431 previously unseen titles (in total) after training on 489,726 samples (in total) from one major branch (Q) of the LCC catalogue.
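The idea of combining word-overlap voting with term frequency can be illustrated with a minimal sketch. This is not the measure proposed in the paper, whose exact definition the abstract does not give; it only assumes that each word of an unseen title votes for every class whose training titles contain that word, with the vote weighted by the word's relative frequency in that class. The function names (`tokenize`, `train`, `classify`) and the four sample titles are hypothetical.

```python
from collections import Counter, defaultdict

def tokenize(title):
    # Lowercase and keep purely alphabetic tokens (an assumed, simplistic tokenizer).
    return [w for w in title.lower().split() if w.isalpha()]

def train(samples):
    # Build one term-frequency profile per class from (title, label) pairs.
    profiles = defaultdict(Counter)
    for title, label in samples:
        profiles[label].update(tokenize(title))
    return profiles

def classify(title, profiles):
    # Overlap voting: every distinct title word votes for each class that
    # contains it, weighted by the word's relative frequency in that class.
    votes = Counter()
    for word in set(tokenize(title)):
        for label, counts in profiles.items():
            if word in counts:
                votes[label] += counts[word] / sum(counts.values())
    if not votes:
        return None  # no word of the title was seen during training
    return max(votes, key=votes.get)

# Hypothetical training titles for two subclasses of LCC branch Q
# (QA: mathematics, QD: chemistry).
samples = [
    ("Introduction to organic chemistry", "QD"),
    ("Organic chemistry of polymers", "QD"),
    ("Linear algebra and geometry", "QA"),
    ("Introduction to abstract algebra", "QA"),
]
profiles = train(samples)
predicted = classify("Advanced organic chemistry", profiles)
```

With so little text per item, many unseen titles share no word with any training title, which is one plausible reason why title-only accuracy stays well below that of full-text classifiers.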

Original language: English
Pages (from-to): 77-84
Number of pages: 8
Journal: Informatica (Ljubljana)
Volume: 34
Issue number: 1
State: Published - 2010

Keywords

  • LCC
  • Library classification
  • Logical-combinatorial methods
  • Scarce information classification
