TY - GEN
T1 - Selection of Representative Documents for Clusters in a Document Collection
AU - Gelbukh, Alexander
AU - Alexandrov, Mikhail
AU - Bourek, Ales
AU - Makagonov, Pavel
N1 - Publisher Copyright:
© 2003 Gesellschaft für Informatik (GI). All rights reserved.
PY - 2003
Y1 - 2003
AB - An efficient way to explore a large document collection (e.g., the results returned by a search engine) is to subdivide it into clusters of relatively similar documents, which gives a general view of the collection and lets the user select the parts of particular interest. One way to present the clusters to the user is to select a representative document from each cluster. This can be done in different ways for different purposes. We consider three cases: selection of the average, the “most typical,” and the “least typical” document. We give algorithms that rely on a dictionary of keywords reflecting the topic of the user's interest. After clustering, we select a document in each cluster based on its closeness to the other documents in the cluster. Different distance measures are discussed, and preliminary experimental results are presented. Our approach was implemented in the new version of the Document Classifier system.
UR - http://www.scopus.com/inward/record.url?scp=84971482262&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84971482262
T3 - Lecture Notes in Informatics (LNI), Proceedings - Series of the Gesellschaft für Informatik (GI)
SP - 120
EP - 126
BT - Natural Language Processing and Information Systems, 8th International Conference on Applications of Natural Language to Information Systems, NLDB 2003
A2 - Düsterhöft, Antje
A2 - Thalheim, Bernhard
PB - Gesellschaft für Informatik (GI)
T2 - 8th International Conference on Applications of Natural Language to Information Systems, NLDB 2003
Y2 - 23 June 2003 through 25 June 2003
ER -