TY - GEN
T1 - A method of describing document contents through topic selection
AU - Gelbukh, A.
AU - Sidorov, G.
AU - Guzman-Arenas, A.
N1 - Publisher Copyright: © 1999 IEEE.
PY - 1999
Y1 - 1999
AB - Given a large hierarchical dictionary of concepts, the task of selection of the concepts that describe the contents of a given document is considered. The problem consists in proper handling of the top-level concepts in the hierarchy. As a representation of the document, a histogram of the topics with their respective contribution in the document is used. The contribution is determined by comparison of the document with the «ideal» document for each topic in the dictionary. The «ideal» document for a concept is one that contains only the keywords belonging to this concept, in proportion to their occurrences in the training corpus. A fast algorithm of comparison for some types of metrics is proposed. The application of the method in a system classifier is discussed.
UR - http://www.scopus.com/inward/record.url?scp=84947761571&partnerID=8YFLogxK
DO - 10.1109/SPIRE.1999.796580
M3 - Conference contribution
AN - SCOPUS:84947761571
T3 - String Processing and Information Retrieval Symposium and International Workshop on Groupware, SPIRE 1999 and CRIWG 1999
SP - 73
EP - 80
BT - String Processing and Information Retrieval Symposium and International Workshop on Groupware, SPIRE 1999 and CRIWG 1999
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 1999 String Processing and Information Retrieval Symposium and International Workshop on Groupware, SPIRE 1999 and CRIWG 1999
Y2 - 22 September 1999 through 24 September 1999
ER -