TY - GEN
T1 - Automatic term extraction using log-likelihood based comparison with general reference corpus
AU - Gelbukh, Alexander
AU - Sidorov, Grigori
AU - Lavin-Villa, Eduardo
AU - Chanona-Hernandez, Liliana
PY - 2010
Y1 - 2010
N2 - In this paper we present a method for extracting single-word terms for a specific domain. At the next stage, these terms can be used as candidates for multi-word term extraction. The proposed method is based on comparison with a general reference corpus using log-likelihood similarity. We also cluster the extracted terms using the k-means algorithm and the cosine similarity measure. We conducted experiments on texts from the computer science domain. The obtained term list is analyzed in detail.
AB - In this paper we present a method for extracting single-word terms for a specific domain. At the next stage, these terms can be used as candidates for multi-word term extraction. The proposed method is based on comparison with a general reference corpus using log-likelihood similarity. We also cluster the extracted terms using the k-means algorithm and the cosine similarity measure. We conducted experiments on texts from the computer science domain. The obtained term list is analyzed in detail.
KW - Single-word term extraction
KW - log-likelihood
KW - reference corpus
KW - term clustering
UR - http://www.scopus.com/inward/record.url?scp=77955446164&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-13881-2_26
DO - 10.1007/978-3-642-13881-2_26
M3 - Conference contribution
SN - 3642138802
SN - 9783642138805
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 248
EP - 255
BT - Natural Language Processing and Information Systems - 15th International Conference on Applications of Natural Language to Information Systems, NLDB 2010, Proceedings
T2 - 15th International Conference on Applications of Natural Language to Information Systems, NLDB 2010
Y2 - 23 June 2010 through 25 June 2010
ER -