Automatic term extraction using log-likelihood based comparison with general reference corpus

Alexander Gelbukh; Grigori Sidorov; Eduardo Lavin-Villa; Liliana Chanona-Hernandez

doi:10.1007/978-3-642-13881-2_26

Centro de Investigación en Computación (CIC)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

37 Scopus citations

Abstract

This paper presents a method for extracting single-word terms for a specific domain. In a subsequent stage, these terms can serve as candidates for multi-word term extraction. The proposed method is based on comparison with a general reference corpus using log-likelihood similarity. We also cluster the extracted terms using the k-means algorithm with the cosine similarity measure. Experiments were carried out on texts from the computer science domain, and the resulting term list is analyzed in detail.
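The comparison step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this example assumes the standard log-likelihood (G2) statistic commonly used for comparing word frequencies between two corpora. The function names `log_likelihood` and `extract_terms`, the whitespace-tokenized input, and the threshold of 3.84 (the chi-square critical value for p < 0.05 with one degree of freedom) are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 score for a word seen a times in a domain corpus of c tokens
    and b times in a reference corpus of d tokens (assumed formulation)."""
    # Expected counts if the word were equally frequent in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2


def extract_terms(domain_tokens, reference_tokens, threshold=3.84):
    """Return (word, score) pairs for words overrepresented in the domain
    corpus, sorted by descending score. The threshold is a hypothetical
    default, not a value reported in the paper."""
    domain = Counter(domain_tokens)
    reference = Counter(reference_tokens)
    c = sum(domain.values())
    d = sum(reference.values())
    if c == 0 or d == 0:
        return []
    scored = []
    for word, a in domain.items():
        b = reference.get(word, 0)
        # Skip words that are not relatively more frequent in the domain.
        if a / c <= b / d:
            continue
        score = log_likelihood(a, b, c, d)
        if score >= threshold:
            scored.append((word, score))
    return sorted(scored, key=lambda pair: -pair[1])


if __name__ == "__main__":
    domain_corpus = ["compiler"] * 20 + ["the"] * 80
    reference_corpus = ["the"] * 80 + ["cat"] * 20
    for word, score in extract_terms(domain_corpus, reference_corpus):
        print(word, round(score, 2))
```

In this toy input, "the" has the same relative frequency in both corpora and is filtered out, while "compiler" occurs only in the domain corpus and is retained as a term candidate. The clustering stage mentioned in the abstract (k-means with cosine similarity) would operate on the resulting candidate list and is not shown here.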

Original language	English
Title of host publication	Natural Language Processing and Information Systems - 15th International Conference on Applications of Natural Language to Information Systems, NLDB 2010, Proceedings
Pages	248-255
Number of pages	8
DOIs	https://doi.org/10.1007/978-3-642-13881-2_26
State	Published - 2010
Event	15th International Conference on Applications of Natural Language to Information Systems, NLDB 2010, Cardiff, United Kingdom, 23 Jun 2010 → 25 Jun 2010

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	6177 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	15th International Conference on Applications of Natural Language to Information Systems, NLDB 2010
Country/Territory	United Kingdom
City	Cardiff
Period	23/06/10 → 25/06/10

Keywords

Single-word term extraction
log-likelihood
reference corpus
term clustering

Access to Document

10.1007/978-3-642-13881-2_26

Cite this

Gelbukh, A., Sidorov, G., Lavin-Villa, E., & Chanona-Hernandez, L. (2010). Automatic term extraction using log-likelihood based comparison with general reference corpus. In Natural Language Processing and Information Systems - 15th International Conference on Applications of Natural Language to Information Systems, NLDB 2010, Proceedings (pp. 248-255). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6177 LNCS). https://doi.org/10.1007/978-3-642-13881-2_26

@inproceedings{02690a8103a849ba8e2f7a78e22dbc50,
title = "Automatic term extraction using log-likelihood based comparison with general reference corpus",
abstract = "In the paper we present a method that allows an extraction of single-word terms for a specific domain. At the next stage these terms can be used as candidates for multi-word term extraction. The proposed method is based on comparison with general reference corpus using log-likelihood similarity. We also perform clustering of the extracted terms using k-means algorithm and cosine similarity measure. We made experiments using texts of the domain of computer science. The obtained term list is analyzed in detail.",
keywords = "Single-word term extraction, log-likelihood, reference corpus, term clustering",
author = "Alexander Gelbukh and Grigori Sidorov and Eduardo Lavin-Villa and Liliana Chanona-Hernandez",
year = "2010",
doi = "10.1007/978-3-642-13881-2_26",
language = "English",
isbn = "3642138802",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "248--255",
booktitle = "Natural Language Processing and Information Systems - 15th International Conference on Applications of Natural Language to Information Systems, NLDB 2010, Proceedings",
note = "15th International Conference on Applications of Natural Language to Information Systems, NLDB 2010 ; Conference date: 23-06-2010 Through 25-06-2010",
}

TY - GEN
T1 - Automatic term extraction using log-likelihood based comparison with general reference corpus
AU - Gelbukh, Alexander
AU - Sidorov, Grigori
AU - Lavin-Villa, Eduardo
AU - Chanona-Hernandez, Liliana
PY - 2010
Y1 - 2010
AB - In the paper we present a method that allows an extraction of single-word terms for a specific domain. At the next stage these terms can be used as candidates for multi-word term extraction. The proposed method is based on comparison with general reference corpus using log-likelihood similarity. We also perform clustering of the extracted terms using k-means algorithm and cosine similarity measure. We made experiments using texts of the domain of computer science. The obtained term list is analyzed in detail.
KW - Single-word term extraction
KW - log-likelihood
KW - reference corpus
KW - term clustering
UR - http://www.scopus.com/inward/record.url?scp=77955446164&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-13881-2_26
DO - 10.1007/978-3-642-13881-2_26
M3 - Conference contribution
SN - 3642138802
SN - 9783642138805
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 248
EP - 255
BT - Natural Language Processing and Information Systems - 15th International Conference on Applications of Natural Language to Information Systems, NLDB 2010, Proceedings
T2 - 15th International Conference on Applications of Natural Language to Information Systems, NLDB 2010
Y2 - 23 June 2010 through 25 June 2010
ER -
