Automatic term extraction using log-likelihood based comparison with general reference corpus

Alexander Gelbukh, Grigori Sidorov, Eduardo Lavin-Villa, Liliana Chanona-Hernandez

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

37 Scopus citations

Abstract

This paper presents a method for extracting single-word terms for a specific domain. At the next stage, these terms can serve as candidates for multi-word term extraction. The proposed method is based on comparison with a general reference corpus using log-likelihood similarity. We also cluster the extracted terms using the k-means algorithm with the cosine similarity measure. We ran experiments on texts from the computer science domain and analyze the resulting term list in detail.
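
The abstract does not spell out the scoring formula, so the following is only a minimal sketch of how a log-likelihood comparison against a reference corpus is commonly computed. It assumes the widely used two-corpus log-likelihood (G2) statistic, pre-tokenized non-empty corpora, and a simple filter that keeps only words relatively more frequent in the domain corpus. The function names `log_likelihood` and `extract_terms` and the `top_n` parameter are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Two-corpus log-likelihood (G2) score for one word.

    a: frequency of the word in the domain corpus
    b: frequency of the word in the reference corpus
    c: total number of tokens in the domain corpus
    d: total number of tokens in the reference corpus
    """
    # Expected frequencies if the word were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def extract_terms(domain_tokens, reference_tokens, top_n=10):
    """Rank domain words by log-likelihood against a reference corpus.

    Assumption (not from the paper): only words whose relative frequency
    is higher in the domain corpus are kept as term candidates.
    Both token lists are assumed to be non-empty.
    """
    dom = Counter(domain_tokens)
    ref = Counter(reference_tokens)
    c = sum(dom.values())
    d = sum(ref.values())
    scored = []
    for word, a in dom.items():
        b = ref.get(word, 0)
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    # Highest score first: the most domain-specific candidates.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    domain = "the compiler parses the source and the compiler emits code".split()
    reference = "the dog saw the cat and the cat saw the dog".split()
    for word, score in extract_terms(domain, reference, top_n=5):
        print(f"{word}\t{score:.3f}")
```

A word that occurs with the same relative frequency in both corpora scores zero, while a word concentrated in the domain corpus scores high, which is why a ranked list of this kind can serve as a pool of single-word term candidates.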

Original language: English
Title of host publication: Natural Language Processing and Information Systems - 15th International Conference on Applications of Natural Language to Information Systems, NLDB 2010, Proceedings
Pages: 248-255
Number of pages: 8
State: Published - 2010
Event: 15th International Conference on Applications of Natural Language to Information Systems, NLDB 2010 - Cardiff, United Kingdom
Duration: 23 Jun 2010 → 25 Jun 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6177 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th International Conference on Applications of Natural Language to Information Systems, NLDB 2010
Country/Territory: United Kingdom
City: Cardiff
Period: 23/06/10 → 25/06/10

Keywords

  • Single-word term extraction
  • log-likelihood
  • reference corpus
  • term clustering
