Heuristics-based replenishment of collocation databases

Igor A. Bolshakov, Alexander Gelbukh

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

9 Scopus citations


Collocations are defined as syntactically linked and semantically plausible combinations of content words. Since collocations make up the bulk of common texts and are language-dependent, the creation of collocation databases (CDBs) is important. However, manual compilation of such databases is prohibitively expensive. We present heuristics for the automatic generation of new Spanish collocations based on those already present in a CDB, with the help of a WordNet-like thesaurus: if a word A is semantically "similar" to a word B and a collocation B + C is known, then A + C is presumably a collocation of the same type, provided certain conditions are met.
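
The inference rule stated in the abstract can be illustrated with a minimal sketch. Everything here is assumed for illustration and is not taken from the chapter: the function name `infer_collocations`, the triple representation of a collocation, the `similar_to` mapping standing in for the thesaurus, and the `conditions_met` predicate standing in for the conditions the abstract mentions but does not spell out. The Spanish lemmas are toy data.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical representation: (word, relation type, collocate), all lemmas.
Collocation = Tuple[str, str, str]


def infer_collocations(
    known: Set[Collocation],
    similar_to: Dict[str, Iterable[str]],
    conditions_met: Callable[[str, str, str], bool] = lambda a, b, c: True,
) -> Set[Collocation]:
    """Propose A + C for every known B + C where A is listed as similar to B.

    `similar_to` maps a word B to the words A that the thesaurus treats as
    similar to it. `conditions_met(a, b, c)` is a placeholder for the extra
    checks; by default it accepts every candidate.
    """
    candidates: Set[Collocation] = set()
    for b, relation, c in known:
        for a in similar_to.get(b, ()):
            candidate = (a, relation, c)
            # Keep the relation type of the source collocation and
            # skip anything the database already contains.
            if candidate not in known and conditions_met(a, b, c):
                candidates.add(candidate)
    return candidates


# Toy data, invented for this example.
known = {
    ("vino", "object_of", "beber"),
    ("vino", "object_of", "servir"),
}
similar_to = {"vino": ["cerveza"]}

new_collocations = infer_collocations(known, similar_to)
```

With this toy input, `new_collocations` holds the two candidates "beber cerveza" and "servir cerveza", each with the same relation type as the collocation it was derived from. In practice the `conditions_met` predicate would carry most of the linguistic work, since plain similarity would also propose implausible combinations.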

Original language: English
Title of host publication: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
Number of pages: 8
State: Published - 2002
Externally published: Yes

