Heuristics-based replenishment of collocation databases

Igor A. Bolshakov, Alexander Gelbukh

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

9 Scopus citations

Abstract

Collocations are defined as syntactically linked and semantically plausible combinations of content words. Since collocations constitute a bulk of common texts and depend on the language, creation of collocation databases (CBDs) is important. However, manual compilation of such databases is prohibitively expensive. We present heuristics for automatic generation of new Spanish collocations based on those already present in a CBD, with the help of a WordNet-like thesaurus: if a word A is semantically "similar" to a word B and a collocation B + C is known, then A + C is presumably a collocation of the same type, provided certain conditions are met.
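The core heuristic in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the function name `propose_collocations`, the tuple layout, the toy Spanish lemmas, and the hand-written `similar` mapping that stands in for a WordNet-like thesaurus. The abstract does not spell out the "certain conditions", so the only filter applied is that a candidate must not already be in the database.

```python
from typing import Dict, Set, Tuple

# Hypothetical storage format: (headword, collocate, collocation type).
Collocation = Tuple[str, str, str]


def propose_collocations(
    known: Set[Collocation],
    similar: Dict[str, Set[str]],
) -> Set[Collocation]:
    """For each known B + C, propose A + C for every A listed as similar to B."""
    candidates: Set[Collocation] = set()
    for b, c, ctype in known:
        for a in similar.get(b, set()):
            candidate = (a, c, ctype)  # same collocate, same type
            # Assumed filter: skip what the database already contains.
            # The paper's actual acceptance conditions are not modeled here.
            if candidate not in known:
                candidates.add(candidate)
    return candidates


# Toy data, invented for this example only (lemmas, not inflected forms).
known = {
    ("libro", "leer", "verb+obj"),
    ("libro", "interesante", "noun+adj"),
}
similar = {"libro": {"revista"}}  # stand-in for a thesaurus lookup

proposed = propose_collocations(known, similar)
```

With this toy input, `proposed` holds `revista` paired with `leer` and with `interesante`, each carrying the type of the collocation it was derived from. A real system would presumably need stricter checks before accepting such candidates.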

Original language: English
Title of host publication: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
Pages: 25-32
Number of pages: 8
Volume: 2389
DOIs
State: Published - 2002
Externally published: Yes
