Inferences for enrichment of collocation databases by means of semantic relations

Alexander Gelbukh

doi:10.13053/CyS-22-1-2923

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

A text consists of words that are syntactically linked and semantically combinable, like “political party,” “pay attention,” or “stone cold.” Such semantically plausible combinations of two content words, which we hereafter refer to as collocations, are important knowledge in many areas of computational linguistics. We present the structure of a lexical resource that provides such knowledge: a collocation database (CDB). Since such databases cannot be complete under any reasonable compilation procedure, we consider heuristic-based inference mechanisms that predict new plausible collocations based on the ones present in the CDB, with the help of a WordNet-like thesaurus: if an available collocation combines the entries A and B, and B is ‘similar’ to C, then A and C are supposed to constitute a collocation of the same category. We also describe the semantically induced morphological categories suitable for such inference, as well as the heuristics for filtering out wrong hypotheses. We discuss our experience with inferences obtained using the CrossLexica CDB.
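
The inference rule stated in the abstract (a known collocation of A and B, plus B ‘similar’ to C, yields a hypothesized collocation of A and C in the same category) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `infer_collocations`, the tuple layout, the `similar` mapping standing in for a WordNet-like thesaurus, and the toy data. It is not the CrossLexica implementation, and it omits the filtering heuristics, which the abstract mentions but does not specify.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: (entry A, entry B, collocation category).
Collocation = Tuple[str, str, str]


def infer_collocations(
    known: Set[Collocation],
    similar: Dict[str, List[str]],
) -> Set[Collocation]:
    """For each known (A, B, category) and each C listed as similar to B,
    propose (A, C, category) unless it is already in the database."""
    hypotheses: Set[Collocation] = set()
    for a, b, category in known:
        for c in similar.get(b, []):
            candidate = (a, c, category)
            if candidate not in known:
                hypotheses.add(candidate)
    return hypotheses


# Toy data invented for this example; not taken from any real resource.
known = {("pay", "attention", "verb-object")}
similar = {"attention": ["heed"]}

print(infer_collocations(known, similar))
```

The result is only a set of hypotheses. In a fuller treatment each candidate would still pass through filters of the kind the abstract refers to before being accepted.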

Original language	English
Pages (from-to)	103-117
Number of pages	15
Journal	Computacion y Sistemas
Volume	22
Issue number	1
DOIs	https://doi.org/10.13053/CyS-22-1-2923
State	Published - 2018

Keywords

Collocations
Enrichment
Hypernyms
Inference rules
Meronyms
Synonyms

Access to Document

10.13053/CyS-22-1-2923

Cite this

@article{0a7b8e2c71b64a31badd47f19637cf49,

title = "Inferences for enrichment of collocation databases by means of semantic relations",

abstract = "A text consists of words that are syntactically linked and semantically combinable, like “political party,” “pay attention,” or “stone cold.” Such semantically plausible combinations of two content words, which we hereafter refer to as collocations, are important knowledge in many areas of computational linguistics. We present the structure of a lexical resource that provides such knowledge: a collocation database (CDB). Since such databases cannot be complete under any reasonable compilation procedure, we consider heuristic-based inference mechanisms that predict new plausible collocations based on the ones present in the CDB, with the help of a WordNet-like thesaurus: if an available collocation combines the entries A and B, and B is {\textquoteleft}similar{\textquoteright} to C, then A and C are supposed to constitute a collocation of the same category. We also describe the semantically induced morphological categories suitable for such inference, as well as the heuristics for filtering out wrong hypotheses. We discuss our experience with inferences obtained using the CrossLexica CDB.",

keywords = "Collocations, Enrichment, Hypernyms, Inference rules, Meronyms, Synonyms",

author = "Alexander Gelbukh",

year = "2018",

doi = "10.13053/CyS-22-1-2923",

language = "English",

volume = "22",

pages = "103--117",

journal = "Computacion y Sistemas",

issn = "1405-5546",

number = "1",

}

TY - JOUR

T1 - Inferences for enrichment of collocation databases by means of semantic relations

AU - Gelbukh, Alexander

PY - 2018

Y1 - 2018

N2 - A text consists of words that are syntactically linked and semantically combinable, like “political party,” “pay attention,” or “stone cold.” Such semantically plausible combinations of two content words, which we hereafter refer to as collocations, are important knowledge in many areas of computational linguistics. We present the structure of a lexical resource that provides such knowledge: a collocation database (CDB). Since such databases cannot be complete under any reasonable compilation procedure, we consider heuristic-based inference mechanisms that predict new plausible collocations based on the ones present in the CDB, with the help of a WordNet-like thesaurus: if an available collocation combines the entries A and B, and B is ‘similar’ to C, then A and C are supposed to constitute a collocation of the same category. We also describe the semantically induced morphological categories suitable for such inference, as well as the heuristics for filtering out wrong hypotheses. We discuss our experience with inferences obtained using the CrossLexica CDB.

AB - A text consists of words that are syntactically linked and semantically combinable, like “political party,” “pay attention,” or “stone cold.” Such semantically plausible combinations of two content words, which we hereafter refer to as collocations, are important knowledge in many areas of computational linguistics. We present the structure of a lexical resource that provides such knowledge: a collocation database (CDB). Since such databases cannot be complete under any reasonable compilation procedure, we consider heuristic-based inference mechanisms that predict new plausible collocations based on the ones present in the CDB, with the help of a WordNet-like thesaurus: if an available collocation combines the entries A and B, and B is ‘similar’ to C, then A and C are supposed to constitute a collocation of the same category. We also describe the semantically induced morphological categories suitable for such inference, as well as the heuristics for filtering out wrong hypotheses. We discuss our experience with inferences obtained using the CrossLexica CDB.

KW - Collocations

KW - Enrichment

KW - Hypernyms

KW - Inference rules

KW - Meronyms

KW - Synonyms

UR - http://www.scopus.com/inward/record.url?scp=85045945219&partnerID=8YFLogxK

U2 - 10.13053/CyS-22-1-2923

DO - 10.13053/CyS-22-1-2923

M3 - Article

SN - 1405-5546

VL - 22

SP - 103

EP - 117

JO - Computacion y Sistemas

JF - Computacion y Sistemas

IS - 1

ER -
