Various criteria of collocation cohesion in internet: Comparison of resolving power

Igor A. Bolshakov, Elena I. Bolshakova, Alexey P. Kotlyarov, Alexander Gelbukh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

To extract collocations from the Internet, it is necessary to numerically estimate the cohesion between potential collocates. The Mutual Information cohesion measure (MI), based on the numbers of collocates occurring closely together (N₁₂) and apart (N₁, N₂), is well known, but Web page statistics deprive MI of its statistical validity. We propose a family of different measures that depend on N₁, N₂, and N₁₂ in a similar monotonic way and possess the scalability feature of MI. We apply the new criteria to a collection of N₁, N₂, and N₁₂ values obtained from AltaVista for links between a few tens of English nouns and several hundred of their modifiers taken from the Oxford Collocations Dictionary. The 'noun-its own adjective' pairs are true collocations, and their measure values form one distribution. The 'noun-alien adjective' pairs are false collocations, and their measure values form another distribution. A discriminating threshold is sought that minimizes the sum of the probabilities of the two possible types of error. The resolving power of a criterion is equal to this minimum of the sum. The best criterion, delivering the minimum minimorum, is found. © 2008 Springer-Verlag Berlin Heidelberg.
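
The threshold search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the formulas of the proposed family of measures, so the code uses the commonly cited pointwise form of MI, log2(total · N12 / (N1 · N2)), where `total` is an assumed corpus size. The function names `mutual_information` and `resolving_power` are hypothetical, and the error probabilities are estimated as simple empirical fractions.

```python
import math


def mutual_information(n1, n2, n12, total):
    """Pointwise MI estimate: log2(total * n12 / (n1 * n2)).

    `total` (overall number of pages) is an assumption of this sketch;
    the abstract only names the counts N1, N2 and N12.
    """
    if min(n1, n2, n12) <= 0:
        return float("-inf")  # no co-occurrence evidence at all
    return math.log2(total * n12 / (n1 * n2))


def resolving_power(true_scores, false_scores):
    """Find the threshold minimizing the sum of the two error rates.

    A pair is accepted as a collocation when its score >= threshold.
    Returns (best_threshold, minimal_error_sum).
    """
    candidates = sorted(set(true_scores) | set(false_scores))
    candidates.append(candidates[-1] + 1.0)  # threshold that rejects every pair
    best_t, best_err = None, float("inf")
    for t in candidates:
        # Error type 1: a true collocation falls below the threshold.
        miss = sum(s < t for s in true_scores) / len(true_scores)
        # Error type 2: a false collocation reaches the threshold.
        false_alarm = sum(s >= t for s in false_scores) / len(false_scores)
        if miss + false_alarm < best_err:
            best_t, best_err = t, miss + false_alarm
    return best_t, best_err
```

Under these assumptions, comparing criteria amounts to calling `resolving_power` once per measure on the same true and false pairs and preferring the measure with the smallest returned error sum.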
Original language: American English
Title of host publication: Various criteria of collocation cohesion in internet: Comparison of resolving power
Pages: 64-72
Number of pages: 56
ISBN (Electronic): 354078134X, 9783540781349
DOI: https://doi.org/10.1007/978-3-540-78135-6_6
State: Published - 27 Aug 2008
Event: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Duration: 1 Jan 2014 → …

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4919 LNCS
ISSN (Print): 0302-9743

Conference

Conference: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Period: 1/01/14 → …


Cite this

Bolshakov, I. A., Bolshakova, E. I., Kotlyarov, A. P., & Gelbukh, A. (2008). Various criteria of collocation cohesion in internet: Comparison of resolving power. In Various criteria of collocation cohesion in internet: Comparison of resolving power (pp. 64-72). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4919 LNCS). https://doi.org/10.1007/978-3-540-78135-6_6
@inproceedings{5e2187d4fcab4875bb451ec00d38b04d,
title = "Various criteria of collocation cohesion in internet: Comparison of resolving power",
abstract = "To extract collocations from the Internet, it is necessary to numerically estimate the cohesion between potential collocates. The Mutual Information cohesion measure (MI), based on the numbers of collocates occurring closely together ($N_{12}$) and apart ($N_{1}$, $N_{2}$), is well known, but Web page statistics deprive MI of its statistical validity. We propose a family of different measures that depend on $N_{1}$, $N_{2}$, and $N_{12}$ in a similar monotonic way and possess the scalability feature of MI. We apply the new criteria to a collection of $N_{1}$, $N_{2}$, and $N_{12}$ values obtained from AltaVista for links between a few tens of English nouns and several hundred of their modifiers taken from the Oxford Collocations Dictionary. The 'noun-its own adjective' pairs are true collocations, and their measure values form one distribution. The 'noun-alien adjective' pairs are false collocations, and their measure values form another distribution. A discriminating threshold is sought that minimizes the sum of the probabilities of the two possible types of error. The resolving power of a criterion is equal to this minimum of the sum. The best criterion, delivering the minimum minimorum, is found. {\copyright} 2008 Springer-Verlag Berlin Heidelberg.",
author = "Bolshakov, {Igor A.} and Bolshakova, {Elena I.} and Kotlyarov, {Alexey P.} and Gelbukh, Alexander",
year = "2008",
month = "8",
day = "27",
doi = "10.1007/978-3-540-78135-6_6",
language = "American English",
isbn = "354078134X",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
volume = "4919",
pages = "64--72",
booktitle = "Various criteria of collocation cohesion in internet: Comparison of resolving power",

}

TY - GEN

T1 - Various criteria of collocation cohesion in internet: Comparison of resolving power

AU - Bolshakov, Igor A.

AU - Bolshakova, Elena I.

AU - Kotlyarov, Alexey P.

AU - Gelbukh, Alexander

PY - 2008/8/27

Y1 - 2008/8/27

N2 - To extract collocations from the Internet, it is necessary to numerically estimate the cohesion between potential collocates. The Mutual Information cohesion measure (MI), based on the numbers of collocates occurring closely together (N₁₂) and apart (N₁, N₂), is well known, but Web page statistics deprive MI of its statistical validity. We propose a family of different measures that depend on N₁, N₂, and N₁₂ in a similar monotonic way and possess the scalability feature of MI. We apply the new criteria to a collection of N₁, N₂, and N₁₂ values obtained from AltaVista for links between a few tens of English nouns and several hundred of their modifiers taken from the Oxford Collocations Dictionary. The 'noun-its own adjective' pairs are true collocations, and their measure values form one distribution. The 'noun-alien adjective' pairs are false collocations, and their measure values form another distribution. A discriminating threshold is sought that minimizes the sum of the probabilities of the two possible types of error. The resolving power of a criterion is equal to this minimum of the sum. The best criterion, delivering the minimum minimorum, is found. © 2008 Springer-Verlag Berlin Heidelberg.

UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=49949100097&origin=inward

UR - https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=49949100097&origin=inward

U2 - 10.1007/978-3-540-78135-6_6

DO - 10.1007/978-3-540-78135-6_6

M3 - Conference contribution

SN - 354078134X

SN - 9783540781349

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 64

EP - 72

BT - Various criteria of collocation cohesion in internet: Comparison of resolving power

ER -
