TY - JOUR
T1 - Unsupervised learning of P NP P word combinations
AU - Galicia-Haro, Sofía N.
AU - Gelbukh, Alexander
PY - 2005
Y1 - 2005
N2 - We evaluate the possibility of learning, in an unsupervised manner, a list of idiomatic word combinations of the type preposition + noun phrase + preposition (P NP P), namely, groups of three or more simple forms that behave as a single lexical unit and have semantic and syntactic properties not deducible from the corresponding properties of each simple form, e.g., by means of, in order to, in front of. We show that idiomatic P NP P combinations have some statistical properties distinct from those of usual idiomatic collocations. In particular, we found that the most frequent P NP P trigrams tend to be idiomatic. Of the other statistical measures, log-likelihood performs almost as well as frequency for detecting idiomatic expressions of this type, while chi-square and pointwise mutual information perform very poorly. Our experiments use Spanish material.
AB - We evaluate the possibility of learning, in an unsupervised manner, a list of idiomatic word combinations of the type preposition + noun phrase + preposition (P NP P), namely, groups of three or more simple forms that behave as a single lexical unit and have semantic and syntactic properties not deducible from the corresponding properties of each simple form, e.g., by means of, in order to, in front of. We show that idiomatic P NP P combinations have some statistical properties distinct from those of usual idiomatic collocations. In particular, we found that the most frequent P NP P trigrams tend to be idiomatic. Of the other statistical measures, log-likelihood performs almost as well as frequency for detecting idiomatic expressions of this type, while chi-square and pointwise mutual information perform very poorly. Our experiments use Spanish material.
UR - http://www.scopus.com/inward/record.url?scp=24344444516&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-30586-6_37
DO - 10.1007/978-3-540-30586-6_37
M3 - Conference article
AN - SCOPUS:24344444516
SN - 0302-9743
VL - 3406
SP - 337
EP - 340
JO - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
JF - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
T2 - 6th International Conference, CICLing 2005
Y2 - 13 February 2005 through 19 February 2005
ER -