Unsupervised learning of P NP P word combinations

Sofía N. Galicia-Haro, Alexander Gelbukh

Research output: Contribution to journal, Conference article, peer-reviewed

Abstract

We evaluate the possibility of learning, in an unsupervised manner, a list of idiomatic word combinations of the type preposition + noun phrase + preposition (P NP P), namely, groups of three or more simple forms that behave as a single lexical unit and whose semantic and syntactic properties cannot be deduced from the corresponding properties of each simple form, e.g., by means of, in order to, in front of. We show that idiomatic P NP P combinations have some statistical properties distinct from those of usual idiomatic collocations. In particular, we found that the most frequent P NP P trigrams tend to be idiomatic. Among other statistical measures, log-likelihood performs almost as well as frequency for detecting idiomatic expressions of this type, while chi-square and pointwise mutual information perform very poorly. We experiment on Spanish material.
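The contrast between frequency and pointwise mutual information described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a tokenized corpus, a tiny hypothetical preposition list (`PREPOSITIONS`), a noun phrase simplified to a single token, and one common trigram extension of PMI (the joint probability divided by the product of the three unigram probabilities). The function names are illustrative only.

```python
import math
from collections import Counter

# Hypothetical, tiny preposition list; a real experiment would use a full lexicon.
PREPOSITIONS = {"a", "con", "de", "en", "por"}


def pnp_trigram_counts(tokens):
    """Count trigrams whose first and last tokens are prepositions.

    Simplification (assumption): the noun phrase is a single
    non-preposition token, with no POS tagging or chunking.
    """
    counts = Counter()
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        if w1 in PREPOSITIONS and w3 in PREPOSITIONS and w2 not in PREPOSITIONS:
            counts[(w1, w2, w3)] += 1
    return counts


def trigram_pmi(trigram, trigram_count, unigram_counts, n_tokens):
    """PMI of a trigram against an independence model of its three words."""
    p_joint = trigram_count / n_tokens
    p_indep = 1.0
    for word in trigram:
        p_indep *= unigram_counts[word] / n_tokens
    return math.log2(p_joint / p_indep)


def rank_candidates(tokens):
    """Return the candidate trigrams ranked by frequency and by PMI."""
    unigram_counts = Counter(tokens)
    counts = pnp_trigram_counts(tokens)
    n_tokens = len(tokens)
    by_frequency = [trigram for trigram, _ in counts.most_common()]
    by_pmi = sorted(
        counts,
        key=lambda t: trigram_pmi(t, counts[t], unigram_counts, n_tokens),
        reverse=True,
    )
    return by_frequency, by_pmi


# Toy Spanish text, invented for illustration only.
toy_tokens = (
    "con respecto a la ley y con respecto a la norma , en casa de ana".split()
)
by_frequency, by_pmi = rank_candidates(toy_tokens)
```

On this toy input, the frequency ranking puts the compound preposition "con respecto a" first, while PMI prefers the single-occurrence "en casa de" because all of its words are rare. A toy text of this size proves nothing about real corpora; it only shows mechanically how PMI can favor rare combinations over frequent ones.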

Original language: English
Pages (from-to): 337-340
Number of pages: 4
Publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3406
Status: Published - 2005
Event: 6th International Conference, CICLing 2005 - Mexico City, Mexico
Duration: 13 Feb 2005 to 19 Feb 2005
