Distributional thesaurus versus WordNet: A comparison of backoff techniques for unsupervised PP attachment

Hiram Calvo, Alexander Gelbukh, Adam Kilgarriff

Research output: Contribution to journal, Conference article, peer-reviewed

12 Citations (Scopus)

Abstract

Prepositional Phrase (PP) attachment can be addressed by considering frequency counts of dependency triples seen in a non-annotated corpus. However, not all triples appear even in very big corpora. To solve this problem, several techniques have been used. We evaluate two different backoff methods, one based on WordNet and the other on a distributional (automatically created) thesaurus. We work on Spanish. The thesaurus is created using the dependency triples found in the same corpus used for counting the frequency of unambiguous triples. The training corpus used for both methods is an encyclopaedia. The method based on a distributional thesaurus has higher coverage but lower precision than the WordNet method.
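
The idea of backing off from unseen triples to similar words can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy counts, the SIMILAR table (standing in for either WordNet or a distributional thesaurus), the choice to sum counts over similar nouns, and the rule of abstaining on ties are all assumptions made for illustration.

```python
from collections import Counter

# Toy counts of unambiguous dependency triples (head, preposition, noun).
# All data here is invented for illustration.
TRIPLE_COUNTS = Counter({
    ("comer", "con", "tenedor"): 3,
    ("pasta", "con", "salsa"): 2,
})

# Hypothetical similarity resource: word -> list of similar words.
SIMILAR = {
    "cuchara": ["tenedor"],
    "queso": ["salsa"],
}


def backoff_count(head, prep, noun, counts=TRIPLE_COUNTS, similar=SIMILAR):
    """Return the triple's own count, or, if it is unseen, the summed
    counts of the same triple with each similar noun substituted."""
    direct = counts[(head, prep, noun)]
    if direct > 0:
        return direct
    return sum(counts[(head, prep, s)] for s in similar.get(noun, []))


def attach(verb, noun1, prep, noun2):
    """Choose 'verb' or 'noun' attachment for the PP (prep, noun2).

    Returns None when the evidence is tied or absent; such undecided
    cases are what lower a method's coverage."""
    v = backoff_count(verb, prep, noun2)
    n = backoff_count(noun1, prep, noun2)
    if v == n:
        return None
    return "verb" if v > n else "noun"
```

In this sketch, attach("comer", "pasta", "con", "cuchara") finds no direct evidence for "cuchara" and decides through the similar word "tenedor". A larger similarity table lets more cases be decided, at the risk of deciding some of them wrongly, which is the coverage versus precision trade-off the abstract reports.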

Original language: English
Pages (from-to): 177-188
Number of pages: 12
Publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3406
DOI
Status: Published - 2005
Event: 6th International Conference, CICLing 2005 - Mexico City, Mexico
Duration: 13 Feb 2005 to 19 Feb 2005
