Distributional thesaurus versus WordNet: A comparison of backoff techniques for unsupervised PP attachment

Hiram Calvo, Alexander Gelbukh, Adam Kilgarriff

Research output: Contribution to journal › Conference article › peer-review

12 Scopus citations

Abstract

Prepositional Phrase (PP) attachment can be addressed by considering frequency counts of dependency triples seen in a non-annotated corpus. However, not all triples appear even in very big corpora. To solve this problem, several techniques have been used. We evaluate two different backoff methods, one based on WordNet and the other on a distributional (automatically created) thesaurus. We work on Spanish. The thesaurus is created using the dependency triples found in the same corpus used for counting the frequency of unambiguous triples. The training corpus used for both methods is an encyclopaedia. The method based on a distributional thesaurus has higher coverage but lower precision than the WordNet method.
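
The backoff idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy counts, the `thesaurus` dictionary, the function names `backed_off_count` and `attach`, the "first similar variant found" backoff rule, and the tie-breaking towards the verb are all assumptions made for illustration. The sketch compares how often the preposition and its noun were seen attached to the verb versus to the preceding noun, and substitutes similar words when a triple was never seen.

```python
from collections import Counter

# Hypothetical toy counts of unambiguous (head, preposition, noun) triples.
triple_counts = Counter({
    ("comer", "con", "tenedor"): 3,
    ("pasta", "con", "salsa"): 2,
})

# Hypothetical thesaurus: each word maps to similar words, most similar first.
# It could stand in for either WordNet neighbours or a distributional thesaurus.
thesaurus = {
    "cuchara": ["tenedor"],
    "espagueti": ["pasta"],
}

def backed_off_count(head, prep, noun, counts, similar):
    """Return the triple's count, or the count of the first similar-word variant seen."""
    direct = counts[(head, prep, noun)]
    if direct > 0:
        return direct
    # Back off: try similar words in place of the head and of the noun.
    for h in [head] + similar.get(head, []):
        for n in [noun] + similar.get(noun, []):
            c = counts[(h, prep, n)]
            if c > 0:
                return c
    return 0

def attach(verb, noun1, prep, noun2, counts=triple_counts, similar=thesaurus):
    """Choose 'verb' or 'noun' attachment, or None when no evidence exists."""
    v = backed_off_count(verb, prep, noun2, counts, similar)
    n = backed_off_count(noun1, prep, noun2, counts, similar)
    if v == 0 and n == 0:
        return None  # not covered even after backoff
    return "verb" if v >= n else "noun"  # assumption: ties go to the verb

print(attach("comer", "pasta", "con", "cuchara"))
print(attach("comer", "espagueti", "con", "salsa"))
```

In this sketch, a richer thesaurus lets more unseen triples receive a decision (coverage), while looser similarity can introduce wrong substitutions (precision), which is the trade-off the abstract reports.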

Original language: English
Pages (from-to): 177-188
Number of pages: 12
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3406
State: Published - 2005
Event: 6th International Conference, CICLing 2005 - Mexico City, Mexico
Duration: 13 Feb 2005 → 19 Feb 2005
