Distributional thesaurus versus WordNet: A comparison of backoff techniques for unsupervised PP attachment

Hiram Calvo; Alexander Gelbukh; Adam Kilgarriff

doi:10.1007/978-3-540-30586-6_17

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › peer-review

12 Citations (Scopus)

Abstract

Prepositional Phrase (PP) attachment can be addressed by considering frequency counts of dependency triples seen in a non-annotated corpus. However, not all triples appear even in very big corpora. To solve this problem, several techniques have been used. We evaluate two different backoff methods, one based on WordNet and the other on a distributional (automatically created) thesaurus. We work on Spanish. The thesaurus is created using the dependency triples found in the same corpus used for counting the frequency of unambiguous triples. The training corpus used for both methods is an encyclopaedia. The method based on a distributional thesaurus has higher coverage but lower precision than the WordNet method.

Original language	English
Pages (from-to)	177-188
Number of pages	12
Journal	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	3406
DOI	https://doi.org/10.1007/978-3-540-30586-6_17
Status	Published - 2005
Event	6th International Conference, CICLing 2005 - Mexico City, Mexico. Duration: 13 Feb 2005 → 19 Feb 2005

Cite this

@article{0cf8567fb3d549abad3a637ba0722e1d,

title = "Distributional thesaurus versus WordNet: A comparison of backoff techniques for unsupervised PP attachment",

abstract = "Prepositional Phrase (PP) attachment can be addressed by considering frequency counts of dependency triples seen in a non-annotated corpus. However, not all triples appear even in very big corpora. To solve this problem, several techniques have been used. We evaluate two different backoff methods, one based on WordNet and the other on a distributional (automatically created) thesaurus. We work on Spanish. The thesaurus is created using the dependency triples found in the same corpus used for counting the frequency of unambiguous triples. The training corpus used for both methods is an encyclopaedia. The method based on a distributional thesaurus has higher coverage but lower precision than the WordNet method.",

author = "Hiram Calvo and Alexander Gelbukh and Adam Kilgarriff",

year = "2005",

doi = "10.1007/978-3-540-30586-6_17",

language = "English",

volume = "3406",

pages = "177--188",

journal = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

issn = "0302-9743",

publisher = "Springer Verlag",

note = "6th International Conference, CICLing 2005 ; Conference date: 13-02-2005 Through 19-02-2005",

}

Distributional thesaurus versus WordNet: A comparison of backoff techniques for unsupervised PP attachment. / Calvo, Hiram; Gelbukh, Alexander; Kilgarriff, Adam.
In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 3406, 2005, p. 177-188.

TY - JOUR

T1 - Distributional thesaurus versus WordNet

T2 - 6th International Conference, CICLing 2005

AU - Calvo, Hiram

AU - Gelbukh, Alexander

AU - Kilgarriff, Adam

PY - 2005

Y1 - 2005

N2 - Prepositional Phrase (PP) attachment can be addressed by considering frequency counts of dependency triples seen in a non-annotated corpus. However, not all triples appear even in very big corpora. To solve this problem, several techniques have been used. We evaluate two different backoff methods, one based on WordNet and the other on a distributional (automatically created) thesaurus. We work on Spanish. The thesaurus is created using the dependency triples found in the same corpus used for counting the frequency of unambiguous triples. The training corpus used for both methods is an encyclopaedia. The method based on a distributional thesaurus has higher coverage but lower precision than the WordNet method.

AB - Prepositional Phrase (PP) attachment can be addressed by considering frequency counts of dependency triples seen in a non-annotated corpus. However, not all triples appear even in very big corpora. To solve this problem, several techniques have been used. We evaluate two different backoff methods, one based on WordNet and the other on a distributional (automatically created) thesaurus. We work on Spanish. The thesaurus is created using the dependency triples found in the same corpus used for counting the frequency of unambiguous triples. The training corpus used for both methods is an encyclopaedia. The method based on a distributional thesaurus has higher coverage but lower precision than the WordNet method.

UR - http://www.scopus.com/inward/record.url?scp=24344473862&partnerID=8YFLogxK

U2 - 10.1007/978-3-540-30586-6_17

DO - 10.1007/978-3-540-30586-6_17

M3 - Conference article

AN - SCOPUS:24344473862

SN - 0302-9743

VL - 3406

SP - 177

EP - 188

JO - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

JF - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Y2 - 13 February 2005 through 19 February 2005

ER -
