Paragraph-level alignment of an English-Spanish parallel corpus of fiction texts using bilingual dictionaries

Alexander Gelbukh, Grigori Sidorov, José Ángel Vera-Félix

Research output: Chapter in book/report/conference proceeding · Conference contribution · Peer-reviewed

3 Citations (Scopus)

Abstract

Aligned parallel corpora are important linguistic resources, useful in many text processing tasks such as machine translation, word sense disambiguation, and dictionary compilation. Nevertheless, few linguistic resources of this type are available, especially for fiction texts, because of the difficulty of collecting the texts and the high cost of manual alignment. In this paper, we describe an automatically aligned English-Spanish parallel corpus of fiction texts and evaluate our alignment method, which uses linguistic data, namely existing bilingual dictionaries, to calculate word similarity. The method is based on a simple idea: if a meaningful word is present in the source text, then one of its dictionary translations should be present in the target text. Experimental results of alignment at the paragraph level are described.
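The idea stated in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy dictionary, the stopword list, the tokenizer, and the choice to score a paragraph pair by the fraction of matched content words are all assumptions made here for illustration, and the sketch does no lemmatization.

```python
import re

# Hypothetical toy resources; a real system would load a full bilingual
# dictionary and a proper stopword list.
TOY_DICTIONARY = {
    "house": {"casa", "hogar"},
    "dog": {"perro"},
    "night": {"noche"},
    "old": {"viejo", "vieja", "antiguo"},
}
STOPWORDS = {"the", "a", "an", "in", "of", "and", "was", "at"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def dictionary_similarity(source_paragraph, target_paragraph,
                          dictionary=TOY_DICTIONARY, stopwords=STOPWORDS):
    """Fraction of source content words with a dictionary translation
    present in the target paragraph (an assumed scoring rule)."""
    target_words = set(tokenize(target_paragraph))
    content_words = [w for w in tokenize(source_paragraph)
                     if w not in stopwords]
    if not content_words:
        return 0.0
    matched = sum(
        1 for w in content_words
        if dictionary.get(w, set()) & target_words
    )
    return matched / len(content_words)


if __name__ == "__main__":
    english = "The old dog was in the house at night"
    spanish = "El perro viejo estaba en la casa de noche"
    print(dictionary_similarity(english, spanish))
```

Under these assumptions, a paragraph aligner could compare each source paragraph with nearby target paragraphs and prefer the pairing with the highest score; how the paper actually combines the scores is not stated in this record.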

Original language: English
Title of host publication: Text, Speech and Dialogue - 9th International Conference, TSD 2006, Proceedings
Publisher: Springer Verlag
Pages: 61-67
Number of pages: 7
ISBN (print): 3540390901, 9783540390909
DOI
Status: Published - 2006
Event: 9th International Conference on Text, Speech and Dialogue, TSD 2006 - Brno, Czech Republic
Duration: 11 Sep 2006 to 15 Sep 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4188 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 9th International Conference on Text, Speech and Dialogue, TSD 2006
Country/Territory: Czech Republic
City: Brno
Period: 11/09/06 to 15/09/06
