TY - GEN
T1 - Paragraph-level alignment of an English-Spanish parallel corpus of fiction texts using bilingual dictionaries
AU - Gelbukh, Alexander
AU - Sidorov, Grigori
AU - Vera-Félix, José Ángel
PY - 2006
Y1 - 2006
N2 - Aligned parallel corpora are important linguistic resources, useful in many text processing tasks such as machine translation, word sense disambiguation, and dictionary compilation. Nevertheless, few linguistic resources of this type are available, especially for fiction texts, because the texts are difficult to collect and manual alignment is costly. In this paper, we describe an automatically aligned English-Spanish parallel corpus of fiction texts and evaluate our alignment method, which uses linguistic data, namely existing bilingual dictionaries, to calculate word similarity. The method is based on a simple idea: if a meaningful word is present in the source text, then one of its dictionary translations should be present in the target text. Experimental results of alignment at the paragraph level are described.
AB - Aligned parallel corpora are important linguistic resources, useful in many text processing tasks such as machine translation, word sense disambiguation, and dictionary compilation. Nevertheless, few linguistic resources of this type are available, especially for fiction texts, because the texts are difficult to collect and manual alignment is costly. In this paper, we describe an automatically aligned English-Spanish parallel corpus of fiction texts and evaluate our alignment method, which uses linguistic data, namely existing bilingual dictionaries, to calculate word similarity. The method is based on a simple idea: if a meaningful word is present in the source text, then one of its dictionary translations should be present in the target text. Experimental results of alignment at the paragraph level are described.
UR - http://www.scopus.com/inward/record.url?scp=33750279267&partnerID=8YFLogxK
U2 - 10.1007/11846406_8
DO - 10.1007/11846406_8
M3 - Conference contribution
SN - 3540390901
SN - 9783540390909
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 61
EP - 67
BT - Text, Speech and Dialogue - 9th International Conference, TSD 2006, Proceedings
PB - Springer Verlag
T2 - 9th International Conference on Text, Speech and Dialogue, TSD 2006
Y2 - 11 September 2006 through 15 September 2006
ER -