TY - GEN
T1 - Incorporating linguistic information to statistical word-level alignment
AU - Cendejas, Eduardo
AU - Barceló, Grettel
AU - Gelbukh, Alexander
AU - Sidorov, Grigori
N1 - Funding Information:
Work done with partial support from the Mexican Government (SNI, CONACYT grants 83270 and 50206-H, and SIP-IPN grants 20090772 and 20091587).
PY - 2009
Y1 - 2009
N2 - Parallel texts are enriched by alignment algorithms, thus establishing a relationship between the structures of the implied languages. Depending on the alignment level, the enrichment can be performed on paragraphs, sentences or words, of the expressed content in the source language and its translation. There are two main approaches to perform word-level alignment: statistical or linguistic. Due to the dissimilar grammar rules the languages have, the statistical algorithms usually give lower precision. That is why the development of this type of algorithms is generally aimed at a specific language pair using linguistic techniques. A hybrid alignment system based on the combination of the two traditional approaches is presented in this paper. It provides user-friendly configuration and is adaptable to the computational environment. The system uses linguistic resources and procedures such as identification of cognates, morphological information, syntactic trees, dictionaries, and semantic domains. We show that the system outperforms existing algorithms.
KW - Cognates
KW - Dictionary
KW - Linguistic information
KW - Morphological information
KW - Parallel texts
KW - Semantic domains
KW - Word alignment
UR - http://www.scopus.com/inward/record.url?scp=78651262624&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-10268-4_46
DO - 10.1007/978-3-642-10268-4_46
M3 - Conference contribution
SN - 3642102670
SN - 9783642102677
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 387
EP - 394
BT - Progress in Pattern Recognition, Image Analysis, Computer Vision and Applications - 14th Iberoamerican Conference on Pattern Recognition, CIARP 2009, Proceedings
T2 - 14th Iberoamerican Conference on Pattern Recognition, CIARP 2009
Y2 - 15 November 2009 through 18 November 2009
ER -