Incorporating linguistic information to statistical word-level alignment

Eduardo Cendejas, Grettel Barceló, Alexander Gelbukh, Grigori Sidorov

Research output: Chapter in book/report/conference proceedings, Conference contribution, Peer-reviewed

Abstract

Parallel texts are enriched by alignment algorithms, which establish a relationship between the structures of the languages involved. Depending on the alignment level, enrichment can be performed at the paragraph, sentence, or word level for the content expressed in the source language and its translation. There are two main approaches to word-level alignment: statistical and linguistic. Because languages have dissimilar grammar rules, statistical algorithms usually achieve lower precision. For this reason, algorithms of this type are generally developed for a specific language pair using linguistic techniques. This paper presents a hybrid alignment system that combines the two traditional approaches. It provides user-friendly configuration and is adaptable to the computational environment. The system uses linguistic resources and procedures such as identification of cognates, morphological information, syntactic trees, dictionaries, and semantic domains. We show that the system outperforms existing algorithms.
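
The abstract lists cognate identification among the linguistic resources the hybrid system combines with statistical alignment, but it does not say how the two signals are combined. The following is a minimal sketch under stated assumptions, not the authors' implementation: it scores cognates with the longest common subsequence ratio (LCSR), a common orthographic-similarity heuristic, and linearly interpolates that signal with a lexical translation probability. The table `t_table`, the functions `hybrid_score` and `align`, and the parameters `weight` and `threshold` are hypothetical; `t_table` stands in for the probabilities a statistical aligner would produce.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Longest common subsequence ratio: LCS length over the longer word's length."""
    if not a or not b:
        return 0.0
    return lcs_length(a.lower(), b.lower()) / max(len(a), len(b))


def hybrid_score(p_stat: float, src: str, tgt: str,
                 weight: float = 0.5, threshold: float = 0.6) -> float:
    """Hypothetical combination: interpolate a statistical probability
    with a binary cognate indicator (1.0 if LCSR >= threshold)."""
    cognate = 1.0 if lcsr(src, tgt) >= threshold else 0.0
    return (1 - weight) * p_stat + weight * cognate


def align(src_tokens, tgt_tokens, t_table, weight=0.5):
    """Greedy one-best alignment: link each source word to the target word
    with the highest positive hybrid score; zero-score words stay unlinked."""
    links = []
    for i, s in enumerate(src_tokens):
        best_j, best = None, 0.0
        for j, t in enumerate(tgt_tokens):
            score = hybrid_score(t_table.get((s, t), 0.0), s, t, weight)
            if score > best:
                best_j, best = j, score
        if best_j is not None:
            links.append((i, best_j))
    return links


if __name__ == "__main__":
    # Toy probabilities for illustration only; ("president", "presidente") is absent.
    t_table = {("the", "el"): 0.7, ("spoke", "habló"): 0.4}
    src = ["the", "president", "spoke"]
    tgt = ["el", "presidente", "habló"]
    print(align(src, tgt, t_table))  # [(0, 0), (1, 1), (2, 2)]
```

With these toy probabilities, `president`/`presidente` is linked only through the cognate signal (LCSR = 0.9), since the table assigns that pair no probability. A fuller system would also need to handle many-to-one links and the other resources the abstract mentions, which this greedy one-best search does not model.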

Original language: English
Host publication title: Progress in Pattern Recognition, Image Analysis, Computer Vision and Applications - 14th Iberoamerican Conference on Pattern Recognition, CIARP 2009, Proceedings
Pages: 387-394
Number of pages: 8
DOI
Status: Published, 2009
Event: 14th Iberoamerican Conference on Pattern Recognition, CIARP 2009, Guadalajara, Jalisco, Mexico
Duration: 15 Nov 2009 to 18 Nov 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5856 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 14th Iberoamerican Conference on Pattern Recognition, CIARP 2009
Country/Territory: Mexico
City: Guadalajara, Jalisco
Period: 15/11/09 to 18/11/09