Incorporating linguistic information to statistical word-level alignment

Eduardo Cendejas; Grettel Barceló; Alexander Gelbukh; Grigori Sidorov

doi:10.1007/978-3-642-10268-4_46

Centro de Investigación en Computación (CIC)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Parallel texts are enriched by alignment algorithms, thus establishing a relationship between the structures of the implied languages. Depending on the alignment level, the enrichment can be performed on paragraphs, sentences or words, of the expressed content in the source language and its translation. There are two main approaches to perform word-level alignment: statistical or linguistic. Due to the dissimilar grammar rules the languages have, the statistical algorithms usually give lower precision. That is why the development of this type of algorithms is generally aimed at a specific language pair using linguistic techniques. A hybrid alignment system based on the combination of the two traditional approaches is presented in this paper. It provides user-friendly configuration and is adaptable to the computational environment. The system uses linguistic resources and procedures such as identification of cognates, morphological information, syntactic trees, dictionaries, and semantic domains. We show that the system outperforms existing algorithms.
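
The abstract lists identification of cognates among the linguistic procedures but does not say how it is done. As an illustration only, the sketch below scores candidate word pairs with the longest common subsequence ratio (LCSR), a commonly used string-similarity measure for cognates. The function names, the 0.6 threshold, and the example word pairs are assumptions made for this sketch and are not taken from the paper.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Longest common subsequence ratio: LCS length divided by the longer word's length."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0


def is_cognate_candidate(a: str, b: str, threshold: float = 0.6) -> bool:
    """Flag a word pair as a possible cognate (threshold is an assumed value)."""
    return lcsr(a, b) >= threshold


if __name__ == "__main__":
    # Hypothetical word pairs, chosen only to exercise the functions.
    for source, target in [("nation", "nacion"), ("house", "casa")]:
        print(source, target, round(lcsr(source, target), 3),
              is_cognate_candidate(source, target))
```

In a hybrid aligner, a score like this would typically be one signal among several (dictionaries, morphology, syntax) rather than a decision on its own, since orthographic similarity also matches false friends.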

Original language	English
Title of host publication	Progress in Pattern Recognition, Image Analysis, Computer Vision and Applications - 14th Iberoamerican Conference on Pattern Recognition, CIARP 2009, Proceedings
Pages	387-394
Number of pages	8
DOI	https://doi.org/10.1007/978-3-642-10268-4_46
State	Published - 2009
Event	14th Iberoamerican Conference on Pattern Recognition, CIARP 2009 - Guadalajara, Jalisco, Mexico Duration: 15 Nov 2009 → 18 Nov 2009

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	5856 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	14th Iberoamerican Conference on Pattern Recognition, CIARP 2009
Country/Territory	Mexico
City	Guadalajara, Jalisco
Period	15/11/09 → 18/11/09

Access to document

10.1007/978-3-642-10268-4_46

Other files and links

Link to publication in Scopus

Cite this

Cendejas, E., Barceló, G., Gelbukh, A., & Sidorov, G. (2009). Incorporating linguistic information to statistical word-level alignment. In Progress in Pattern Recognition, Image Analysis, Computer Vision and Applications - 14th Iberoamerican Conference on Pattern Recognition, CIARP 2009, Proceedings (pp. 387-394). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5856 LNCS). https://doi.org/10.1007/978-3-642-10268-4_46

Cendejas, Eduardo ; Barceló, Grettel ; Gelbukh, Alexander et al. / Incorporating linguistic information to statistical word-level alignment. Progress in Pattern Recognition, Image Analysis, Computer Vision and Applications - 14th Iberoamerican Conference on Pattern Recognition, CIARP 2009, Proceedings. 2009. pp. 387-394 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{72a71ee73e584e8193555d5cab554a55,

title = "Incorporating linguistic information to statistical word-level alignment",

abstract = "Parallel texts are enriched by alignment algorithms, thus establishing a relationship between the structures of the implied languages. Depending on the alignment level, the enrichment can be performed on paragraphs, sentences or words, of the expressed content in the source language and its translation. There are two main approaches to perform word-level alignment: statistical or linguistic. Due to the dissimilar grammar rules the languages have, the statistical algorithms usually give lower precision. That is why the development of this type of algorithms is generally aimed at a specific language pair using linguistic techniques. A hybrid alignment system based on the combination of the two traditional approaches is presented in this paper. It provides user-friendly configuration and is adaptable to the computational environment. The system uses linguistic resources and procedures such as identification of cognates, morphological information, syntactic trees, dictionaries, and semantic domains. We show that the system outperforms existing algorithms.",

keywords = "Cognates, Dictionary, Linguistic information, Morphological information, Parallel texts, Semantic domains, Word alignment",

author = "Eduardo Cendejas and Grettel Barcel{\'o} and Alexander Gelbukh and Grigori Sidorov",

note = "Funding Information: Work done under partial support of Mexican Government (SNI, CONACYT grant 83270 and 50206-H, and SIP-IPN grant 20090772 and 20091587).; 14th Iberoamerican Conference on Pattern Recognition, CIARP 2009 ; Conference date: 15-11-2009 Through 18-11-2009",

year = "2009",

doi = "10.1007/978-3-642-10268-4_46",

language = "English",

isbn = "3642102670",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

pages = "387--394",

booktitle = "Progress in Pattern Recognition, Image Analysis, Computer Vision and Applications - 14th Iberoamerican Conference on Pattern Recognition, CIARP 2009, Proceedings",

}

Cendejas, E, Barceló, G, Gelbukh, A & Sidorov, G 2009, Incorporating linguistic information to statistical word-level alignment. In Progress in Pattern Recognition, Image Analysis, Computer Vision and Applications - 14th Iberoamerican Conference on Pattern Recognition, CIARP 2009, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5856 LNCS, pp. 387-394, 14th Iberoamerican Conference on Pattern Recognition, CIARP 2009, Guadalajara, Jalisco, Mexico, 15/11/09. https://doi.org/10.1007/978-3-642-10268-4_46

Incorporating linguistic information to statistical word-level alignment. / Cendejas, Eduardo; Barceló, Grettel; Gelbukh, Alexander et al.
Progress in Pattern Recognition, Image Analysis, Computer Vision and Applications - 14th Iberoamerican Conference on Pattern Recognition, CIARP 2009, Proceedings. 2009. p. 387-394 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5856 LNCS).

TY - GEN

T1 - Incorporating linguistic information to statistical word-level alignment

AU - Cendejas, Eduardo

AU - Barceló, Grettel

AU - Gelbukh, Alexander

AU - Sidorov, Grigori

N1 - Funding Information: Work done under partial support of Mexican Government (SNI, CONACYT grant 83270 and 50206-H, and SIP-IPN grant 20090772 and 20091587).

PY - 2009

Y1 - 2009

N2 - Parallel texts are enriched by alignment algorithms, thus establishing a relationship between the structures of the implied languages. Depending on the alignment level, the enrichment can be performed on paragraphs, sentences or words, of the expressed content in the source language and its translation. There are two main approaches to perform word-level alignment: statistical or linguistic. Due to the dissimilar grammar rules the languages have, the statistical algorithms usually give lower precision. That is why the development of this type of algorithms is generally aimed at a specific language pair using linguistic techniques. A hybrid alignment system based on the combination of the two traditional approaches is presented in this paper. It provides user-friendly configuration and is adaptable to the computational environment. The system uses linguistic resources and procedures such as identification of cognates, morphological information, syntactic trees, dictionaries, and semantic domains. We show that the system outperforms existing algorithms.

AB - Parallel texts are enriched by alignment algorithms, thus establishing a relationship between the structures of the implied languages. Depending on the alignment level, the enrichment can be performed on paragraphs, sentences or words, of the expressed content in the source language and its translation. There are two main approaches to perform word-level alignment: statistical or linguistic. Due to the dissimilar grammar rules the languages have, the statistical algorithms usually give lower precision. That is why the development of this type of algorithms is generally aimed at a specific language pair using linguistic techniques. A hybrid alignment system based on the combination of the two traditional approaches is presented in this paper. It provides user-friendly configuration and is adaptable to the computational environment. The system uses linguistic resources and procedures such as identification of cognates, morphological information, syntactic trees, dictionaries, and semantic domains. We show that the system outperforms existing algorithms.

KW - Cognates

KW - Dictionary

KW - Linguistic information

KW - Morphological information

KW - Parallel texts

KW - Semantic domains

KW - Word alignment

UR - http://www.scopus.com/inward/record.url?scp=78651262624&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-10268-4_46

DO - 10.1007/978-3-642-10268-4_46

M3 - Conference contribution

SN - 3642102670

SN - 9783642102677

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 387

EP - 394

BT - Progress in Pattern Recognition, Image Analysis, Computer Vision and Applications - 14th Iberoamerican Conference on Pattern Recognition, CIARP 2009, Proceedings

T2 - 14th Iberoamerican Conference on Pattern Recognition, CIARP 2009

Y2 - 15 November 2009 through 18 November 2009

ER -

Cendejas E, Barceló G, Gelbukh A, Sidorov G. Incorporating linguistic information to statistical word-level alignment. In Progress in Pattern Recognition, Image Analysis, Computer Vision and Applications - 14th Iberoamerican Conference on Pattern Recognition, CIARP 2009, Proceedings. 2009. p. 387-394. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-642-10268-4_46
