Lexical-based alignment for reconstruction of structure in parallel texts

Alexander Gelbukh, Grigori Sidorov, Liliana Chanona-Hernandez

Research output: Chapter in book/report/conference proceedings, Conference contribution, peer-reviewed

Abstract

In this paper, we present an optimization algorithm for finding the best text alignment based on lexical similarity, together with the results of its evaluation against baseline methods (Gale and Church; relative position). For evaluation, we use fiction texts, which represent non-trivial cases of alignment. We also present a new method for evaluating parallel-text alignment algorithms: the structure of the text in one language is restored from its lower-level units, using the available structure of the text in the other language. For example, in paragraph-level alignment, the sentences are grouped to form the restored paragraphs. The advantage of this method is that it does not depend on corpus data.
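The restoration-based evaluation can be illustrated with a minimal sketch. It assumes a toy bilingual lexicon and a Dice-style word-overlap score, and it uses a generic dynamic-programming search that splits the flat sentence list of one text into as many contiguous groups as the other text has paragraphs. The names `lexical_similarity`, `restore_paragraphs`, and `boundary_accuracy`, the scoring formula, and the sample data are all hypothetical; this is not the authors' optimization algorithm, only an example of the idea under those assumptions.

```python
from typing import Dict, List, Sequence, Set

Tokens = List[str]


def lexical_similarity(paragraph: Tokens, group: Sequence[Tokens],
                       lexicon: Dict[str, Set[str]]) -> float:
    """Hypothetical Dice-style score: paragraph words whose dictionary
    translation occurs in the candidate group of sentences."""
    target_tokens = [w for sent in group for w in sent]
    target_set = set(target_tokens)
    hits = sum(1 for w in paragraph if lexicon.get(w, set()) & target_set)
    total = len(paragraph) + len(target_tokens)
    return 2.0 * hits / total if total else 0.0


def restore_paragraphs(paragraphs: List[Tokens], sentences: List[Tokens],
                       lexicon: Dict[str, Set[str]]) -> List[int]:
    """Split `sentences` into len(paragraphs) contiguous, non-empty groups
    maximizing total similarity; return the end index of each group."""
    n, m = len(paragraphs), len(sentences)
    if n == 0 or m < n:
        raise ValueError("need at least one sentence per paragraph")
    neg = float("-inf")
    # best[i][j]: best score for the first i paragraphs covering the first j sentences
    best = [[neg] * (m + 1) for _ in range(n + 1)]
    back = [[0] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        # leave at least one sentence for every remaining paragraph
        for j in range(i, m - (n - i) + 1):
            for k in range(i - 1, j):
                if best[i - 1][k] == neg:
                    continue
                score = best[i - 1][k] + lexical_similarity(
                    paragraphs[i - 1], sentences[k:j], lexicon)
                if score > best[i][j]:
                    best[i][j], back[i][j] = score, k
    # trace back the group boundaries from the last paragraph
    ends, j = [], m
    for i in range(n, 0, -1):
        ends.append(j)
        j = back[i][j]
    return ends[::-1]


def boundary_accuracy(restored: List[int], gold: List[int]) -> float:
    """Fraction of paragraph end boundaries restored exactly."""
    return sum(r == g for r, g in zip(restored, gold)) / len(gold)


# Toy example: English paragraphs (structure known), Spanish sentences (flat).
lexicon = {"cat": {"gato"}, "dog": {"perro"}, "house": {"casa"}, "sun": {"sol"}}
paragraphs = [["cat", "sleeps"], ["dog", "runs", "house"], ["sun", "shines"]]
sentences = [["el", "gato", "duerme"], ["el", "perro", "corre"],
             ["la", "casa"], ["el", "sol"]]
restored = restore_paragraphs(paragraphs, sentences, lexicon)
print(restored, boundary_accuracy(restored, [1, 3, 4]))  # [1, 3, 4] 1.0
```

Because the restored boundaries are compared directly with the known paragraph boundaries of the second text, this style of evaluation needs no separately aligned reference corpus, which is the property the abstract emphasizes.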

Original language: English
Title of host publication: Natural Language Processing and Information Systems - 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, Proceedings
Publisher: Springer Verlag
Pages: 401-406
Number of pages: 6
ISBN (print): 3540733507, 9783540733508
DOI:
State: Published - 2007
Event: 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007 - Paris, France
Duration: 27 Jun 2007 to 29 Jun 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4592 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007
Country/Territory: France
City: Paris
Period: 27/06/07 to 29/06/07

