TY - GEN
T1 - Lexical-based alignment for reconstruction of structure in parallel texts
AU - Gelbukh, Alexander
AU - Sidorov, Grigori
AU - Chanona-Hernandez, Liliana
PY - 2007
Y1 - 2007
AB - In this paper, we present an optimization algorithm for finding the best text alignment based on lexical similarity, together with the results of its evaluation against baseline methods (Gale and Church, relative position). For evaluation, we use fiction texts, which represent non-trivial cases of alignment. We also present a new method for evaluating parallel text alignment algorithms, which consists in restoring the structure of the text in one language using its lower-level units and the available structure of the text in the other language. For example, in the case of paragraph-level alignment, sentences are used to reconstruct the paragraphs. The advantage of this method is that it does not depend on corpus data.
UR - http://www.scopus.com/inward/record.url?scp=38149029336&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-73351-5_37
DO - 10.1007/978-3-540-73351-5_37
M3 - Conference contribution
SN - 3540733507
SN - 9783540733508
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 401
EP - 406
BT - Natural Language Processing and Information Systems - 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, Proceedings
PB - Springer Verlag
T2 - 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007
Y2 - 27 June 2007 through 29 June 2007
ER -