TY - GEN
T1 - English-Spanish large statistical dictionary of inflectional forms
AU - Sidorov, Grigori
AU - Barrón-Cedeño, Alberto
AU - Rosso, Paolo
N1 - Funding Information:
The research work of the first author has been partially supported by the National Polytechnic Institute (SIP, COFAA, SIP grant 20090772), the Mexican government (CONACYT/SNI), and the program Estancias en la UPV de investigadores de prestigio PAID-02-09 num. 3143. The research work of the second author has been partially supported by the CONACyT-Mexico 192021 grant. We thank the TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 research project. We also thank the anonymous reviewers for their important comments.
PY - 2010
Y1 - 2010
N2 - The paper presents an approach for constructing a weighted bilingual dictionary of inflectional forms that takes a traditional bilingual dictionary, rather than parallel corpora, as input. The algorithm generates all possible morphological (inflectional) forms and weights them using the distribution of the corresponding grammar sets (grammar information) in large corpora for each language. It also takes into account the compatibility of grammar sets across the language pair; for example, a verb in the past tense in one language is normally expected to be translated by a verb in the past tense in the other language. We consider the method universal, i.e., applicable to any pair of languages. The resulting dictionary is freely available and can be used in several NLP tasks, for example statistical machine translation.
AB - The paper presents an approach for constructing a weighted bilingual dictionary of inflectional forms that takes a traditional bilingual dictionary, rather than parallel corpora, as input. The algorithm generates all possible morphological (inflectional) forms and weights them using the distribution of the corresponding grammar sets (grammar information) in large corpora for each language. It also takes into account the compatibility of grammar sets across the language pair; for example, a verb in the past tense in one language is normally expected to be translated by a verb in the past tense in the other language. We consider the method universal, i.e., applicable to any pair of languages. The resulting dictionary is freely available and can be used in several NLP tasks, for example statistical machine translation.
UR - http://www.scopus.com/inward/record.url?scp=82555200592&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:82555200592
T3 - Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
SP - 277
EP - 281
BT - Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
A2 - Tapias, Daniel
A2 - Russo, Irene
A2 - Hamon, Olivier
A2 - Piperidis, Stelios
A2 - Calzolari, Nicoletta
A2 - Choukri, Khalid
A2 - Mariani, Joseph
A2 - Mazo, Helene
A2 - Maegaard, Bente
A2 - Odijk, Jan
A2 - Rosner, Mike
PB - European Language Resources Association (ELRA)
T2 - 7th International Conference on Language Resources and Evaluation, LREC 2010
Y2 - 17 May 2010 through 23 May 2010
ER -