English-Spanish large statistical dictionary of inflectional forms

Grigori Sidorov; Alberto Barrón-Cedeño; Paolo Rosso

Centro de Investigación en Computación (CIC)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

The paper presents an approach for constructing a weighted bilingual dictionary of inflectional forms using a traditional bilingual dictionary, rather than parallel corpora, as input. An algorithm is developed that generates all possible morphological (inflectional) forms and weights them using information on the distribution of the corresponding grammar sets (grammar information) in large corpora for each language. The algorithm also takes into account the compatibility of grammar sets in a language pair; for example, a verb in the past tense in the source language is normally expected to be translated by a verb in the past tense in the target language. We consider the developed method universal, i.e., applicable to any pair of languages. The resulting dictionary is freely available. It can be used in several NLP tasks, for example, statistical machine translation.
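The weighting idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the weighting formula, so the product of grammar-set frequencies and a compatibility score used here is an assumption, and the forms, grammar-set labels, frequencies, and function names are all hypothetical.

```python
# Hypothetical inflected forms of one dictionary entry ("walk" -> "caminar"),
# each mapped to an illustrative grammar set. A real system would obtain
# these from a morphological generator.
EN_FORMS = {"walk": "V.pres", "walked": "V.past", "walking": "V.ger"}
ES_FORMS = {"camina": "V.pres", "caminó": "V.past", "caminando": "V.ger"}

# Made-up relative frequencies of grammar sets in a corpus of each language.
EN_FREQ = {"V.pres": 0.5, "V.past": 0.3, "V.ger": 0.2}
ES_FREQ = {"V.pres": 0.6, "V.past": 0.3, "V.ger": 0.1}


def compatibility(src_set, tgt_set):
    """Assumed score: matching grammar sets are strongly preferred."""
    return 1.0 if src_set == tgt_set else 0.05


def weight_entry(src_forms, tgt_forms, src_freq, tgt_freq):
    """Return {(src_form, tgt_form): weight}, normalized to sum to 1."""
    raw = {}
    for s, s_set in src_forms.items():
        for t, t_set in tgt_forms.items():
            # Assumed formula: frequency of each grammar set in its own
            # language, scaled by how compatible the two sets are.
            raw[(s, t)] = (
                src_freq.get(s_set, 0.0)
                * tgt_freq.get(t_set, 0.0)
                * compatibility(s_set, t_set)
            )
    total = sum(raw.values())
    return {pair: w / total for pair, w in raw.items()} if total else raw


weights = weight_entry(EN_FORMS, ES_FORMS, EN_FREQ, ES_FREQ)
# Highest-weighted translation pair for the source form "walked".
best = max((p for p in weights if p[0] == "walked"), key=weights.get)
print(best, round(weights[best], 3))
```

With these toy numbers, the past-tense pair outranks the other candidate translations of "walked", which mirrors the compatibility constraint described above.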

Original language	English
Title of host publication	Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
Editors	Daniel Tapias, Irene Russo, Olivier Hamon, Stelios Piperidis, Nicoletta Calzolari, Khalid Choukri, Joseph Mariani, Helene Mazo, Bente Maegaard, Jan Odijk, Mike Rosner
Publisher	European Language Resources Association (ELRA)
Pages	277-281
Number of pages	5
ISBN (Electronic)	2951740867, 9782951740860
State	Published - 2010
Event	7th International Conference on Language Resources and Evaluation, LREC 2010 - Valletta, Malta
Duration	17 May 2010 → 23 May 2010

Publication series

Name	Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010

Conference

Conference	7th International Conference on Language Resources and Evaluation, LREC 2010
Country/Territory	Malta
City	Valletta
Period	17/05/10 → 23/05/10

Cite this

Sidorov, G., Barrón-Cedeño, A., & Rosso, P. (2010). English-Spanish large statistical dictionary of inflectional forms. In D. Tapias, I. Russo, O. Hamon, S. Piperidis, N. Calzolari, K. Choukri, J. Mariani, H. Mazo, B. Maegaard, J. Odijk, & M. Rosner (Eds.), Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010 (pp. 277-281). (Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010). European Language Resources Association (ELRA).

Sidorov, Grigori ; Barrón-Cedeño, Alberto ; Rosso, Paolo. / English-Spanish large statistical dictionary of inflectional forms. Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010. editor / Daniel Tapias ; Irene Russo ; Olivier Hamon ; Stelios Piperidis ; Nicoletta Calzolari ; Khalid Choukri ; Joseph Mariani ; Helene Mazo ; Bente Maegaard ; Jan Odijk ; Mike Rosner. European Language Resources Association (ELRA), 2010. pp. 277-281 (Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010).

@inproceedings{b1a41ce8f9ca44c18e21934af86d0c24,

title = "English-Spanish large statistical dictionary of inflectional forms",

abstract = "The paper presents an approach for constructing a weighted bilingual dictionary of inflectional forms using a traditional bilingual dictionary, rather than parallel corpora, as input. An algorithm is developed that generates all possible morphological (inflectional) forms and weights them using information on the distribution of the corresponding grammar sets (grammar information) in large corpora for each language. The algorithm also takes into account the compatibility of grammar sets in a language pair; for example, a verb in the past tense in the source language is normally expected to be translated by a verb in the past tense in the target language. We consider the developed method universal, i.e., applicable to any pair of languages. The resulting dictionary is freely available. It can be used in several NLP tasks, for example, statistical machine translation.",

author = "Grigori Sidorov and Alberto Barr{\'o}n-Cede{\~n}o and Paolo Rosso",

note = "Funding Information: The research work of the first author has been partially supported by the National Polytechnic Institute (SIP, COFAA, SIP grant 20090772), Mexican government (CONACYT/SNI), and the program Estancias en la UPV de investigadores de prestigio PAID-02-09 num. 3143. The research work of the second author has been partially supported by the CONACyT-Mexico 192021 grant. We thank the TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 research project. We also thank anonymous reviewers for their important comments.; 7th International Conference on Language Resources and Evaluation, LREC 2010 ; Conference date: 17-05-2010 Through 23-05-2010",

year = "2010",

language = "English",

series = "Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010",

publisher = "European Language Resources Association (ELRA)",

pages = "277--281",

editor = "Daniel Tapias and Irene Russo and Olivier Hamon and Stelios Piperidis and Nicoletta Calzolari and Khalid Choukri and Joseph Mariani and Helene Mazo and Bente Maegaard and Jan Odijk and Mike Rosner",

booktitle = "Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010",

}

Sidorov, G, Barrón-Cedeño, A & Rosso, P 2010, English-Spanish large statistical dictionary of inflectional forms. in D Tapias, I Russo, O Hamon, S Piperidis, N Calzolari, K Choukri, J Mariani, H Mazo, B Maegaard, J Odijk & M Rosner (eds), Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010. Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010, European Language Resources Association (ELRA), pp. 277-281, 7th International Conference on Language Resources and Evaluation, LREC 2010, Valletta, Malta, 17/05/10.

TY - GEN

T1 - English-Spanish large statistical dictionary of inflectional forms

AU - Sidorov, Grigori

AU - Barrón-Cedeño, Alberto

AU - Rosso, Paolo

N1 - Funding Information: The research work of the first author has been partially supported by the National Polytechnic Institute (SIP, COFAA, SIP grant 20090772), Mexican government (CONACYT/SNI), and the program Estancias en la UPV de investigadores de prestigio PAID-02-09 num. 3143. The research work of the second author has been partially supported by the CONACyT-Mexico 192021 grant. We thank the TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 research project. We also thank anonymous reviewers for their important comments.

PY - 2010

Y1 - 2010

AB - The paper presents an approach for constructing a weighted bilingual dictionary of inflectional forms using a traditional bilingual dictionary, rather than parallel corpora, as input. An algorithm is developed that generates all possible morphological (inflectional) forms and weights them using information on the distribution of the corresponding grammar sets (grammar information) in large corpora for each language. The algorithm also takes into account the compatibility of grammar sets in a language pair; for example, a verb in the past tense in the source language is normally expected to be translated by a verb in the past tense in the target language. We consider the developed method universal, i.e., applicable to any pair of languages. The resulting dictionary is freely available. It can be used in several NLP tasks, for example, statistical machine translation.

UR - http://www.scopus.com/inward/record.url?scp=82555200592&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:82555200592

T3 - Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010

SP - 277

EP - 281

BT - Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010

A2 - Tapias, Daniel

A2 - Russo, Irene

A2 - Hamon, Olivier

A2 - Piperidis, Stelios

A2 - Calzolari, Nicoletta

A2 - Choukri, Khalid

A2 - Mariani, Joseph

A2 - Mazo, Helene

A2 - Maegaard, Bente

A2 - Odijk, Jan

A2 - Rosner, Mike

PB - European Language Resources Association (ELRA)

T2 - 7th International Conference on Language Resources and Evaluation, LREC 2010

Y2 - 17 May 2010 through 23 May 2010

ER -

Sidorov G, Barrón-Cedeño A, Rosso P. English-Spanish large statistical dictionary of inflectional forms. In Tapias D, Russo I, Hamon O, Piperidis S, Calzolari N, Choukri K, Mariani J, Mazo H, Maegaard B, Odijk J, Rosner M, editors, Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010. European Language Resources Association (ELRA). 2010. p. 277-281. (Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010).
