TY - GEN
T1 - Formal grammar for Hispanic named entities analysis
AU - Barceló, Grettel
AU - Cendejas, Eduardo
AU - Sidorov, Grigori
AU - Bolshakov, Igor A.
PY - 2009
Y1 - 2009
N2 - Named Entity Recognition (NER) is a widely studied task in natural language processing. Many approaches have been developed to identify and classify named entity strings in both specific and open domains. Nevertheless, many NER systems must incorporate external modules to resolve the interpretation problems that arise from proper nouns. In this article, we study ambiguity in Hispanic Nominal Sequences, whose composition poses three main problems: (1) the association of given names and/or surnames; (2) the linking of such elements by means of a connector; and (3) the duality of given name/surname. To assess the magnitude of the problem, two gazetteers were built, one with 93998 given names and the other with 13779 surnames. The gazetteer entries were used as terminal symbols of the proposed grammar to determine the valid interpretations of the nominal sequences; this is done by automatically labeling every element that the nominal sequences are made of.
AB - Named Entity Recognition (NER) is a widely studied task in natural language processing. Many approaches have been developed to identify and classify named entity strings in both specific and open domains. Nevertheless, many NER systems must incorporate external modules to resolve the interpretation problems that arise from proper nouns. In this article, we study ambiguity in Hispanic Nominal Sequences, whose composition poses three main problems: (1) the association of given names and/or surnames; (2) the linking of such elements by means of a connector; and (3) the duality of given name/surname. To assess the magnitude of the problem, two gazetteers were built, one with 93998 given names and the other with 13779 surnames. The gazetteer entries were used as terminal symbols of the proposed grammar to determine the valid interpretations of the nominal sequences; this is done by automatically labeling every element that the nominal sequences are made of.
UR - http://www.scopus.com/inward/record.url?scp=67650517329&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-00382-0_15
DO - 10.1007/978-3-642-00382-0_15
M3 - Conference contribution
SN - 3642003818
SN - 9783642003813
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 183
EP - 194
BT - Computational Linguistics and Intelligent Text Processing - 10th International Conference, CICLing 2009, Proceedings
T2 - 10th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2009
Y2 - 1 March 2009 through 7 March 2009
ER -