Formal grammar for Hispanic named entities analysis

Grettel Barceló, Eduardo Cendejas, Grigori Sidorov, Igor A. Bolshakov

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

3 Scopus citations

Abstract

A task that has been widely studied in the field of natural language processing is the Named Entity Recognition (NER). A great number of approaches have been developed to deal with the identification and classification of named entity strings in specific-and open-domains. Nevertheless, external modules have to be incorporated into many of the NER systems in order to solve the interpretation problems derived from proper nouns. In this article our focus will be on the study of ambiguity in Hispanic Nominal Sequences which constitution assumes three main problems: (1) the association of given names and/or surnames; (2) the composition of such elements by means of a connector; (3) and the duality of given name/surname. In order to analyze the magnitude of the problem, two gazetteers were made, one with 93998 given names and the other with 13779 surnames. The gazetteers entries were used as terminal symbols of the proposed grammar to determine the valid interpretations in the nominal sequences; this is done by means of an automatic labeling of all the elements the nominal sequences are made of.

Original languageEnglish
Title of host publicationComputational Linguistics and Intelligent Text Processing - 10th International Conference, CICLing 2009, Proceedings
Pages183-194
Number of pages12
DOIs
StatePublished - 2009
Event10th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2009 - Mexico City, Mexico
Duration: 1 Mar 20097 Mar 2009

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume5449 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference10th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2009
Country/TerritoryMexico
CityMexico City
Period1/03/097/03/09

Fingerprint

Dive into the research topics of 'Formal grammar for Hispanic named entities analysis'. Together they form a unique fingerprint.

Cite this