Evaluation of TnT Tagger for Spanish

R. M. Carrasco, A. Gelbukh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

16 Scopus citations

Abstract

A part-of-speech (POS) tagger is a necessary module in many natural language text processing tasks. It is a program that accepts unprepared raw text as input and adds to each word a tag specifying its grammatical properties, such as part of speech, number, and person. One popular POS tagger, TnT, has been extensively tested for English and some other languages. This paper reports on its evaluation for Spanish. An error analysis explains how specific features of Spanish affect tagger performance. On Spanish texts, TnT shows overall tagging accuracy between 92.5% and 95.84%: between 95.47% and 98.56% on known words and between 75.57% and 83.49% on unknown words. The results show that TnT has reached a good level of maturity and is helpful enough for NLP tasks.
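The split between known-word and unknown-word accuracy in the abstract can be illustrated with a minimal sketch. It assumes that an "unknown" word is one that never occurs in the training data, which is the usual convention but is not spelled out in the abstract. The function name `tagging_accuracy`, the toy sentences, and the tag labels are hypothetical and are not taken from the paper or from TnT itself.

```python
def tagging_accuracy(train_tokens, test_tokens, predicted_tags):
    """Return overall, known-word, and unknown-word tagging accuracy.

    train_tokens:   iterable of (word, tag) pairs seen during training
    test_tokens:    list of (word, gold_tag) pairs
    predicted_tags: list of tags aligned with test_tokens
    """
    # Assumption: a word is "known" if it appeared anywhere in training.
    vocabulary = {word for word, _ in train_tokens}
    counts = {"known": [0, 0], "unknown": [0, 0]}  # [correct, total]

    for (word, gold), predicted in zip(test_tokens, predicted_tags):
        bucket = "known" if word in vocabulary else "unknown"
        counts[bucket][0] += int(predicted == gold)
        counts[bucket][1] += 1

    def ratio(correct, total):
        # Avoid division by zero when a bucket is empty.
        return correct / total if total else float("nan")

    correct = counts["known"][0] + counts["unknown"][0]
    total = counts["known"][1] + counts["unknown"][1]
    return {
        "overall": ratio(correct, total),
        "known": ratio(*counts["known"]),
        "unknown": ratio(*counts["unknown"]),
    }


# Toy data with hypothetical tags, for illustration only.
train = [("el", "DET"), ("perro", "NOUN"), ("come", "VERB")]
test = [("el", "DET"), ("gato", "NOUN"), ("come", "VERB"), ("rápido", "ADV")]
predicted = ["DET", "NOUN", "NOUN", "ADJ"]

result = tagging_accuracy(train, test, predicted)
print(result)
```

Because known words usually far outnumber unknown ones in a test set, the overall figure sits much closer to the known-word accuracy, which is consistent with the ranges reported above.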

Original language: English
Title of host publication: Proceedings of the 4th Mexican International Conference on Computer Science, ENC 2003
Editors: Edgar Chavez, Jesus Favela, Alberto Oliart, Marcelo Mejia
Publisher: IEEE Computer Society
Pages: 18-25
Number of pages: 8
ISBN (Electronic): 0769519156
DOIs
State: Published - 2003
Event: 4th Mexican International Conference on Computer Science, ENC 2003 - Tlaxcala, Mexico
Duration: 8 Sep 2003 to 12 Sep 2003

Publication series

Name: Proceedings of the Mexican International Conference on Computer Science
Volume: 2003-January
ISSN (Print): 1550-4069

Conference

Conference: 4th Mexican International Conference on Computer Science, ENC 2003
Country/Territory: Mexico
City: Tlaxcala
Period: 8/09/03 to 12/09/03

Keywords

  • Character recognition
  • Error analysis
  • Mood
  • Natural languages
  • Speech processing
  • Speech recognition
  • Tagging
  • Testing
  • Text processing
  • Text recognition
