On correction of semantic errors in natural language texts with a dictionary of literal paronyms

Alexander Gelbukh, Igor A. Bolshakov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

Due to the open nature of the Web, search engines must include means of meaningful processing of incorrect texts, including automatic error detection and correction. One widespread type of error in Internet texts is the malapropism, i.e., a semantic error that replaces a word with another existing word similar in letter composition and/or sound but semantically incompatible with the context. Methods for detection and correction of malapropisms have been proposed recently. Any such method relies on a generator of correction candidates (paronyms), i.e., real words similar to the suspicious one encountered in the text and having the same grammatical properties. Literal paronyms are words at a distance of a few editing operations from a given word. We argue that a dictionary of literal paronyms should be compiled beforehand and that its units should be grammeme names. For Spanish, such grammemes are (1) singulars and plurals of nouns; (2) adjectives plus participles; (3) verbs in the infinitive; (4) gerunds plus adverbs; (5) personal verb forms. Based on existing Spanish electronic dictionaries, we have compiled a dictionary of one-letter-distant literal paronyms. The dictionary contains a few tens of thousands of entries, with an entry averaging approximately three paronyms. We calculate the gain in the number of candidate search operations achievable through the proposed dictionary and give illustrative examples of correcting one-letter malapropisms using our dictionary.
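The idea of a precompiled dictionary of one-letter-distant paronyms can be illustrated with a minimal sketch. It is not the authors' implementation: the word lists are toy data, the class labels `noun_sg` and `verb_inf` are hypothetical, and the set of editing operations (one substitution, insertion, or deletion) is an assumption, since the abstract does not enumerate them. Words are compared only within the same grammatical class, reflecting the requirement that paronyms share grammatical properties.

```python
from collections import defaultdict
from itertools import combinations


def one_edit_apart(a: str, b: str) -> bool:
    """True if a != b and one substitution, insertion, or deletion turns a into b."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # ensure len(a) <= len(b)
    # Skip the common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # exactly one substituted letter
    return a[i:] == b[i + 1:]          # exactly one extra letter in b


def build_paronym_dictionary(words_by_class):
    """Map each class to {word: sorted list of one-edit paronyms in that class}."""
    result = {}
    for cls, words in words_by_class.items():
        entries = defaultdict(set)
        # Quadratic pairwise comparison: acceptable for a toy list only.
        for a, b in combinations(sorted(set(words)), 2):
            if one_edit_apart(a, b):
                entries[a].add(b)
                entries[b].add(a)
        result[cls] = {w: sorted(p) for w, p in entries.items()}
    return result


# Toy Spanish word lists (illustrative assumption, not the paper's data).
toy_lexicon = {
    "noun_sg": ["casa", "cosa", "caso", "masa", "mesa", "pesa"],
    "verb_inf": ["casar", "pasar", "pesar"],
}
paronyms = build_paronym_dictionary(toy_lexicon)
print(paronyms["noun_sg"]["casa"])  # ['caso', 'cosa', 'masa']
```

Because the table is built once, looking up correction candidates for a suspicious word becomes a single dictionary access rather than a fresh enumeration of all its edit variants. Note that "casa" and "casar" are one insertion apart but are never paired here, since they belong to different classes.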

Original language: English
Title of host publication: Advances in Web Intelligence - 2nd International Atlantic Web Intelligence Conference, AWIC 2004, Proceedings
Editors: Jesus Favela, Ernestina Menasalvas, Edgar Chavez
Publisher: Springer Verlag
Pages: 105-114
Number of pages: 10
ISBN (Print): 9783540246817
State: Published - 2004
Event: 2nd International Atlantic Web Intelligence Conference, AWIC 2004 - Cancun, Mexico
Duration: 16 May 2004 to 19 May 2004

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3034
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd International Atlantic Web Intelligence Conference, AWIC 2004
Country/Territory: Mexico
City: Cancun
Period: 16/05/04 to 19/05/04
