Detection and correction of malapropisms in Spanish by means of Internet search

Igor A. Bolshakov, Sofia N. Galicia-Haro, Alexander Gelbukh

Research output: Chapter in book/report/conference proceeding, Conference contribution, peer-reviewed

8 Citations (Scopus)

Abstract

Malapropisms are real-word errors that produce syntactically correct but semantically implausible text. We report an experiment on detecting and correcting Spanish malapropisms. A malapropos word semantically destroys the collocations (syntactically connected word pairs) it belongs to. We therefore flag as possible malapropisms those words that do not form semantically plausible collocations with neighboring words. As correction candidates, we select words that are similar to the suspected one but form plausible collocations with its neighbors. To judge the semantic plausibility of a collocation, we use Google statistics on occurrences of the word combination and of the two words taken separately. Since the components of a collocation can be separated by other words in a sentence, the Google statistics are gathered for the most probable distance between them. These statistics are converted into a specially defined Semantic Compatibility Index (SCI). Heuristic rules are proposed to signal a malapropism when SCI values fall below a predetermined threshold and to retain a few of the highest SCI-ranked correction candidates. Our experiments gave promising results.
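
The detect-then-correct loop described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: the abstract does not give the SCI formula, so `compatibility` is only a stand-in (the log of the pair count over the geometric mean of the two word counts); "similar words" are approximated as words within one character edit; occurrence counts are passed in as plain dictionaries rather than fetched from a search engine; and the distance between collocation components is ignored. All function names and the threshold value are hypothetical.

```python
import math


def compatibility(pair_count, count_a, count_b):
    """Stand-in plausibility score (NOT the paper's SCI): log of the pair
    count divided by the geometric mean of the two single-word counts."""
    if pair_count == 0 or count_a == 0 or count_b == 0:
        return float("-inf")
    return math.log(pair_count / math.sqrt(count_a * count_b))


def within_one_edit(a, b):
    """True if a and b differ by exactly one substitution, insertion, or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # ensure len(a) <= len(b)
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution
    return a[i:] == b[i + 1:]  # one extra character in b


def check_pair(suspect, neighbor, word_counts, pair_counts, threshold, top_k=3):
    """Return (flagged, candidates) for the word pair (suspect, neighbor).

    The pair is flagged when its score is below the threshold; candidates are
    similar words whose score with the same neighbor reaches the threshold,
    ranked from highest to lowest score and cut to top_k.
    """
    def score(word):
        return compatibility(
            pair_counts.get((word, neighbor), 0),
            word_counts.get(word, 0),
            word_counts.get(neighbor, 0),
        )

    if score(suspect) >= threshold:
        return False, []
    similar = [w for w in word_counts if within_one_edit(suspect, w)]
    ranked = sorted(((score(w), w) for w in similar), reverse=True)
    return True, [w for s, w in ranked if s >= threshold][:top_k]


# Synthetic counts, invented purely for illustration (not real search statistics).
word_counts = {"centro": 1_000_000, "histérico": 10_000, "histórico": 500_000}
pair_counts = {("histórico", "centro"): 200_000, ("histérico", "centro"): 2}
print(check_pair("histérico", "centro", word_counts, pair_counts, threshold=-5.0))
```

Keeping the counts as an explicit input separates the scoring and ranking logic from whatever source supplies the statistics, so the same functions could be driven by a corpus or a cached table instead of live queries.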

Original language: English
Title of host publication: Text, Speech and Dialogue - 8th International Conference, TSD 2005, Proceedings
Publisher: Springer Verlag
Pages: 115-122
Number of pages: 8
ISBN (print): 3540287892, 9783540287896
Status: Published - 2005
Event: 8th International Conference on Text, Speech and Dialogue, TSD 2005 - Karlovy Vary, Czech Republic
Duration: 12 Sep 2005 to 15 Sep 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3658 LNAI
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 8th International Conference on Text, Speech and Dialogue, TSD 2005
Country/Territory: Czech Republic
City: Karlovy Vary
Period: 12/09/05 to 15/09/05
