TY - GEN
T1 - On Detection of Malapropisms by Multistage Collocation Testing
AU - Bolshakov, Igor A.
AU - Gelbukh, Alexander
N1 - Publisher Copyright: © 2003 Gesellschaft für Informatik (GI). All rights reserved.
PY - 2003
Y1 - 2003
AB - A malapropism is a real-word error in a text: the unintended replacement of one content word by another existing content word that is similar in sound but semantically incompatible with the context, which destroys text cohesion, e.g., they travel around the word. We present an algorithm for malapropism detection and correction based on evaluating cohesion. As a measure of the semantic compatibility of words, we consider their ability to form syntactically linked and semantically admissible word combinations (collocations), e.g., travel (around the) world. Text cohesion at a content word is then measured as the number of collocations it forms with the words in its immediate context. We detect malapropisms as words that form no collocations in the context. To test whether two words can form a collocation, we consider two types of resources: a collocation database and an Internet search engine, e.g., Google. We illustrate the proposed method by classifying, tracing, and evaluating several English malapropisms.
UR - http://www.scopus.com/inward/record.url?scp=33646024784&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:33646024784
T3 - Lecture Notes in Informatics (LNI), Proceedings - Series of the Gesellschaft für Informatik (GI)
SP - 28
EP - 41
BT - Natural Language Processing and Information Systems, 8th International Conference on Applications of Natural Language to Information Systems, NLDB 2003
A2 - Düsterhöft, Antje
A2 - Thalheim, Bernhard
PB - Gesellschaft für Informatik (GI)
T2 - 8th International Conference on Applications of Natural Language to Information Systems, NLDB 2003
Y2 - 23 June 2003 through 25 June 2003
ER -