TY - GEN
T1 - Automatic measuring of semantic distances between word senses in a Spanish explanatory dictionary
AU - Gelbukh, Alexander
AU - Sidorov, Grigori
AU - Chanona-Hernandez, Liliana
PY - 2003
Y1 - 2003
AB - What a semantic distance is and how it should be measured is an interesting problem that has not been well investigated. Usually the distance is measured between words. We propose to measure the distances between different senses of the same word. One purpose of this measurement is to evaluate how plausible it is to apply word sense disambiguation (WSD) techniques in information retrieval. Namely, if word senses are too close (too similar), then, on the one hand, the user will be unable to distinguish them with respect to his or her information need, and, on the other hand, WSD methods will not be reliable. Another purpose is to estimate the quality of a dictionary: if it contains many close (similar) senses, it should be revised. In our experiments, we used the Anaya dictionary of the Spanish language. Dictionary definitions were lemmatized. To measure the distance, we calculated the literal matching between two senses and the matching using synonyms. The synonyms were taken from a Spanish dictionary of synonyms. The results show that about 90% of senses are different (the distance is large), while about 10% are rather similar (the distance is small). Thus, in general, WSD techniques seem to be useful in information retrieval, but in the case of the Anaya dictionary, about 10% of definitions, those of similar senses, should be revised.
KW - Computational linguistics
KW - Distance in explanatory dictionary
KW - Synonyms
KW - Word senses
UR - http://www.scopus.com/inward/record.url?scp=1542642517&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:1542642517
SN - 0889863490
T3 - Proceedings of the IASTED International Conference on Computer Science and Technology
SP - 399
EP - 404
BT - Proceedings of the IASTED International Conference on Computer Science and Technology
A2 - Sahni, S.
T2 - Proceedings of the IASTED International Conference on Computer Science and Technology
Y2 - 19 May 2003 through 21 May 2003
ER -