Word2set: WordNet-Based Word Representation Rivaling Neural Word Embedding for Lexical Similarity and Sentiment Analysis

Sergio Jimenez, Fabio A. Gonzalez, Alexander Gelbukh, George Duenas

Research output: Contribution to journal › Article › peer-review

15 Citations (Scopus)

Abstract

Measuring lexical similarity using WordNet has a long tradition. In the last decade, it has been challenged by distributional methods, and more recently by neural word embedding. In recent years, several larger lexical similarity benchmarks have been introduced, on which word embedding has achieved state-of-the-art results. The success of such methods has eclipsed the use of WordNet for predicting human judgments of lexical similarity. We propose a new set cardinality-based method for measuring lexical similarity, which exploits the WordNet graph, obtaining a word representation, which we called word2set, based on related neighboring words. We show that the features extracted from set cardinalities computed using this word representation, when fed into a support vector regression classifier trained on a dataset of common synonyms and antonyms, produce results competitive with those of word-embedding approaches. On the task of predicting the lexical sentiment polarity, our WordNet set-based representation significantly outperforms the classical measures and achieves the performance of neural embeddings. Although word embedding is still the best approach for these tasks, our method significantly reduces the gap between the results shown by knowledge-based approaches and by distributional representations, without requiring a large training corpus. It is also more effective for less-frequent words.
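As a rough illustration of the set-cardinality idea described in the abstract, the sketch below represents each word as the set of its related neighbors and derives a few cardinality features for a word pair. It is a minimal sketch, not the authors' implementation: the graph `TOY_GRAPH` is hypothetical hand-made data standing in for WordNet, the helper names `word_to_set`, `cardinality_features`, and `jaccard` are assumptions, and the particular features chosen here are illustrative rather than the paper's feature set.

```python
# Hypothetical toy neighborhood graph; NOT real WordNet data.
TOY_GRAPH = {
    "happy": {"glad", "joyful", "cheerful", "content"},
    "glad": {"happy", "joyful", "pleased"},
    "sad": {"unhappy", "sorrowful", "gloomy"},
}


def word_to_set(word, graph):
    """Represent a word as the set containing itself and its direct neighbors."""
    return {word} | graph.get(word, set())


def cardinality_features(a, b):
    """Return (|A|, |B|, |A ∩ B|, |A ∪ B|) for two word sets."""
    return (len(a), len(b), len(a & b), len(a | b))


def jaccard(a, b):
    """One classical similarity that depends only on set cardinalities."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


happy = word_to_set("happy", TOY_GRAPH)
glad = word_to_set("glad", TOY_GRAPH)
sad = word_to_set("sad", TOY_GRAPH)

print(cardinality_features(happy, glad))  # features for a related pair
print(jaccard(happy, glad), jaccard(happy, sad))
```

In the approach the abstract describes, features of this kind are fed into a support vector regression model trained on synonym and antonym pairs. That learning step is omitted here.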

Original language: English
Article number: 8686355
Pages (from-to): 41-53
Number of pages: 13
Journal: IEEE Computational Intelligence Magazine
Volume: 14
Issue number: 2
Status: Published - May 2019
