Word2set: WordNet-Based Word Representation Rivaling Neural Word Embedding for Lexical Similarity and Sentiment Analysis

Sergio Jimenez; Fabio A. Gonzalez; Alexander Gelbukh; George Duenas

doi:10.1109/MCI.2019.2901085

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Article › peer-review

15 Citations (Scopus)

Abstract

Measuring lexical similarity using WordNet has a long tradition. In the last decade, it has been challenged by distributional methods, and more recently by neural word embedding. In recent years, several larger lexical similarity benchmarks have been introduced, on which word embedding has achieved state-of-the-art results. The success of such methods has eclipsed the use of WordNet for predicting human judgments of lexical similarity. We propose a new set cardinality-based method for measuring lexical similarity, which exploits the WordNet graph, obtaining a word representation, which we called word2set, based on related neighboring words. We show that the features extracted from set cardinalities computed using this word representation, when fed into a support vector regression classifier trained on a dataset of common synonyms and antonyms, produce results competitive with those of word-embedding approaches. On the task of predicting the lexical sentiment polarity, our WordNet set-based representation significantly outperforms the classical measures and achieves the performance of neural embeddings. Although word embedding is still the best approach for these tasks, our method significantly reduces the gap between the results shown by knowledge-based approaches and by distributional representations, without requiring a large training corpus. It is also more effective for less-frequent words.

Original language	English
Article number	8686355
Pages (from-to)	41-53
Number of pages	13
Journal	IEEE Computational Intelligence Magazine
Volume	14
Issue number	2
DOI	https://doi.org/10.1109/MCI.2019.2901085
Status	Published - May 2019

Cite this

@article{2074c65b0f4d4e6993d6797a1039a691,

title = "Word2set: WordNet-Based Word Representation Rivaling Neural Word Embedding for Lexical Similarity and Sentiment Analysis",

abstract = "Measuring lexical similarity using WordNet has a long tradition. In the last decade, it has been challenged by distributional methods, and more recently by neural word embedding. In recent years, several larger lexical similarity benchmarks have been introduced, on which word embedding has achieved state-of-the-art results. The success of such methods has eclipsed the use of WordNet for predicting human judgments of lexical similarity. We propose a new set cardinality-based method for measuring lexical similarity, which exploits the WordNet graph, obtaining a word representation, which we called word2set, based on related neighboring words. We show that the features extracted from set cardinalities computed using this word representation, when fed into a support vector regression classifier trained on a dataset of common synonyms and antonyms, produce results competitive with those of word-embedding approaches. On the task of predicting the lexical sentiment polarity, our WordNet set-based representation significantly outperforms the classical measures and achieves the performance of neural embeddings. Although word embedding is still the best approach for these tasks, our method significantly reduces the gap between the results shown by knowledge-based approaches and by distributional representations, without requiring a large training corpus. It is also more effective for less-frequent words.",

author = "Sergio Jimenez and Gonzalez, {Fabio A.} and Alexander Gelbukh and George Duenas",

note = "Publisher Copyright: {\textcopyright} 2005-2012 IEEE.",

year = "2019",

month = may,

doi = "10.1109/MCI.2019.2901085",

language = "English",

volume = "14",

pages = "41--53",

journal = "IEEE Computational Intelligence Magazine",

issn = "1556-603X",

number = "2",

}

TY - JOUR

T1 - Word2set

T2 - WordNet-Based Word Representation Rivaling Neural Word Embedding for Lexical Similarity and Sentiment Analysis

AU - Jimenez, Sergio

AU - Gonzalez, Fabio A.

AU - Gelbukh, Alexander

AU - Duenas, George

PY - 2019/5

Y1 - 2019/5

N2 - Measuring lexical similarity using WordNet has a long tradition. In the last decade, it has been challenged by distributional methods, and more recently by neural word embedding. In recent years, several larger lexical similarity benchmarks have been introduced, on which word embedding has achieved state-of-the-art results. The success of such methods has eclipsed the use of WordNet for predicting human judgments of lexical similarity. We propose a new set cardinality-based method for measuring lexical similarity, which exploits the WordNet graph, obtaining a word representation, which we called word2set, based on related neighboring words. We show that the features extracted from set cardinalities computed using this word representation, when fed into a support vector regression classifier trained on a dataset of common synonyms and antonyms, produce results competitive with those of word-embedding approaches. On the task of predicting the lexical sentiment polarity, our WordNet set-based representation significantly outperforms the classical measures and achieves the performance of neural embeddings. Although word embedding is still the best approach for these tasks, our method significantly reduces the gap between the results shown by knowledge-based approaches and by distributional representations, without requiring a large training corpus. It is also more effective for less-frequent words.

UR - http://www.scopus.com/inward/record.url?scp=85064665562&partnerID=8YFLogxK

U2 - 10.1109/MCI.2019.2901085

DO - 10.1109/MCI.2019.2901085

M3 - Article

AN - SCOPUS:85064665562

SN - 1556-603X

VL - 14

SP - 41

EP - 53

JO - IEEE Computational Intelligence Magazine

JF - IEEE Computational Intelligence Magazine

IS - 2

M1 - 8686355

ER -
