TY - JOUR
T1 - Semantic Similarity Estimation Using Vector Symbolic Architectures
AU - Quiroz-Mercado, Job Isaias
AU - Barron-Fernandez, Ricardo
AU - Ramirez-Salinas, Marco Antonio
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2020
Y1 - 2020
N2 - For many natural language processing applications, estimating similarity and relatedness between words is a key task that serves as the basis for classification and generalization. Currently, vector semantic models (VSMs) have become a fundamental language modeling tool. VSMs represent words as points in a high-dimensional space and follow the distributional hypothesis of meaning, which assumes that semantic similarity is related to shared context. In this paper, we propose a model whose representations are based on the semantic features associated with a concept within the ConceptNet knowledge graph. The proposed model is based on a vector symbolic architecture framework, which defines a set of arithmetic operations to encode the semantic features within a single high-dimensional vector. These vector representations incorporate several types of information in addition to word distribution. Moreover, owing to the properties of high-dimensional spaces, they have the added advantage of being interpretable. We analyze the model's performance on the SimLex-999 dataset, a dataset on which commonly used distributional models (e.g., word2vec or GloVe) perform poorly. Our results are comparable to those of other hybrid models and surpass several state-of-the-art distributional and knowledge-based models.
AB - For many natural language processing applications, estimating similarity and relatedness between words is a key task that serves as the basis for classification and generalization. Currently, vector semantic models (VSMs) have become a fundamental language modeling tool. VSMs represent words as points in a high-dimensional space and follow the distributional hypothesis of meaning, which assumes that semantic similarity is related to shared context. In this paper, we propose a model whose representations are based on the semantic features associated with a concept within the ConceptNet knowledge graph. The proposed model is based on a vector symbolic architecture framework, which defines a set of arithmetic operations to encode the semantic features within a single high-dimensional vector. These vector representations incorporate several types of information in addition to word distribution. Moreover, owing to the properties of high-dimensional spaces, they have the added advantage of being interpretable. We analyze the model's performance on the SimLex-999 dataset, a dataset on which commonly used distributional models (e.g., word2vec or GloVe) perform poorly. Our results are comparable to those of other hybrid models and surpass several state-of-the-art distributional and knowledge-based models.
KW - Concept representation
KW - semantic similarity
KW - vector symbolic architectures
KW - word embeddings
UR - http://www.scopus.com/inward/record.url?scp=85087327864&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2020.3001765
DO - 10.1109/ACCESS.2020.3001765
M3 - Article
AN - SCOPUS:85087327864
SN - 2169-3536
VL - 8
SP - 109120
EP - 109132
JO - IEEE Access
JF - IEEE Access
M1 - 9115025
ER -