A study of lexical function detection with word2vec and supervised machine learning

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

In this work, we report the results of our experiments on distinguishing the semantics of verb-noun collocations in a Spanish corpus. These semantics were represented by four lexical functions of the Meaning-Text Theory; each lexical function specifies a universal semantic concept found in any natural language. Knowledge of collocations and their semantic content is important for natural language processing, because collocations encode restrictions on how words can be used together. We experimented with word2vec embeddings and six supervised machine learning methods commonly used across a wide range of natural language processing tasks. Our objective was to study the ability of word2vec embeddings to represent the context of collocations in a way that discriminates among lexical functions. Unlike previous work with word embeddings, we trained word2vec on a lemmatized corpus after stopword elimination, on the assumption that such vectors capture a more accurate semantic characterization. The experiments were performed on a collection of 1,131 issues of the Excelsior newspaper. The experimental results showed that the word2vec representation of collocations outperformed the classical bag-of-words context representation implemented in a vector space model and fed into the same supervised learning methods.
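To illustrate the kind of pipeline the abstract describes (this is a minimal sketch, not the authors' code), the fragment below assumes gensim 4.x for word2vec, scikit-learn for one of the possible supervised classifiers, and toy lemmatized sentences in place of the Excelsior corpus; the verb-noun pairs, the lexical-function labels, and the concatenation-based collocation representation are illustrative assumptions.

```python
# Minimal, illustrative sketch: train word2vec on a lemmatized, stopword-free
# corpus, represent each verb-noun collocation as a vector, and train a
# supervised classifier to predict its lexical function.
import numpy as np
from gensim.models import Word2Vec
from sklearn.svm import LinearSVC

# Hypothetical lemmatized, stopword-free sentences (stand-in for the corpus).
sentences = [
    ["dar", "paso", "importante", "gobierno"],
    ["tomar", "decisión", "rápido", "presidente"],
    ["dar", "golpe", "duro", "economía"],
    ["hacer", "pregunta", "difícil", "periodista"],
]

# Train word2vec embeddings on the lemmatized corpus.
w2v = Word2Vec(sentences, vector_size=50, window=5, min_count=1, seed=1)

def collocation_vector(verb, noun, model):
    """Represent a verb-noun collocation as the concatenation of its
    component word vectors (one of several possible choices)."""
    return np.concatenate([model.wv[verb], model.wv[noun]])

# Hypothetical training examples: verb-noun pairs labeled with a lexical
# function (label names here are placeholders, not the paper's label set).
pairs  = [("dar", "paso"), ("tomar", "decisión"), ("dar", "golpe"), ("hacer", "pregunta")]
labels = ["Oper1", "Oper1", "CausFunc2", "Oper1"]

X = np.vstack([collocation_vector(v, n, w2v) for v, n in pairs])
clf = LinearSVC().fit(X, labels)  # one of several supervised methods one could try

print(clf.predict([collocation_vector("dar", "paso", w2v)]))
```

In practice one would swap LinearSVC for any of the six supervised methods studied and compare against a bag-of-words vector space model fed into the same classifiers.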

Original language: English
Pages (from-to): 1993-2001
Number of pages: 9
Journal: Journal of Intelligent and Fuzzy Systems
Volume: 39
Issue number: 2
State: Published - 2020

Keywords

  • Meaning-Text Theory
  • word embeddings
  • lexical function
  • supervised machine learning
  • word2vec
