A study of lexical function detection with word2vec and supervised machine learning

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

In this work, we report the results of our experiments on distinguishing the semantics of verb-noun collocations in a Spanish corpus. These semantics were represented by four lexical functions of the Meaning-Text Theory; each lexical function specifies a universal semantic concept found in any natural language. Knowledge of collocations and their semantic content is important for natural language processing, because collocations encode restrictions on how words can be used together. We experimented with word2vec embeddings and six supervised machine learning methods commonly used across a wide range of natural language processing tasks. Our objective was to study the ability of word2vec embeddings to represent the context of collocations in a way that discriminates among lexical functions. Unlike previous work with word embeddings, we trained word2vec on a lemmatized corpus after stopword elimination, on the assumption that such vectors capture a more accurate semantic characterization. The experiments were performed on a collection of 1,131 issues of the Excelsior newspaper. The experimental results showed that the word2vec representation of collocations outperformed the classical bag-of-words context representation implemented in a vector space model and fed into the same supervised learning methods.
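To illustrate the kind of pipeline the abstract describes (this is a minimal sketch, not the authors' code), the fragment below assumes gensim 4.x for word2vec, scikit-learn for one of the possible supervised classifiers, and toy lemmatized sentences in place of the Excelsior corpus; the verb-noun pairs, the lexical-function labels, and the concatenation-based collocation representation are illustrative assumptions.

```python
# Minimal, illustrative sketch: train word2vec on a lemmatized, stopword-free
# corpus, represent each verb-noun collocation as a vector, and train a
# supervised classifier to predict its lexical function.
import numpy as np
from gensim.models import Word2Vec
from sklearn.svm import LinearSVC

# Hypothetical lemmatized, stopword-free sentences (stand-in for the corpus).
sentences = [
    ["dar", "paso", "importante", "gobierno"],
    ["tomar", "decisión", "rápido", "presidente"],
    ["dar", "golpe", "duro", "economía"],
    ["hacer", "pregunta", "difícil", "periodista"],
]

# Train word2vec embeddings on the lemmatized corpus.
w2v = Word2Vec(sentences, vector_size=50, window=5, min_count=1, seed=1)

def collocation_vector(verb, noun, model):
    """Represent a verb-noun collocation as the concatenation of its
    component word vectors (one of several possible choices)."""
    return np.concatenate([model.wv[verb], model.wv[noun]])

# Hypothetical training examples: verb-noun pairs labeled with a lexical
# function (label names here are placeholders, not the paper's label set).
pairs  = [("dar", "paso"), ("tomar", "decisión"), ("dar", "golpe"), ("hacer", "pregunta")]
labels = ["Oper1", "Oper1", "CausFunc2", "Oper1"]

X = np.vstack([collocation_vector(v, n, w2v) for v, n in pairs])
clf = LinearSVC().fit(X, labels)  # one of several supervised methods one could try

print(clf.predict([collocation_vector("dar", "paso", w2v)]))
```

In practice one would swap LinearSVC for any of the six supervised methods studied and compare against a bag-of-words vector space model fed into the same classifiers.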

Original language: English
Pages (from-to): 1993-2001
Number of pages: 9
Journal: Journal of Intelligent and Fuzzy Systems
Volume: 39
Issue number: 2
State: Published - 2020

Keywords

  • Meaning-Text Theory
  • word embeddings
  • lexical function
  • supervised machine learning
  • word2vec
