Automatic detection of negative emotions within a balanced corpus of informal short texts

Vanessa A. Camacho-Vázquez; Grigori Sidorov; Sofia N. Galicia-Haro

doi:10.1089/cyber.2018.0207

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

The present study addresses the detection of negative emotions in informal short texts (tweets). Our work takes advantage of several features of social networks, particularly their availability and the confidence they offer users to express their emotions. The corpus of tweets was manually annotated with emotions. The corpus was balanced: it contained 3,000 tweets for each of Ekman's four negative emotions and 3,000 neutral tweets (15,000 tweets in total). The objective was to apply machine learning to classify tweets into two categories (sad versus neutral) or five categories (each negative emotion and neutral distinguished). Different features were evaluated by varying the type of element (words or lemmas), the n-gram size (unigrams, bigrams, trigrams, and combinations such as unigrams plus bigrams or unigrams plus bigrams plus trigrams, among others), and the feature value (term frequency or term frequency-inverse document frequency). Sadness was detected with F1 = 0.962. The F1 across all five categories (neutral tweets and those with negative emotions) was 0.664, which is relatively high given the difficulty of the task (random baseline = 0.2 for five categories). These results were obtained from experiments conducted on the balanced textual corpus for the first time and were better than those of state-of-the-art methods.
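The feature variations named in the abstract (n-gram size, and term frequency versus term frequency-inverse document frequency) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy tweets, and the choice of idf = log(N / df) are assumptions made for illustration, and the abstract does not state which TF-IDF variant or tokenizer was used. The last line shows where the random baseline of 0.2 comes from: uniform guessing over five equally sized classes.

```python
import math
from collections import Counter

def ngrams(tokens, sizes=(1, 2)):
    """Return all n-grams of the given sizes, e.g. (1, 2) for unigrams plus bigrams."""
    grams = []
    for n in sizes:
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams

def tf(doc_grams):
    """Raw term frequency: count of each n-gram in one document."""
    return Counter(doc_grams)

def tfidf(corpus_grams):
    """TF-IDF with idf = log(N / df); one common variant, assumed here."""
    n_docs = len(corpus_grams)
    # Document frequency: number of documents containing each n-gram.
    df = Counter(g for doc in corpus_grams for g in set(doc))
    return [{g: c * math.log(n_docs / df[g]) for g, c in Counter(doc).items()}
            for doc in corpus_grams]

# Hypothetical toy tweets, invented for illustration only.
tweets = ["i feel so sad today", "the bus is late today"]
docs = [ngrams(t.split(), sizes=(1, 2)) for t in tweets]
weights = tfidf(docs)

print(tf(docs[0])["so sad"])   # term-frequency value of one bigram
print(weights[0]["so sad"])    # TF-IDF value of the same bigram
print(weights[0]["today"])     # n-gram present in every document gets weight 0
print(1 / 5)                   # random baseline for five balanced categories
```

Swapping `sizes=(1, 2)` for `(1,)`, `(2,)`, `(3,)`, or `(1, 2, 3)` gives the other n-gram configurations listed above; replacing words with lemmas would require a lemmatizer, which this sketch leaves out.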

Original language	English
Pages (from-to)	781-787
Number of pages	7
Journal	Cyberpsychology, Behavior, and Social Networking
Volume	21
Issue number	12
DOIs	https://doi.org/10.1089/cyber.2018.0207
State	Published - Dec 2018

Keywords

balanced corpus of tweets
emotion recognition
feature extraction
machine learning
sentiment analysis

Cite this

@article{64adab8a2ab245228ee937e6273572f1,
title = "Automatic detection of negative emotions within a balanced corpus of informal short texts",
abstract = "The present study addresses the detection of negative emotions in informal short texts (tweets). Our work takes advantage of several features of social networks, particularly their availability and the confidence they offer users to express their emotions. The corpus of tweets was manually annotated with emotions. The corpus was balanced: it contained 3,000 tweets for each of Ekman's four negative emotions and 3,000 neutral tweets (15,000 tweets in total). The objective was to apply machine learning to classify tweets into two categories (sad versus neutral) or five categories (each negative emotion and neutral distinguished). Different features were evaluated by varying the type of element (words or lemmas), the n-gram size (unigrams, bigrams, trigrams, and combinations such as unigrams plus bigrams or unigrams plus bigrams plus trigrams, among others), and the feature value (term frequency or term frequency-inverse document frequency). Sadness was detected with F1 = 0.962. The F1 across all five categories (neutral tweets and those with negative emotions) was 0.664, which is relatively high given the difficulty of the task (random baseline = 0.2 for five categories). These results were obtained from experiments conducted on the balanced textual corpus for the first time and were better than those of state-of-the-art methods.",
keywords = "balanced corpus of tweets, emotion recognition, feature extraction, machine learning, sentiment analysis",
author = "Camacho-V{\'a}zquez, {Vanessa A.} and Sidorov, Grigori and Galicia-Haro, {Sofia N.}",
note = "Publisher Copyright: {\textcopyright} 2018, Mary Ann Liebert, Inc., publishers 2018.",
year = "2018",
month = dec,
doi = "10.1089/cyber.2018.0207",
language = "English",
volume = "21",
pages = "781--787",
journal = "Cyberpsychology, Behavior, and Social Networking",
issn = "2152-2715",
number = "12",
}

TY - JOUR
T1 - Automatic detection of negative emotions within a balanced corpus of informal short texts
AU - Camacho-Vázquez, Vanessa A.
AU - Sidorov, Grigori
AU - Galicia-Haro, Sofia N.
PY - 2018/12
Y1 - 2018/12
AB - The present study addresses the detection of negative emotions in informal short texts (tweets). Our work takes advantage of several features of social networks, particularly their availability and the confidence they offer users to express their emotions. The corpus of tweets was manually annotated with emotions. The corpus was balanced: it contained 3,000 tweets for each of Ekman's four negative emotions and 3,000 neutral tweets (15,000 tweets in total). The objective was to apply machine learning to classify tweets into two categories (sad versus neutral) or five categories (each negative emotion and neutral distinguished). Different features were evaluated by varying the type of element (words or lemmas), the n-gram size (unigrams, bigrams, trigrams, and combinations such as unigrams plus bigrams or unigrams plus bigrams plus trigrams, among others), and the feature value (term frequency or term frequency-inverse document frequency). Sadness was detected with F1 = 0.962. The F1 across all five categories (neutral tweets and those with negative emotions) was 0.664, which is relatively high given the difficulty of the task (random baseline = 0.2 for five categories). These results were obtained from experiments conducted on the balanced textual corpus for the first time and were better than those of state-of-the-art methods.
KW - balanced corpus of tweets
KW - emotion recognition
KW - feature extraction
KW - machine learning
KW - sentiment analysis
UR - http://www.scopus.com/inward/record.url?scp=85058530495&partnerID=8YFLogxK
U2 - 10.1089/cyber.2018.0207
DO - 10.1089/cyber.2018.0207
M3 - Article
SN - 2152-2715
VL - 21
SP - 781
EP - 787
JO - Cyberpsychology, Behavior, and Social Networking
JF - Cyberpsychology, Behavior, and Social Networking
IS - 12
ER -
