TY - JOUR
T1 - Automatic detection of negative emotions within a balanced corpus of informal short texts
AU - Camacho-Vázquez, Vanessa A.
AU - Sidorov, Grigori
AU - Galicia-Haro, Sofia N.
N1 - Publisher Copyright: © 2018, Mary Ann Liebert, Inc., publishers 2018.
PY - 2018/12
Y1 - 2018/12
N2 - The present study deals with the detection of negative emotions in informal short texts (tweets). Our work takes advantage of several features of social networks, particularly their availability and the confidence they give users to express their emotions. The corpus of tweets was manually annotated with emotions. The corpus is balanced: it contains 3,000 tweets for each of Ekman's negative emotions and 3,000 neutral tweets (15,000 tweets in total). The objective of the present study was to apply machine learning to classify tweets into two categories (sad versus neutral tweets) or five categories (neutral tweets and each negative emotion distinguished). Different features were evaluated by varying the type of element (words or lemmas), the n-gram size (uni-, bi-, tri-, unibi-, and unibitrigrams, among others), and the feature value (term frequency or term frequency-inverse document frequency). Sadness was detected with F1 = 0.962. The F1 for the five-category task (neutral tweets and those with negative emotions) was 0.664, which is relatively high given the difficulty of the task (random baseline = 0.2 for five categories). The present results were obtained from experiments conducted for the first time on the balanced textual corpus and were better than those of state-of-the-art methods.
KW - balanced corpus of tweets
KW - emotion recognition
KW - feature extraction
KW - machine learning
KW - sentiment analysis
UR - http://www.scopus.com/inward/record.url?scp=85058530495&partnerID=8YFLogxK
U2 - 10.1089/cyber.2018.0207
DO - 10.1089/cyber.2018.0207
M3 - Article
SN - 2152-2715
VL - 21
SP - 781
EP - 787
JO - Cyberpsychology, Behavior, and Social Networking
JF - Cyberpsychology, Behavior, and Social Networking
IS - 12
ER -