TY - JOUR
T1 - Topic-Aware Sentiment Analysis of News Articles
AU - Akhmetov, Iskander
AU - Gelbukh, Alexander
AU - Mussabayev, Rustam
N1 - Publisher Copyright:
© 2022 Instituto Politecnico Nacional. All rights reserved.
PY - 2022
Y1 - 2022
N2 - We consider the problem of sentiment analysis of news media articles, cast as a three-way classification task: negative, positive, or neutral. We show that subdividing the training corpus by topic (local news, sports, hi-tech, and others) and training a separate sentiment classifier for each sub-corpus can improve classification F1 scores. We use topics because some words carry different sentiments in different domains: e.g., the word "force" is typically positive in the sports domain but negative in the political domain. Our experiments with a Convolutional Neural Network (CNN) model on the Kaggle dataset of sentiment-labeled Kazakhstani news articles in Russian partially confirm our hypothesis: for the most prominent topic, "kz" (local news), the topic-aware model achieves an F1 score of 0.70, compared with 0.67 for the baseline model trained without topic awareness. Topic-aware training improves F1 scores for some topics, but because of topic and class imbalance, further research is needed. However, F1 over the entire corpus does not improve, or improves only marginally. Moreover, our approach performs better on topics with many articles than on those with relatively few.
AB - We consider the problem of sentiment analysis of news media articles, cast as a three-way classification task: negative, positive, or neutral. We show that subdividing the training corpus by topic (local news, sports, hi-tech, and others) and training a separate sentiment classifier for each sub-corpus can improve classification F1 scores. We use topics because some words carry different sentiments in different domains: e.g., the word "force" is typically positive in the sports domain but negative in the political domain. Our experiments with a Convolutional Neural Network (CNN) model on the Kaggle dataset of sentiment-labeled Kazakhstani news articles in Russian partially confirm our hypothesis: for the most prominent topic, "kz" (local news), the topic-aware model achieves an F1 score of 0.70, compared with 0.67 for the baseline model trained without topic awareness. Topic-aware training improves F1 scores for some topics, but because of topic and class imbalance, further research is needed. However, F1 over the entire corpus does not improve, or improves only marginally. Moreover, our approach performs better on topics with many articles than on those with relatively few.
KW - Mass media
KW - natural language processing
KW - news articles
KW - sentiment analysis
UR - http://www.scopus.com/inward/record.url?scp=85130828753&partnerID=8YFLogxK
U2 - 10.13053/CyS-26-1-4179
DO - 10.13053/CyS-26-1-4179
M3 - Article
AN - SCOPUS:85130828753
SN - 1405-5546
VL - 26
SP - 423
EP - 439
JO - Computacion y Sistemas
JF - Computacion y Sistemas
IS - 1
ER -