TY - JOUR
T1 - CoSaD-Code-Mixed Sentiments Analysis for Dravidian Languages
AU - Balouchzahi, Fazlourrahman
AU - Shashirekha, Hosahalli Lakshmaiah
AU - Sidorov, Grigori
N1 - Publisher Copyright: © 2021 Copyright for this paper by its authors.
PY - 2021
Y1 - 2021
AB - Analyzing sentiments or opinions in code-mixed languages is gaining importance due to the increased use of social media and online platforms, especially during the Covid-19 pandemic. In a multilingual society like India, code-mixing and script mixing are quite common, as people, especially the younger generation, are comfortable using more than one language. In view of this, the current paper describes the models submitted by our team MUCIC for the shared task on ‘Sentiments Analysis (SA) for Dravidian Languages in Code-Mixed Text’. The objective of this shared task is to develop and evaluate models for code-mixed datasets in three Dravidian languages, namely Kannada, Malayalam, and Tamil, each mixed with English, resulting in the Kannada-English (Ka-En), Malayalam-English (Ma-En), and Tamil-English (Ta-En) language pairs. N-gram features of characters, character sequences, and syllables are transformed into feature vectors and used to train three Machine Learning (ML) classifiers combined by majority voting. The predictions on the Test set obtained average weighted F1-scores of 0.628, 0.726, and 0.619, securing 2nd, 4th, and 5th ranks for the Ka-En, Ma-En, and Ta-En language pairs, respectively.
KW - Code-Mixing
KW - Dravidian Languages
KW - Machine Learning
KW - Sentiments Analysis
KW - n-grams
UR - http://www.scopus.com/inward/record.url?scp=85134248634&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85134248634
SN - 1613-0073
VL - 3159
SP - 887
EP - 898
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - Working Notes of FIRE - 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021
Y2 - 13 December 2021 through 17 December 2021
ER -