CoSaD-Code-Mixed Sentiments Analysis for Dravidian Languages

Fazlourrahman Balouchzahi; Hosahalli Lakshmaiah Shashirekha; Grigori Sidorov

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › peer-review

4 Citations (Scopus)

Abstract

Analyzing sentiments or opinions in code-mixed languages is gaining importance due to the increased use of social media and online platforms, especially during the Covid-19 pandemic. In a multilingual society like India, code-mixing and script-mixing are quite common, as people, especially the younger generation, are comfortable using more than one language. In view of this, the current paper describes the models submitted by our team MUCIC for the shared task ‘Sentiments Analysis (SA) for Dravidian Languages in Code-Mixed Text’. The objective of this shared task is to develop and evaluate models for code-mixed datasets in three Dravidian languages, namely Kannada, Malayalam, and Tamil, each mixed with English, resulting in the Kannada-English (Ka-En), Malayalam-English (Ma-En), and Tamil-English (Ta-En) language pairs. N-grams of characters, character sequences, and syllables are transformed into feature vectors and used to train three Machine Learning (ML) classifiers whose predictions are combined by majority voting. The predictions on the Test set obtained average weighted F1-scores of 0.628, 0.726, and 0.619, securing 2nd, 4th, and 5th ranks for the Ka-En, Ma-En, and Ta-En language pairs, respectively.
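The feature-extraction, voting, and scoring steps summarized in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: it assumes character n-grams of lengths 1 to 3, does not reproduce the character-sequence or syllable features, omits the vectorization and the three trained classifiers (their label predictions are simply taken as given), and uses hypothetical function names and labels throughout. The `weighted_f1` helper follows the usual "weighted" averaging, in which each class's F1 is weighted by its support in the gold labels.

```python
from collections import Counter
from typing import List, Sequence


def char_ngrams(text: str, n_min: int = 1, n_max: int = 3) -> Counter:
    """Count every character n-gram of length n_min..n_max (inclusive) in text.

    Assumption: the n-gram range here is illustrative, not the paper's setting.
    """
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Hard majority vote: predictions[k][i] is classifier k's label for sample i.

    Ties are broken in favour of the tied label that appears first,
    i.e. the one predicted by the earliest classifier in the list.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1 averaged with weights equal to each class's support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n_true / total
    return score


if __name__ == "__main__":
    # Character bigram counts for a short romanized comment (hypothetical input).
    print(char_ngrams("mass mass", 2, 2).most_common(3))
    # Hypothetical label predictions from three classifiers on four comments.
    clf_a = ["Positive", "Negative", "Positive", "Mixed"]
    clf_b = ["Positive", "Positive", "Negative", "Mixed"]
    clf_c = ["Negative", "Negative", "Positive", "Positive"]
    voted = majority_vote([clf_a, clf_b, clf_c])
    gold = ["Positive", "Negative", "Negative", "Mixed"]
    print(voted)  # ['Positive', 'Negative', 'Positive', 'Mixed']
    print(round(weighted_f1(gold, voted), 3))  # 0.75
```

Hard voting needs only the predicted labels, so any three classifiers can be plugged in; an odd number of voters also makes ties impossible for two-way splits, although ties can still occur when all three disagree, which is why the tie-breaking rule is stated explicitly.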

Original language	English
Pages (from-to)	887-898
Number of pages	12
Journal	CEUR Workshop Proceedings
Volume	3159
Status	Published - 2021
Event	Working Notes of FIRE - 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021 - Gandhinagar, India. Duration: 13 Dec 2021 → 17 Dec 2021

Cite this

@article{ccf6c50d256c4faa8976bf8e355c0d5e,

title = "CoSaD-Code-Mixed Sentiments Analysis for Dravidian Languages",

abstract = "Analyzing sentiments or opinions in code-mixed languages is gaining importance due to the increased use of social media and online platforms, especially during the Covid-19 pandemic. In a multilingual society like India, code-mixing and script-mixing are quite common, as people, especially the younger generation, are comfortable using more than one language. In view of this, the current paper describes the models submitted by our team MUCIC for the shared task {\textquoteleft}Sentiments Analysis (SA) for Dravidian Languages in Code-Mixed Text{\textquoteright}. The objective of this shared task is to develop and evaluate models for code-mixed datasets in three Dravidian languages, namely Kannada, Malayalam, and Tamil, each mixed with English, resulting in the Kannada-English (Ka-En), Malayalam-English (Ma-En), and Tamil-English (Ta-En) language pairs. N-grams of characters, character sequences, and syllables are transformed into feature vectors and used to train three Machine Learning (ML) classifiers whose predictions are combined by majority voting. The predictions on the Test set obtained average weighted F1-scores of 0.628, 0.726, and 0.619, securing 2nd, 4th, and 5th ranks for the Ka-En, Ma-En, and Ta-En language pairs, respectively.",

keywords = "Code-Mixing, Dravidian Languages, Machine Learning, Sentiments Analysis, n-grams",

author = "Fazlourrahman Balouchzahi and Shashirekha, {Hosahalli Lakshmaiah} and Grigori Sidorov",

note = "Publisher Copyright: {\textcopyright} 2021 Copyright for this paper by its authors.; Working Notes of FIRE - 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021 ; Conference date: 13-12-2021 Through 17-12-2021",

year = "2021",

language = "English",

volume = "3159",

pages = "887--898",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

}

TY - JOUR

T1 - CoSaD-Code-Mixed Sentiments Analysis for Dravidian Languages

AU - Balouchzahi, Fazlourrahman

AU - Shashirekha, Hosahalli Lakshmaiah

AU - Sidorov, Grigori

PY - 2021

Y1 - 2021

N2 - Analyzing sentiments or opinions in code-mixed languages is gaining importance due to the increased use of social media and online platforms, especially during the Covid-19 pandemic. In a multilingual society like India, code-mixing and script-mixing are quite common, as people, especially the younger generation, are comfortable using more than one language. In view of this, the current paper describes the models submitted by our team MUCIC for the shared task ‘Sentiments Analysis (SA) for Dravidian Languages in Code-Mixed Text’. The objective of this shared task is to develop and evaluate models for code-mixed datasets in three Dravidian languages, namely Kannada, Malayalam, and Tamil, each mixed with English, resulting in the Kannada-English (Ka-En), Malayalam-English (Ma-En), and Tamil-English (Ta-En) language pairs. N-grams of characters, character sequences, and syllables are transformed into feature vectors and used to train three Machine Learning (ML) classifiers whose predictions are combined by majority voting. The predictions on the Test set obtained average weighted F1-scores of 0.628, 0.726, and 0.619, securing 2nd, 4th, and 5th ranks for the Ka-En, Ma-En, and Ta-En language pairs, respectively.

AB - Analyzing sentiments or opinions in code-mixed languages is gaining importance due to the increased use of social media and online platforms, especially during the Covid-19 pandemic. In a multilingual society like India, code-mixing and script-mixing are quite common, as people, especially the younger generation, are comfortable using more than one language. In view of this, the current paper describes the models submitted by our team MUCIC for the shared task ‘Sentiments Analysis (SA) for Dravidian Languages in Code-Mixed Text’. The objective of this shared task is to develop and evaluate models for code-mixed datasets in three Dravidian languages, namely Kannada, Malayalam, and Tamil, each mixed with English, resulting in the Kannada-English (Ka-En), Malayalam-English (Ma-En), and Tamil-English (Ta-En) language pairs. N-grams of characters, character sequences, and syllables are transformed into feature vectors and used to train three Machine Learning (ML) classifiers whose predictions are combined by majority voting. The predictions on the Test set obtained average weighted F1-scores of 0.628, 0.726, and 0.619, securing 2nd, 4th, and 5th ranks for the Ka-En, Ma-En, and Ta-En language pairs, respectively.

KW - Code-Mixing

KW - Dravidian Languages

KW - Machine Learning

KW - Sentiments Analysis

KW - n-grams

UR - http://www.scopus.com/inward/record.url?scp=85134248634&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85134248634

SN - 1613-0073

VL - 3159

SP - 887

EP - 898

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

T2 - Working Notes of FIRE - 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021

Y2 - 13 December 2021 through 17 December 2021

ER -
