Urdu Sentiment Analysis with Deep Learning Methods

Lal Khan; Ammar Amjad; Noman Ashraf; Hsien Tsung Chang; Alexander Gelbukh

doi:10.1109/ACCESS.2021.3093078

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Article › peer-review

62 citations (Scopus)

Abstract

Although over 169 million people in the world are familiar with the Urdu language and a large quantity of Urdu data is generated daily on different social websites, very few research studies and efforts have been completed to build language resources for the Urdu language and examine user sentiments. The primary objective of this study is twofold: (1) to develop a benchmark dataset for the resource-deprived Urdu language for sentiment analysis and (2) to evaluate various machine and deep learning algorithms for sentiment analysis. To find the best technique, we compare two modes of text representation: count-based, where the text is represented using word n-gram feature vectors, and a second mode based on fastText pre-trained word embeddings for Urdu. We consider a set of machine learning classifiers (RF, NB, SVM, AdaBoost, MLP, LR) and deep learning classifiers (1D-CNN and LSTM) to run the experiments for all the feature types. Our study shows that the combination of word n-gram features with LR outperformed the other classifiers for the sentiment analysis task, obtaining the highest F1 score of 82.05% using a combination of features.
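The count-based representation and the LR classifier that performed best in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' pipeline: whitespace tokenization, the unigram-plus-bigram range, the romanized-Urdu toy sentences, the learning rate and epoch count, and the helper names (`word_ngrams`, `build_vocab`, `vectorize`, `train_logreg`, `predict`, `f1_score`) are all hypothetical choices made for illustration. The F1 computed here is the binary F1 of the positive class; the abstract does not state how the reported 82.05% was averaged.

```python
# Illustrative sketch only. Assumptions (not from the paper): whitespace
# tokenization, unigrams + bigrams, toy data, plain gradient-descent LR.
import math
from collections import Counter


def word_ngrams(tokens, n_min=1, n_max=2):
    """Return every contiguous word n-gram with n_min <= n <= n_max."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def build_vocab(docs, n_min=1, n_max=2):
    """Assign a column index to each n-gram seen in the training documents."""
    vocab = {}
    for doc in docs:
        for gram in word_ngrams(doc.split(), n_min, n_max):
            vocab.setdefault(gram, len(vocab))
    return vocab


def vectorize(doc, vocab, n_min=1, n_max=2):
    """Count-based vector: one count per known n-gram; unseen n-grams are dropped."""
    vec = [0.0] * len(vocab)
    for gram, count in Counter(word_ngrams(doc.split(), n_min, n_max)).items():
        if gram in vocab:
            vec[vocab[gram]] = float(count)
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=200):
    """Binary logistic regression fitted by full-batch gradient descent on log-loss."""
    w, b, m = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            # Gradient of log-loss w.r.t. the score is (sigmoid(score) - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(x, w, b):
    """Label 1 (positive) when P(positive | x) >= 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 = 2PR / (P + R) for the given positive label."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical romanized-Urdu toy data (1 = positive, 0 = negative).
train_docs = ["film bahut achhi hai", "kahani achhi hai",
              "film bahut buri hai", "kahani buri hai"]
train_labels = [1, 1, 0, 0]

vocab = build_vocab(train_docs)
w, b = train_logreg([vectorize(d, vocab) for d in train_docs], train_labels)

# "kitab" never occurs in training, so its n-grams are simply ignored.
test_docs = ["kitab bahut achhi hai", "kitab buri hai"]
test_labels = [1, 0]
preds = [predict(vectorize(d, vocab), w, b) for d in test_docs]
print(preds, f1_score(test_labels, preds))
```

Raising `n_max` adds longer n-grams at the cost of a larger and sparser vocabulary, which is one reason count-based features are usually paired with linear models such as LR.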

Original language	English
Article number	9466841
Pages (from-to)	97803-97812
Number of pages	10
Journal	IEEE Access
Volume	9
DOI	https://doi.org/10.1109/ACCESS.2021.3093078
Status	Published - 2021

Cite this

@article{e5e745ed727a4a1494b8239d88bc4d58,
  title = "Urdu Sentiment Analysis with Deep Learning Methods",
  abstract = "Although over 169 million people in the world are familiar with the Urdu language and a large quantity of Urdu data is generated daily on different social websites, very few research studies and efforts have been completed to build language resources for the Urdu language and examine user sentiments. The primary objective of this study is twofold: (1) to develop a benchmark dataset for the resource-deprived Urdu language for sentiment analysis and (2) to evaluate various machine and deep learning algorithms for sentiment analysis. To find the best technique, we compare two modes of text representation: count-based, where the text is represented using word n-gram feature vectors, and a second mode based on fastText pre-trained word embeddings for Urdu. We consider a set of machine learning classifiers (RF, NB, SVM, AdaBoost, MLP, LR) and deep learning classifiers (1D-CNN and LSTM) to run the experiments for all the feature types. Our study shows that the combination of word n-gram features with LR outperformed the other classifiers for the sentiment analysis task, obtaining the highest F1 score of 82.05\% using a combination of features.",
  keywords = "Urdu sentiment analysis, deep learning, machine learning, natural language processing",
  author = "Lal Khan and Ammar Amjad and Noman Ashraf and Chang, {Hsien Tsung} and Alexander Gelbukh",
  note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",
  year = "2021",
  doi = "10.1109/ACCESS.2021.3093078",
  language = "English",
  volume = "9",
  pages = "97803--97812",
  journal = "IEEE Access",
  issn = "2169-3536",
}

TY  - JOUR
T1  - Urdu Sentiment Analysis with Deep Learning Methods
AU  - Khan, Lal
AU  - Amjad, Ammar
AU  - Ashraf, Noman
AU  - Chang, Hsien Tsung
AU  - Gelbukh, Alexander
PY  - 2021
Y1  - 2021
N2  - Although over 169 million people in the world are familiar with the Urdu language and a large quantity of Urdu data is generated daily on different social websites, very few research studies and efforts have been completed to build language resources for the Urdu language and examine user sentiments. The primary objective of this study is twofold: (1) to develop a benchmark dataset for the resource-deprived Urdu language for sentiment analysis and (2) to evaluate various machine and deep learning algorithms for sentiment analysis. To find the best technique, we compare two modes of text representation: count-based, where the text is represented using word n-gram feature vectors, and a second mode based on fastText pre-trained word embeddings for Urdu. We consider a set of machine learning classifiers (RF, NB, SVM, AdaBoost, MLP, LR) and deep learning classifiers (1D-CNN and LSTM) to run the experiments for all the feature types. Our study shows that the combination of word n-gram features with LR outperformed the other classifiers for the sentiment analysis task, obtaining the highest F1 score of 82.05% using a combination of features.
AB  - Although over 169 million people in the world are familiar with the Urdu language and a large quantity of Urdu data is generated daily on different social websites, very few research studies and efforts have been completed to build language resources for the Urdu language and examine user sentiments. The primary objective of this study is twofold: (1) to develop a benchmark dataset for the resource-deprived Urdu language for sentiment analysis and (2) to evaluate various machine and deep learning algorithms for sentiment analysis. To find the best technique, we compare two modes of text representation: count-based, where the text is represented using word n-gram feature vectors, and a second mode based on fastText pre-trained word embeddings for Urdu. We consider a set of machine learning classifiers (RF, NB, SVM, AdaBoost, MLP, LR) and deep learning classifiers (1D-CNN and LSTM) to run the experiments for all the feature types. Our study shows that the combination of word n-gram features with LR outperformed the other classifiers for the sentiment analysis task, obtaining the highest F1 score of 82.05% using a combination of features.
KW  - Urdu sentiment analysis
KW  - deep learning
KW  - machine learning
KW  - natural language processing
UR  - http://www.scopus.com/inward/record.url?scp=85110666018&partnerID=8YFLogxK
U2  - 10.1109/ACCESS.2021.3093078
DO  - 10.1109/ACCESS.2021.3093078
M3  - Article
AN  - SCOPUS:85110666018
SN  - 2169-3536
VL  - 9
SP  - 97803
EP  - 97812
JO  - IEEE Access
JF  - IEEE Access
M1  - 9466841
ER  - 
