Urdu Sentiment Analysis with Deep Learning Methods

Lal Khan; Ammar Amjad; Noman Ashraf; Hsien Tsung Chang; Alexander Gelbukh

doi:10.1109/ACCESS.2021.3093078

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Article › peer-review

62 Scopus citations

Abstract

Although over 169 million people worldwide are familiar with the Urdu language and a large quantity of Urdu data is generated on social websites daily, very few studies have built language resources for Urdu or examined user sentiment. The objective of this study is twofold: (1) to develop a benchmark dataset for sentiment analysis in the resource-deprived Urdu language and (2) to evaluate various machine learning and deep learning algorithms for sentiment analysis. To find the best technique, we compare two modes of text representation: a count-based one, in which the text is represented by word n-gram feature vectors, and one based on fastText pre-trained word embeddings for Urdu. We consider a set of machine learning classifiers (RF, NB, SVM, AdaBoost, MLP, LR) and deep learning classifiers (1D-CNN and LSTM) and run the experiments for all feature types. Our study shows that LR with a combination of word n-gram features outperformed the other classifiers on the sentiment analysis task, obtaining the highest F1 score of 82.05%.
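The count-based representation and the F1 metric mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenization, the choice of unigrams plus bigrams as the "combination of features", and the function names `word_ngrams`, `ngram_features`, and `f1_score` are all assumptions made for illustration, and the sample sentence is a placeholder rather than data from the paper.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams (as tuples) of a whitespace-tokenized text.

    Whitespace tokenization is an assumption; the abstract does not say
    how the Urdu text was tokenized.
    """
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, n_values=(1, 2)):
    """Count-based features: combined counts of word n-grams for each n.

    The default (1, 2) combines unigrams and bigrams; this range is a
    hypothetical choice, not one taken from the paper.
    """
    counts = Counter()
    for n in n_values:
        counts.update(word_ngrams(text, n))
    return counts


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Placeholder sentence, used only to show the shape of the features.
    features = ngram_features("good film very good film")
    for ngram, count in sorted(features.items()):
        print(" ".join(ngram), count)
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

Feature dictionaries of this kind would typically be converted to sparse vectors and passed to a classifier such as logistic regression (LR); that step is omitted here because the abstract gives no training details.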

Original language	English
Article number	9466841
Pages (from-to)	97803-97812
Number of pages	10
Journal	IEEE Access
Volume	9
DOIs	https://doi.org/10.1109/ACCESS.2021.3093078
State	Published - 2021

Keywords

Urdu sentiment analysis
deep learning
machine learning
natural language processing

Cite this

@article{e5e745ed727a4a1494b8239d88bc4d58,

title = "Urdu Sentiment Analysis with Deep Learning Methods",

abstract = "Although over 169 million people worldwide are familiar with the Urdu language and a large quantity of Urdu data is generated on social websites daily, very few studies have built language resources for Urdu or examined user sentiment. The objective of this study is twofold: (1) to develop a benchmark dataset for sentiment analysis in the resource-deprived Urdu language and (2) to evaluate various machine learning and deep learning algorithms for sentiment analysis. To find the best technique, we compare two modes of text representation: a count-based one, in which the text is represented by word n-gram feature vectors, and one based on fastText pre-trained word embeddings for Urdu. We consider a set of machine learning classifiers (RF, NB, SVM, AdaBoost, MLP, LR) and deep learning classifiers (1D-CNN and LSTM) and run the experiments for all feature types. Our study shows that LR with a combination of word n-gram features outperformed the other classifiers on the sentiment analysis task, obtaining the highest F1 score of 82.05%.",

keywords = "Urdu sentiment analysis, deep learning, machine learning, natural language processing",

author = "Lal Khan and Ammar Amjad and Noman Ashraf and Chang, {Hsien Tsung} and Alexander Gelbukh",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2021",

doi = "10.1109/ACCESS.2021.3093078",

language = "English",

volume = "9",

pages = "97803--97812",

journal = "IEEE Access",

issn = "2169-3536",

}

TY - JOUR

T1 - Urdu Sentiment Analysis with Deep Learning Methods

AU - Khan, Lal

AU - Amjad, Ammar

AU - Ashraf, Noman

AU - Chang, Hsien Tsung

AU - Gelbukh, Alexander

PY - 2021

Y1 - 2021

AB - Although over 169 million people worldwide are familiar with the Urdu language and a large quantity of Urdu data is generated on social websites daily, very few studies have built language resources for Urdu or examined user sentiment. The objective of this study is twofold: (1) to develop a benchmark dataset for sentiment analysis in the resource-deprived Urdu language and (2) to evaluate various machine learning and deep learning algorithms for sentiment analysis. To find the best technique, we compare two modes of text representation: a count-based one, in which the text is represented by word n-gram feature vectors, and one based on fastText pre-trained word embeddings for Urdu. We consider a set of machine learning classifiers (RF, NB, SVM, AdaBoost, MLP, LR) and deep learning classifiers (1D-CNN and LSTM) and run the experiments for all feature types. Our study shows that LR with a combination of word n-gram features outperformed the other classifiers on the sentiment analysis task, obtaining the highest F1 score of 82.05%.

KW - Urdu sentiment analysis

KW - deep learning

KW - machine learning

KW - natural language processing

UR - http://www.scopus.com/inward/record.url?scp=85110666018&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2021.3093078

DO - 10.1109/ACCESS.2021.3093078

M3 - Article

AN - SCOPUS:85110666018

SN - 2169-3536

VL - 9

SP - 97803

EP - 97812

JO - IEEE Access

JF - IEEE Access

M1 - 9466841

ER -
