Ensembled Feature Selection for Urdu Fake News Detection

Fazlourrahman Balouchzahi; Hosahalli Lakshmaiah Shashirekha; Grigori Sidorov

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › peer-review

1 Citation (Scopus)

Abstract

Identifying fake news shared on social media is a vital task because of its immense negative effects on society, communities, individuals, or whoever is targeted. Controlling and managing fake news on social media manually is impractical given the growing number of social media users, the increasing volume of fake news, and the speed at which it spreads. Hence, there is great demand for identifying fake news automatically, quickly, and efficiently. Most fake news detection work focuses on resource-rich languages such as English and Spanish, leaving under-resourced languages such as Urdu and many Indian languages underexplored or entirely unexplored. UrduFake 2021, a shared task at the Forum for Information Retrieval Evaluation (FIRE) 2021, promotes fake news detection in Urdu, an under-resourced language. This paper describes the model that our team, MUCIC, proposed and submitted to UrduFake 2021, whose goal is to classify each Urdu news article into one of two categories: Fake or Real. The main focus of this work is feature engineering to enhance the performance of traditional Machine Learning (ML) classifiers using very simple features such as word and char n-grams. Three Feature Selection (FS) algorithms, namely Chi-square, Mutual Information Gain (MIG), and f_classif, are ensembled to select the most informative features for classifying Urdu news articles. The proposed methodology, an ensemble of five popular ML classifiers with soft voting, ranked 8th in the shared task with an average macro F1-score of 0.592.
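The abstract names the ingredients (word and char n-grams, Chi-square, MIG and f_classif scores, soft voting) but not how the three selectors are combined. The sketch below is a minimal, dependency-free illustration under stated assumptions, not the authors' code: features are binary n-gram presence indicators, each selector keeps its top `k` features, and a feature survives when at least `min_votes` selectors keep it (1 gives the union, 3 the intersection). The names `ngrams`, `ensemble_select`, `soft_vote`, `k` and `min_votes` are hypothetical, the scorers are hand-written analogues of chi-square, mutual information and the ANOVA F-value (the statistic scikit-learn's `f_classif` computes) rather than scikit-learn's exact implementations, and the four toy texts are placeholders, not UrduFake data.

```python
# Hypothetical sketch: feature encoding and the vote-based combination rule
# are assumptions, not details taken from the paper.
import math
from collections import Counter


def ngrams(text, word_n=(1, 2), char_n=(2, 3)):
    """Set of word n-grams ("w:") and character n-grams ("c:") in `text`."""
    feats = set()
    words = text.split()
    for n in word_n:
        for i in range(len(words) - n + 1):
            feats.add("w:" + " ".join(words[i:i + n]))
    for n in char_n:
        for i in range(len(text) - n + 1):
            feats.add("c:" + text[i:i + n])
    return feats


def contingency(docs, labels, feat):
    """2x2 counts t[present][label] for a binary feature and 0/1 labels."""
    t = [[0, 0], [0, 0]]
    for d, y in zip(docs, labels):
        t[int(feat in d)][y] += 1
    return t


def chi2_score(t):
    """Pearson chi-square statistic of the 2x2 table."""
    n = sum(map(sum, t))
    score = 0.0
    for i in range(2):
        for j in range(2):
            expected = sum(t[i]) * (t[0][j] + t[1][j]) / n
            if expected > 0:
                score += (t[i][j] - expected) ** 2 / expected
    return score


def mi_score(t):
    """Mutual information (in nats) between feature presence and label."""
    n = sum(map(sum, t))
    score = 0.0
    for i in range(2):
        for j in range(2):
            if t[i][j]:
                p_xy = t[i][j] / n
                p_x = sum(t[i]) / n
                p_y = (t[0][j] + t[1][j]) / n
                score += p_xy * math.log(p_xy / (p_x * p_y))
    return score


def f_score(t):
    """One-way ANOVA F of the 0/1 feature value across the two classes."""
    n = sum(map(sum, t))
    sizes = [t[0][j] + t[1][j] for j in range(2)]
    if min(sizes) == 0:
        return 0.0  # only one class present: no between-class variance
    means = [t[1][j] / sizes[j] for j in range(2)]
    grand = (t[1][0] + t[1][1]) / n
    ss_between = sum(sizes[j] * (means[j] - grand) ** 2 for j in range(2))
    # Within-class sum of squares of a 0/1 variable is size * mean * (1 - mean).
    ss_within = sum(sizes[j] * means[j] * (1 - means[j]) for j in range(2))
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return ss_between / (ss_within / max(n - 2, 1))


def ensemble_select(texts, labels, k=5, min_votes=1):
    """Keep features ranked in the top-k by at least `min_votes` of 3 scorers."""
    docs = [ngrams(t) for t in texts]
    vocab = sorted(set().union(*docs))  # sorted so ties break deterministically
    tables = {f: contingency(docs, labels, f) for f in vocab}
    votes = Counter()
    for scorer in (chi2_score, mi_score, f_score):
        ranked = sorted(vocab, key=lambda f: scorer(tables[f]), reverse=True)
        votes.update(ranked[:k])
    return sorted(f for f, v in votes.items() if v >= min_votes)


def soft_vote(prob_rows):
    """Average per-class probabilities from several classifiers; return argmax."""
    n_classes = len(prob_rows[0])
    avg = [sum(p[c] for p in prob_rows) / len(prob_rows) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)


# Toy placeholder data (not UrduFake): 1 = Fake, 0 = Real.
texts = [
    "news breaking shock claim",
    "news shock claim goes viral",
    "news official report released",
    "news report confirms official data",
]
labels = [1, 1, 0, 0]
selected = ensemble_select(texts, labels, k=5, min_votes=1)
print(selected)
print(soft_vote([[0.6, 0.4], [0.3, 0.7], [0.45, 0.55]]))  # prints 1
```

Raising `min_votes` keeps only the n-grams that several criteria agree on, while `min_votes=1` keeps anything any single criterion favours. In a full pipeline the selected n-grams would be the inputs to the five classifiers, and `soft_vote` shows how averaging their class probabilities yields the final Fake/Real label.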

Original language	English
Pages (from-to)	1117-1126
Number of pages	10
Journal	CEUR Workshop Proceedings
Volume	3159
Status	Published - 2021
Event	Working Notes of FIRE - 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021 - Gandhinagar, India. Duration: 13 Dec 2021 → 17 Dec 2021

Cite this

@article{1c87da045fb548cfb0f537b43a3f3be1,

title = "Ensembled Feature Selection for Urdu Fake News Detection",

keywords = "Feature Engineering, Feature Selection, Machine Learning, UrduFake",

author = "Fazlourrahman Balouchzahi and Shashirekha, {Hosahalli Lakshmaiah} and Grigori Sidorov",

note = "Publisher Copyright: {\textcopyright} 2021 Copyright for this paper by its authors.; Working Notes of FIRE - 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021; Conference date: 13-12-2021 through 17-12-2021",

year = "2021",

language = "English",

volume = "3159",

pages = "1117--1126",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

}

TY - JOUR

T1 - Ensembled Feature Selection for Urdu Fake News Detection

AU - Balouchzahi, Fazlourrahman

AU - Shashirekha, Hosahalli Lakshmaiah

AU - Sidorov, Grigori

PY - 2021

Y1 - 2021

KW - Feature Engineering

KW - Feature Selection

KW - Machine Learning

KW - UrduFake

UR - http://www.scopus.com/inward/record.url?scp=85134204639&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85134204639

SN - 1613-0073

VL - 3159

SP - 1117

EP - 1126

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

T2 - Working Notes of FIRE - 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021

Y2 - 13 December 2021 through 17 December 2021

ER -