Ensembled Feature Selection for Urdu Fake News Detection

Fazlourrahman Balouchzahi, Hosahalli Lakshmaiah Shashirekha, Grigori Sidorov

Research output: Contribution to journal, Conference article, peer-reviewed

1 Citation (Scopus)

Abstract

Identifying fake news shared on social media is a vital task because of its immense negative effects on society, communities, and the individuals it targets. Controlling fake news manually is impractical given the growing number of social media users, the increasing volume of fake news, and the speed at which it spreads. Hence, there is great demand for identifying fake news automatically, quickly, and efficiently. Most fake news detection work focuses on resource-rich languages such as English and Spanish, leaving under-resourced languages such as Urdu and many Indian languages with little or no attention. UrduFake 2021, a shared task at the Forum for Information Retrieval Evaluation (FIRE) 2021, promotes fake news detection in Urdu, an under-resourced language. This paper describes the model that our team, MUCIC, proposed and submitted to UrduFake 2021, which aims to classify Urdu news articles into one of two categories: Fake and Real. The main focus of this work is feature engineering to enhance the performance of traditional Machine Learning (ML) classifiers using very simple features such as word and character n-grams. Three Feature Selection (FS) algorithms, namely Chi-square, Mutual Information Gain (MIG), and f_classif, are ensembled to select the most informative features for classifying Urdu news articles. The proposed methodology, using an ensemble of five popular ML classifiers with soft voting, ranked 8th in the shared task with an average macro F1-score of 0.592.
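
The two ensembling ideas named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The abstract does not say how the three selectors' outputs are combined, so the top-k voting rule below is an assumption. The function names (`top_k`, `ensemble_select`, `soft_vote`), the feature names, and all numbers are hypothetical. In practice the per-feature scores would come from real scorers such as scikit-learn's `chi2`, `mutual_info_classif`, and `f_classif`, and the class probabilities from trained classifiers.

```python
from typing import Dict, List, Sequence, Set


def top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """Return the k highest-scoring features under one selector."""
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return set(ranked[:k])


def ensemble_select(
    score_tables: Sequence[Dict[str, float]], k: int, min_votes: int = 1
) -> List[str]:
    """Keep features found in the top-k of at least `min_votes` selectors.

    Assumed combination rule: min_votes=1 gives a union of the top-k sets,
    min_votes=len(score_tables) gives their intersection.
    """
    votes: Dict[str, int] = {}
    for scores in score_tables:
        for name in top_k(scores, k):
            votes[name] = votes.get(name, 0) + 1
    return sorted(name for name, count in votes.items() if count >= min_votes)


def soft_vote(probabilities: Sequence[Dict[str, float]]) -> str:
    """Average per-class probabilities across classifiers, pick the largest."""
    labels = list(probabilities[0])
    mean = {
        label: sum(p[label] for p in probabilities) / len(probabilities)
        for label in labels
    }
    return max(mean, key=lambda label: mean[label])


# Toy scores for four n-gram features (made-up numbers, not from the paper).
chi_square = {"ng_a": 9.1, "ng_b": 4.0, "ng_c": 0.5, "ng_d": 2.2}
mutual_info = {"ng_a": 0.30, "ng_b": 0.02, "ng_c": 0.25, "ng_d": 0.01}
f_test = {"ng_a": 12.0, "ng_b": 7.5, "ng_c": 1.0, "ng_d": 0.3}

# Keep features that at least two of the three selectors rank in their top 2.
selected = ensemble_select([chi_square, mutual_info, f_test], k=2, min_votes=2)

# Toy class probabilities from five hypothetical classifiers for one article.
classifier_outputs = [
    {"Fake": 0.90, "Real": 0.10},
    {"Fake": 0.40, "Real": 0.60},
    {"Fake": 0.45, "Real": 0.55},
    {"Fake": 0.80, "Real": 0.20},
    {"Fake": 0.30, "Real": 0.70},
]
label = soft_vote(classifier_outputs)
```

With these toy numbers, `selected` is `["ng_a", "ng_b"]` and `label` is `"Fake"`. Three of the five classifiers lean towards Real, so a hard majority vote would pick Real, while soft voting lets the two confident Fake predictions outweigh them.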

Original language: English
Pages (from-to): 1117-1126
Number of pages: 10
Publication: CEUR Workshop Proceedings
Volume: 3159
Status: Published - 2021
Event: Working Notes of FIRE - 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021 - Gandhinagar, India
Duration: 13 Dec 2021 to 17 Dec 2021
