Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2021

Maaz Amjad; Sabur Butt; Hamza Imam Amjad; Alisa Zhila; Grigori Sidorov; Alexander Gelbukh

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › peer-review

7 Citations (Scopus)

Abstract

Automatic detection of fake news is a highly important task in the contemporary world. This study reports on UrduFake@FIRE2021, the second shared task on fake news detection in the Urdu language. The goal of the shared task is to motivate the community to come up with efficient methods for solving this vital problem, particularly for the Urdu language. The task is posed as a binary classification problem: label a given news article as either real or fake. The organizers provide a dataset comprising news in five domains, (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and (v) Business, split into training and testing sets. The training set contains 1300 annotated news articles (750 real and 550 fake), while the testing set contains 300 news articles (200 real and 100 fake). In total, 34 teams from 7 countries (China, Egypt, Israel, India, Mexico, Pakistan, and UAE) registered for participation in the UrduFake@FIRE2021 shared task. Of those, 18 teams submitted their experimental results and 11 of those also submitted technical reports, substantially more than in the UrduFake shared task in 2020, when only 6 teams submitted technical reports. The technical reports demonstrated different data representation techniques, ranging from count-based BoW features to word vector embeddings, as well as numerous machine learning algorithms, ranging from traditional SVM to various neural network architectures, including Transformers such as BERT and RoBERTa. In this year’s competition, the best performing system obtained an F1-macro score of 0.679, which is lower than the previous year’s best result of 0.907 F1-macro. Note, however, that while the training sets from the previous and current years overlap to a large extent, the testing set provided this year is completely different.
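
As an illustration of the F1-macro metric cited in the abstract, a minimal Python sketch follows. It is not the organizers' evaluation script: the function names `f1_for_class` and `macro_f1` and the always-"real" baseline are hypothetical, and a class with no true or predicted instances is assumed to score 0.0. Only the test-set class counts (200 real, 100 fake) come from the abstract.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """F1 score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # Assumption: an undefined F1 (no true or predicted instances) counts as 0.0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: Sequence[str] = ("real", "fake")) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Test-set class balance taken from the abstract: 200 real, 100 fake.
y_true = ["real"] * 200 + ["fake"] * 100
# Hypothetical baseline (not a submitted system): always predict "real".
y_pred = ["real"] * 300

baseline = macro_f1(y_true, y_pred)
print(f"macro-F1 of the always-real baseline: {baseline:.3f}")
```

Under these assumptions, a classifier that always answers "real" is correct on two thirds of the test articles yet reaches a macro-F1 of only 0.4, because the fake class contributes an F1 of zero. This is why the metric is informative on an imbalanced test set.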

Original language	English
Pages (from-to)	1101-1116
Number of pages	16
Journal	CEUR Workshop Proceedings
Volume	3159
Status	Published - 2021
Event	Working Notes of FIRE - 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021 - Gandhinagar, India. Duration: 13 Dec 2021 → 17 Dec 2021

Cite this

@article{4649fdd27799429082429ad60f4583d1,

title = "Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2021",

abstract = "Automatic detection of fake news is a highly important task in the contemporary world. This study reports on UrduFake@FIRE2021, the second shared task on fake news detection in the Urdu language. The goal of the shared task is to motivate the community to come up with efficient methods for solving this vital problem, particularly for the Urdu language. The task is posed as a binary classification problem: label a given news article as either real or fake. The organizers provide a dataset comprising news in five domains, (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and (v) Business, split into training and testing sets. The training set contains 1300 annotated news articles (750 real and 550 fake), while the testing set contains 300 news articles (200 real and 100 fake). In total, 34 teams from 7 countries (China, Egypt, Israel, India, Mexico, Pakistan, and UAE) registered for participation in the UrduFake@FIRE2021 shared task. Of those, 18 teams submitted their experimental results and 11 of those also submitted technical reports, substantially more than in the UrduFake shared task in 2020, when only 6 teams submitted technical reports. The technical reports demonstrated different data representation techniques, ranging from count-based BoW features to word vector embeddings, as well as numerous machine learning algorithms, ranging from traditional SVM to various neural network architectures, including Transformers such as BERT and RoBERTa. In this year{\textquoteright}s competition, the best performing system obtained an F1-macro score of 0.679, which is lower than the previous year{\textquoteright}s best result of 0.907 F1-macro. Note, however, that while the training sets from the previous and current years overlap to a large extent, the testing set provided this year is completely different.",

keywords = "NLP, Natural Language Processing, Urdu language, fake news detection, low resource language, medium resource language, shared task, text classification",

author = "Maaz Amjad and Sabur Butt and Amjad, {Hamza Imam} and Alisa Zhila and Grigori Sidorov and Alexander Gelbukh",

note = "Publisher Copyright: {\textcopyright} 2021 Copyright for this paper by its authors.; Working Notes of FIRE - 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021 ; Conference date: 13-12-2021 Through 17-12-2021",

year = "2021",

language = "English",

volume = "3159",

pages = "1101--1116",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

}

TY - JOUR

T1 - Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2021

AU - Amjad, Maaz

AU - Butt, Sabur

AU - Amjad, Hamza Imam

AU - Zhila, Alisa

AU - Sidorov, Grigori

AU - Gelbukh, Alexander

PY - 2021

Y1 - 2021

AB - Automatic detection of fake news is a highly important task in the contemporary world. This study reports on UrduFake@FIRE2021, the second shared task on fake news detection in the Urdu language. The goal of the shared task is to motivate the community to come up with efficient methods for solving this vital problem, particularly for the Urdu language. The task is posed as a binary classification problem: label a given news article as either real or fake. The organizers provide a dataset comprising news in five domains, (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and (v) Business, split into training and testing sets. The training set contains 1300 annotated news articles (750 real and 550 fake), while the testing set contains 300 news articles (200 real and 100 fake). In total, 34 teams from 7 countries (China, Egypt, Israel, India, Mexico, Pakistan, and UAE) registered for participation in the UrduFake@FIRE2021 shared task. Of those, 18 teams submitted their experimental results and 11 of those also submitted technical reports, substantially more than in the UrduFake shared task in 2020, when only 6 teams submitted technical reports. The technical reports demonstrated different data representation techniques, ranging from count-based BoW features to word vector embeddings, as well as numerous machine learning algorithms, ranging from traditional SVM to various neural network architectures, including Transformers such as BERT and RoBERTa. In this year’s competition, the best performing system obtained an F1-macro score of 0.679, which is lower than the previous year’s best result of 0.907 F1-macro. Note, however, that while the training sets from the previous and current years overlap to a large extent, the testing set provided this year is completely different.

KW - NLP

KW - Natural Language Processing

KW - Urdu language

KW - fake news detection

KW - low resource language

KW - medium resource language

KW - shared task

KW - text classification

UR - http://www.scopus.com/inward/record.url?scp=85124373581&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85124373581

SN - 1613-0073

VL - 3159

SP - 1101

EP - 1116

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

T2 - Working Notes of FIRE - 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021

Y2 - 13 December 2021 through 17 December 2021

ER -
