MUCIC at CheckThat! 2021: FaDo-fake news detection and domain identification using transformers ensembling

Fazlourrahman Balouchzahi; Hosahalli Lakshmaiah Shashirekha; Grigori Sidorov

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › peer-review

6 Scopus citations

Abstract

Since the beginning of the Covid-19 era in November 2019, the growth in patient numbers has been closely accompanied by a growth in fake news. Developing tools and models that distinguish fake news from real news across various domains has therefore become more important than before. To address this problem, we, team MUCIC, describe the models submitted to 'Fake News Detection', a shared task organized by the CLEF-2021-CheckThat! Lab. The shared task contains two subtasks, Fake News Detection of News Articles (Subtask 3A) and Topical Domain Classification of News Articles (Subtask 3B), both of which are multi-class text classification tasks. The proposed models were developed by fine-tuning three transformer-based language models from HuggingFace, namely RoBERTa, DistilBERT, and BERT, on the training data and then ensembling them as estimators with majority voting. Evaluated with the script provided by the organizers, the proposed models obtained F1-scores of 0.5309 and 0.8550 for Subtask 3A and Subtask 3B, respectively.
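The majority-voting step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes each fine-tuned model has already produced one predicted label per article. The function name `majority_vote`, the example label strings, and the tie-breaking rule (prefer the label from the model listed first) are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Hard-voting ensemble.

    predictions[m][i] is the label that model m assigned to sample i.
    Returns one ensembled label per sample.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must label the same number of samples")

    ensembled = []
    for votes in zip(*predictions):  # votes = one label per model
        counts = Counter(votes)
        top = max(counts.values())
        # Tie-break (an assumption, not from the paper): among the most
        # frequent labels, keep the one from the earliest model in the list.
        winner = next(label for label in votes if counts[label] == top)
        ensembled.append(winner)
    return ensembled


# Hypothetical per-model predictions for four articles.
roberta_preds = ["false", "true", "other", "false"]
distilbert_preds = ["false", "false", "true", "true"]
bert_preds = ["true", "false", "partially false", "false"]

final = majority_vote([roberta_preds, distilbert_preds, bert_preds])
print(final)
```

With three voters and more than two classes, a three-way disagreement is possible, so some tie-breaking rule is needed. The record does not say which one the authors used.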

Original language	English
Pages (from-to)	455-464
Number of pages	10
Journal	CEUR Workshop Proceedings
Volume	2936
State	Published - 2021
Event	2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021 - Virtual, Bucharest, Romania Duration: 21 Sep 2021 → 24 Sep 2021

Keywords

BERT
Distilbert
Domain identification
Fake news detection
Roberta
Transformers

Cite this

@article{83a5efa26f8c4ca48f2382c3d121341a,

title = "MUCIC at CheckThat! 2021: FaDo-fake news detection and domain identification using transformers ensembling",

abstract = "Since the beginning of the Covid-19 era in November 2019, the growth in patient numbers has been closely accompanied by a growth in fake news. Developing tools and models that distinguish fake news from real news across various domains has therefore become more important than before. To address this problem, we, team MUCIC, describe the models submitted to 'Fake News Detection', a shared task organized by the CLEF-2021-CheckThat! Lab. The shared task contains two subtasks, Fake News Detection of News Articles (Subtask 3A) and Topical Domain Classification of News Articles (Subtask 3B), both of which are multi-class text classification tasks. The proposed models were developed by fine-tuning three transformer-based language models from HuggingFace, namely RoBERTa, DistilBERT, and BERT, on the training data and then ensembling them as estimators with majority voting. Evaluated with the script provided by the organizers, the proposed models obtained F1-scores of 0.5309 and 0.8550 for Subtask 3A and Subtask 3B, respectively.",

keywords = "BERT, Distilbert, Domain identification, Fake news detection, Roberta, Transformers",

author = "Fazlourrahman Balouchzahi and Shashirekha, {Hosahalli Lakshmaiah} and Grigori Sidorov",

note = "Publisher Copyright: {\textcopyright} 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).; 2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021 ; Conference date: 21-09-2021 Through 24-09-2021",

year = "2021",

language = "English",

volume = "2936",

pages = "455--464",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

}

TY - JOUR

T1 - MUCIC at CheckThat! 2021: FaDo-fake news detection and domain identification using transformers ensembling

T2 - 2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021

AU - Balouchzahi, Fazlourrahman

AU - Shashirekha, Hosahalli Lakshmaiah

AU - Sidorov, Grigori

PY - 2021

Y1 - 2021

N2 - Since the beginning of the Covid-19 era in November 2019, the growth in patient numbers has been closely accompanied by a growth in fake news. Developing tools and models that distinguish fake news from real news across various domains has therefore become more important than before. To address this problem, we, team MUCIC, describe the models submitted to 'Fake News Detection', a shared task organized by the CLEF-2021-CheckThat! Lab. The shared task contains two subtasks, Fake News Detection of News Articles (Subtask 3A) and Topical Domain Classification of News Articles (Subtask 3B), both of which are multi-class text classification tasks. The proposed models were developed by fine-tuning three transformer-based language models from HuggingFace, namely RoBERTa, DistilBERT, and BERT, on the training data and then ensembling them as estimators with majority voting. Evaluated with the script provided by the organizers, the proposed models obtained F1-scores of 0.5309 and 0.8550 for Subtask 3A and Subtask 3B, respectively.

AB - Since the beginning of the Covid-19 era in November 2019, the growth in patient numbers has been closely accompanied by a growth in fake news. Developing tools and models that distinguish fake news from real news across various domains has therefore become more important than before. To address this problem, we, team MUCIC, describe the models submitted to 'Fake News Detection', a shared task organized by the CLEF-2021-CheckThat! Lab. The shared task contains two subtasks, Fake News Detection of News Articles (Subtask 3A) and Topical Domain Classification of News Articles (Subtask 3B), both of which are multi-class text classification tasks. The proposed models were developed by fine-tuning three transformer-based language models from HuggingFace, namely RoBERTa, DistilBERT, and BERT, on the training data and then ensembling them as estimators with majority voting. Evaluated with the script provided by the organizers, the proposed models obtained F1-scores of 0.5309 and 0.8550 for Subtask 3A and Subtask 3B, respectively.

KW - BERT

KW - Distilbert

KW - Domain identification

KW - Fake news detection

KW - Roberta

KW - Transformers

UR - http://www.scopus.com/inward/record.url?scp=85113505995&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85113505995

SN - 1613-0073

VL - 2936

SP - 455

EP - 464

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

Y2 - 21 September 2021 through 24 September 2021

ER -
