CIC at CheckThat! 2021: Fake news detection using machine learning and data augmentation

Noman Ashraf; Sabur Butt; Grigori Sidorov; Alexander Gelbukh

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › peer-review

2 Scopus citations

Abstract

Disinformation in the form of fake news, phoney press releases, and hoaxes can mislead readers, especially when it does not come from an original source, and it can cause significant harm to people. In this paper, we report results for several machine learning classifiers on the CLEF2021 dataset for the tasks of news claim and topic classification using n-grams. We achieve an F₁ score of 38.92% on news claim classification (task 3a) and an F₁ score of 78.96% on topic classification (task 3b). In addition, we augmented the dataset for news claim classification and observed that inserting alternative words was not beneficial for the fake news classification task.
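The abstract names n-grams as the feature representation but does not say whether they are word-level or character-level, nor which classifiers consume them. As a minimal sketch, assuming word-level unigrams and bigrams over whitespace tokens, a bag-of-n-grams feature map might be built as follows. The function names `word_ngrams` and `ngram_features` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word-level n-grams of `text` as a list of tuples.

    Assumption: lowercasing and whitespace tokenization; the paper's
    actual preprocessing is not described in this record.
    """
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, n_values=(1, 2)):
    """Count every n-gram for each n in `n_values` (bag of n-grams)."""
    counts = Counter()
    for n in n_values:
        counts.update(word_ngrams(text, n))
    return counts


# Illustrative input only; not a sample from the CLEF2021 dataset.
features = ngram_features("Fake news can cause harm")
```

Counts of this kind would typically be vectorized and passed to a classifier; which classifier the authors used for each task is not stated here.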

Original language	English
Pages (from-to)	446-454
Number of pages	9
Journal	CEUR Workshop Proceedings
Volume	2936
State	Published - 2021
Event	2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021 - Virtual, Bucharest, Romania Duration: 21 Sep 2021 → 24 Sep 2021

Keywords

Fake news claim classification
Fake news data augmentation
Fake news detection
Fake news topic classification

Cite this

@article{1e711c74aed14b3590119b5d2703e2bb,

title = "CIC at CheckThat! 2021: Fake news detection using machine learning and data augmentation",

abstract = "Disinformation in the form of fake news, phoney press releases and hoaxes may be misleading, especially when they are not from their original sources and this fake news can cause significant harm to the people. In this paper, we report several machine learning classifiers on the CLEF2021 dataset for the tasks of news claim and topic classification using n-grams. We achieve an F1 score of 38.92% on news claim classification (task 3a) and an F1 score of 78.96% on topic classification (task 3b). In addition, we augmented the dataset for news claim classification and we observed that insertion of alternative words was not beneficial for the fake news classification task.",

keywords = "Fake news claim classification, Fake news data augmentation, Fake news detection, Fake news topic classification",

author = "Noman Ashraf and Sabur Butt and Grigori Sidorov and Alexander Gelbukh",

note = "Publisher Copyright: {\textcopyright} 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).; 2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021 ; Conference date: 21-09-2021 Through 24-09-2021",

year = "2021",

language = "English",

volume = "2936",

pages = "446--454",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

}

TY - JOUR

T1 - CIC at CheckThat! 2021

T2 - 2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021

AU - Ashraf, Noman

AU - Butt, Sabur

AU - Sidorov, Grigori

AU - Gelbukh, Alexander

PY - 2021

Y1 - 2021

AB - Disinformation in the form of fake news, phoney press releases and hoaxes may be misleading, especially when they are not from their original sources and this fake news can cause significant harm to the people. In this paper, we report several machine learning classifiers on the CLEF2021 dataset for the tasks of news claim and topic classification using n-grams. We achieve an F1 score of 38.92% on news claim classification (task 3a) and an F1 score of 78.96% on topic classification (task 3b). In addition, we augmented the dataset for news claim classification and we observed that insertion of alternative words was not beneficial for the fake news classification task.

KW - Fake news claim classification

KW - Fake news data augmentation

KW - Fake news detection

KW - Fake news topic classification

UR - http://www.scopus.com/inward/record.url?scp=85113448631&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85113448631

SN - 1613-0073

VL - 2936

SP - 446

EP - 454

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

Y2 - 21 September 2021 through 24 September 2021

ER -
