Survey of Fake News Datasets and Detection Methods in European and Asian Languages

Maaz Amjad; Sabur Butt; Alisa Zhila; Grigori Sidorov; Liliana Chanona-Hernandez; Alexander Gelbukh

doi:10.12700/APH.19.10.2022.10.11

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Article › peer-review

Abstract

The presence of fake news and “alternative facts” across the web is a global phenomenon that has received considerable attention in recent years. Several researchers have made substantial efforts to automatically identify fake news articles using linguistic features and neural network-based methods. However, automatic classification via machine learning and deep learning techniques demands a significant amount of annotated data. While several state-of-the-art datasets for the English language are available and commonly used for research, fake news detection in low-resource languages has received less attention. This study surveys the publicly available fake news datasets in low- and medium-resource Asian and European languages. We also highlight the lack of datasets and methods in these languages. Moreover, we summarize the proposed methods and the metrics used to evaluate classifiers for identifying fake news. This study supports analysis of the resources available in lower-resource languages for addressing fake news detection challenges.

Original language	English
Pages (from-to)	185-204
Number of pages	20
Journal	Acta Polytechnica Hungarica
Volume	19
Issue number	10
DOIs	https://doi.org/10.12700/APH.19.10.2022.10.11
State	Published - 2022

Keywords

datasets
deep learning
evaluation metrics
fake news
low resource languages
machine learning

Cite this

@article{b732dcb5d3054db3afa6cf719d6db534,

title = "Survey of Fake News Datasets and Detection Methods in European and Asian Languages",

abstract = "The presence of fake news and “alternative facts” across the web is a global phenomenon that received considerable attention in recent years. Several researchers have made substantial efforts to automatically identify fake news articles based on linguistic features and neural network-based methods. However, automatic classification via machine and deep learning techniques demands a significant amount of annotated data. While several state-of-the-art datasets for the English language are available and commonly utilized for research, fake news detection in low-resource languages gained less attention. This study surveys the publicly available datasets of fake news in low/medium-resourced Asian and European languages. We also highlight the vacuum of datasets and methods in these languages. Moreover, we summarize the proposed methods and the metrics used to evaluate the classifiers in identifying fake news. This study is helpful for analysis of the available sources in the lower resource languages to solve fake news detection challenges.",

keywords = "datasets, deep learning, evaluation metrics, fake news, low resource languages, machine learning",

author = "Maaz Amjad and Sabur Butt and Alisa Zhila and Grigori Sidorov and Liliana Chanona-Hernandez and Alexander Gelbukh",

year = "2022",

doi = "10.12700/APH.19.10.2022.10.11",

language = "English",

volume = "19",

pages = "185--204",

journal = "Acta Polytechnica Hungarica",

issn = "1785-8860",

number = "10",

}

TY - JOUR

T1 - Survey of Fake News Datasets and Detection Methods in European and Asian Languages

AU - Amjad, Maaz

AU - Butt, Sabur

AU - Zhila, Alisa

AU - Sidorov, Grigori

AU - Chanona-Hernandez, Liliana

AU - Gelbukh, Alexander

PY - 2022

Y1 - 2022

AB - The presence of fake news and “alternative facts” across the web is a global phenomenon that received considerable attention in recent years. Several researchers have made substantial efforts to automatically identify fake news articles based on linguistic features and neural network-based methods. However, automatic classification via machine and deep learning techniques demands a significant amount of annotated data. While several state-of-the-art datasets for the English language are available and commonly utilized for research, fake news detection in low-resource languages gained less attention. This study surveys the publicly available datasets of fake news in low/medium-resourced Asian and European languages. We also highlight the vacuum of datasets and methods in these languages. Moreover, we summarize the proposed methods and the metrics used to evaluate the classifiers in identifying fake news. This study is helpful for analysis of the available sources in the lower resource languages to solve fake news detection challenges.

KW - datasets

KW - deep learning

KW - evaluation metrics

KW - fake news

KW - low resource languages

KW - machine learning

UR - http://www.scopus.com/inward/record.url?scp=85159102782&partnerID=8YFLogxK

U2 - 10.12700/APH.19.10.2022.10.11

DO - 10.12700/APH.19.10.2022.10.11

M3 - Article

AN - SCOPUS:85159102782

SN - 1785-8860

VL - 19

SP - 185

EP - 204

JO - Acta Polytechnica Hungarica

JF - Acta Polytechnica Hungarica

IS - 10

ER -
