Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021

Maaz Amjad; Alisa Zhila; Grigori Sidorov; Andrey Labunets; Sabur Butt; Hamza Imam Amjad; Oxana Vitman; Alexander Gelbukh

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › peer-review

5 Scopus citations

Abstract

As the influence of social media platforms grows, the effects of their misuse become increasingly severe. The importance of automatic detection of threatening and abusive language cannot be overestimated. However, most existing studies and state-of-the-art methods focus on English as the target language, with limited work on low- and medium-resource languages. In this paper, we present two shared tasks on abusive and threatening language detection for the Urdu language, which has more than 170 million speakers worldwide. Both are posed as binary classification tasks in which participating systems are required to classify tweets in Urdu into two classes: (i) Abusive and Non-Abusive for the first task, and (ii) Threatening and Non-Threatening for the second. We present two manually annotated datasets containing tweets labeled as (i) Abusive and Non-Abusive, and (ii) Threatening and Non-Threatening. The abusive dataset contains 2400 annotated tweets in the training set and 1100 annotated tweets in the test set. The threatening dataset contains 6000 annotated tweets in the training set and 3950 annotated tweets in the test set. We also provide logistic regression and BERT-based baseline classifiers for both tasks. In this shared task, 21 teams from six countries (India, Pakistan, China, Malaysia, United Arab Emirates, Taiwan) registered for participation, 10 teams submitted runs for Subtask A (Abusive Language Detection), 9 teams submitted runs for Subtask B (Threatening Language Detection), and seven teams submitted technical reports. The best-performing system achieved an F1-score of 0.880 for Subtask A and 0.545 for Subtask B. For both subtasks, an m-BERT-based transformer model showed the best performance.
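The abstract reports results as F1-scores on binary classification. As a minimal sketch of how such a score can be computed, the Python below takes F1 as the harmonic mean of precision and recall for the positive class. The abstract does not say whether the organizers scored the positive class only or used an averaged variant, so the positive-class choice here is an assumption; the function name `binary_f1` and the toy labels are hypothetical and are not taken from the shared-task data.

```python
from typing import Sequence


def binary_f1(gold: Sequence[int], predicted: Sequence[int], positive: int = 1) -> float:
    """Return the F1-score of the positive class (harmonic mean of precision and recall)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    # Count true positives, false positives and false negatives for the positive class.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No correct positive prediction: precision or recall is zero (or undefined),
        # so F1 is reported as 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels invented for illustration: 1 = Abusive, 0 = Non-Abusive.
gold = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
score = binary_f1(gold, predicted)
print(f"F1 = {score:.3f}")
```

On the toy labels there are three true positives, one false positive and one false negative, so precision and recall are both 0.75 and the F1-score is 0.75.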

Original language	English
Pages (from-to)	744-762
Number of pages	19
Journal	CEUR Workshop Proceedings
Volume	3159
State	Published - 2021
Event	Working Notes of FIRE - 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021, Gandhinagar, India, 13 Dec 2021 to 17 Dec 2021

Keywords

Natural language processing
Twitter tweets
Urdu language
abusive language detection
shared task
text classification
threatening language detection

Cite this

@article{95d3eb37f2ac4b28827d79cf891b1595,

title = "Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021",

abstract = "As the influence of social media platforms grows, the effects of their misuse become increasingly severe. The importance of automatic detection of threatening and abusive language cannot be overestimated. However, most existing studies and state-of-the-art methods focus on English as the target language, with limited work on low- and medium-resource languages. In this paper, we present two shared tasks on abusive and threatening language detection for the Urdu language, which has more than 170 million speakers worldwide. Both are posed as binary classification tasks in which participating systems are required to classify tweets in Urdu into two classes: (i) Abusive and Non-Abusive for the first task, and (ii) Threatening and Non-Threatening for the second. We present two manually annotated datasets containing tweets labeled as (i) Abusive and Non-Abusive, and (ii) Threatening and Non-Threatening. The abusive dataset contains 2400 annotated tweets in the training set and 1100 annotated tweets in the test set. The threatening dataset contains 6000 annotated tweets in the training set and 3950 annotated tweets in the test set. We also provide logistic regression and BERT-based baseline classifiers for both tasks. In this shared task, 21 teams from six countries (India, Pakistan, China, Malaysia, United Arab Emirates, Taiwan) registered for participation, 10 teams submitted runs for Subtask A (Abusive Language Detection), 9 teams submitted runs for Subtask B (Threatening Language Detection), and seven teams submitted technical reports. The best-performing system achieved an F1-score of 0.880 for Subtask A and 0.545 for Subtask B. For both subtasks, an m-BERT-based transformer model showed the best performance.",

keywords = "Natural language processing, Twitter tweets, Urdu language, abusive language detection, shared task, text classification, threatening language detection",

author = "Maaz Amjad and Alisa Zhila and Grigori Sidorov and Andrey Labunets and Sabur Butt and Amjad, {Hamza Imam} and Oxana Vitman and Alexander Gelbukh",

note = "Publisher Copyright: {\textcopyright} 2021 Copyright for this paper by its authors.; Working Notes of FIRE - 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021 ; Conference date: 13-12-2021 Through 17-12-2021",

year = "2021",

language = "English",

volume = "3159",

pages = "744--762",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

}

TY - JOUR

T1 - Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021

AU - Amjad, Maaz

AU - Zhila, Alisa

AU - Sidorov, Grigori

AU - Labunets, Andrey

AU - Butt, Sabur

AU - Amjad, Hamza Imam

AU - Vitman, Oxana

AU - Gelbukh, Alexander

PY - 2021

Y1 - 2021

N2 - As the influence of social media platforms grows, the effects of their misuse become increasingly severe. The importance of automatic detection of threatening and abusive language cannot be overestimated. However, most existing studies and state-of-the-art methods focus on English as the target language, with limited work on low- and medium-resource languages. In this paper, we present two shared tasks on abusive and threatening language detection for the Urdu language, which has more than 170 million speakers worldwide. Both are posed as binary classification tasks in which participating systems are required to classify tweets in Urdu into two classes: (i) Abusive and Non-Abusive for the first task, and (ii) Threatening and Non-Threatening for the second. We present two manually annotated datasets containing tweets labeled as (i) Abusive and Non-Abusive, and (ii) Threatening and Non-Threatening. The abusive dataset contains 2400 annotated tweets in the training set and 1100 annotated tweets in the test set. The threatening dataset contains 6000 annotated tweets in the training set and 3950 annotated tweets in the test set. We also provide logistic regression and BERT-based baseline classifiers for both tasks. In this shared task, 21 teams from six countries (India, Pakistan, China, Malaysia, United Arab Emirates, Taiwan) registered for participation, 10 teams submitted runs for Subtask A (Abusive Language Detection), 9 teams submitted runs for Subtask B (Threatening Language Detection), and seven teams submitted technical reports. The best-performing system achieved an F1-score of 0.880 for Subtask A and 0.545 for Subtask B. For both subtasks, an m-BERT-based transformer model showed the best performance.

AB - As the influence of social media platforms grows, the effects of their misuse become increasingly severe. The importance of automatic detection of threatening and abusive language cannot be overestimated. However, most existing studies and state-of-the-art methods focus on English as the target language, with limited work on low- and medium-resource languages. In this paper, we present two shared tasks on abusive and threatening language detection for the Urdu language, which has more than 170 million speakers worldwide. Both are posed as binary classification tasks in which participating systems are required to classify tweets in Urdu into two classes: (i) Abusive and Non-Abusive for the first task, and (ii) Threatening and Non-Threatening for the second. We present two manually annotated datasets containing tweets labeled as (i) Abusive and Non-Abusive, and (ii) Threatening and Non-Threatening. The abusive dataset contains 2400 annotated tweets in the training set and 1100 annotated tweets in the test set. The threatening dataset contains 6000 annotated tweets in the training set and 3950 annotated tweets in the test set. We also provide logistic regression and BERT-based baseline classifiers for both tasks. In this shared task, 21 teams from six countries (India, Pakistan, China, Malaysia, United Arab Emirates, Taiwan) registered for participation, 10 teams submitted runs for Subtask A (Abusive Language Detection), 9 teams submitted runs for Subtask B (Threatening Language Detection), and seven teams submitted technical reports. The best-performing system achieved an F1-score of 0.880 for Subtask A and 0.545 for Subtask B. For both subtasks, an m-BERT-based transformer model showed the best performance.

KW - Natural language processing

KW - Twitter tweets

KW - Urdu language

KW - abusive language detection

KW - shared task

KW - text classification

KW - threatening language detection

UR - http://www.scopus.com/inward/record.url?scp=85124335870&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85124335870

SN - 1613-0073

VL - 3159

SP - 744

EP - 762

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

T2 - Working Notes of FIRE - 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021

Y2 - 13 December 2021 through 17 December 2021

ER -
