UrduThreat@ FIRE2021: Shared Track on Abusive Threat Identification in Urdu

Maaz Amjad, Alisa Zhila, Grigori Sidorov, Andrey Labunets, Sabur Butt, Hamza Imam Amjad, Oxana Vitman, Alexander Gelbukh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Scopus citations

Abstract

As social media platforms have grown in reach and importance, the effects of their misuse have become increasingly severe. This shared task addresses the detection of abusive and threatening language in Urdu, a language with more than 230 million speakers worldwide. We presented two datasets: (i) Abusive and Non-Abusive language, and (ii) Threatening and Non-Threatening language. The abusive dataset contains 1,187 tweets categorized as Abusive and 1,213 as Non-Abusive; the threatening dataset contains 4,929 tweets categorized as Non-Threatening and 1,071 as Threatening. In this shared task, 21 teams from six countries (India, Pakistan, China, Malaysia, United Arab Emirates, Taiwan) registered for participation, 10 teams submitted runs for Subtask A (Abusive Language Detection), 9 teams submitted runs for Subtask B (Threatening Language Detection), and 7 teams submitted technical reports. We provided one baseline system for Subtask A and three baseline systems for Subtask B. The best-performing system achieved an F-score of 0.88 for Subtask A and 0.545 for Subtask B. For both subtasks, m-BERT-based transformer models showed the best performance.

Original language: English
Title of host publication: FIRE 2021 - Proceedings of the 13th Annual Meeting of the Forum for Information Retrieval Evaluation
Editors: Debasis Ganguly, Surupendu Gangopadhyay, Mandar Mitra, Prasenjit Majumder
Publisher: Association for Computing Machinery
Pages: 9-11
Number of pages: 3
ISBN (Electronic): 9781450395960
DOIs
State: Published - 13 Dec 2021
Event: 13th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2021 - Virtual, Online, India
Duration: 13 Dec 2021 - 17 Dec 2021

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 13th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2021
Country/Territory: India
City: Virtual, Online
Period: 13/12/21 - 17/12/21

Keywords

  • Abusive languages detection
  • Urdu language
  • low resource languages
  • threatening languages detection
