UrduThreat@ FIRE2021: Shared Track on Abusive Threat Identification in Urdu

Maaz Amjad; Alisa Zhila; Grigori Sidorov; Andrey Labunets; Sabur Butt; Hamza Imam Amjad; Oxana Vitman; Alexander Gelbukh

doi:10.1145/3503162.3505241

Centro de Investigación en Computación (CIC)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Citations (Scopus)

Abstract

As social media platforms have grown in reach and importance, the impact of their misuse has grown as well. This shared task addresses the detection of abusive and threatening language in Urdu, a language with more than 230 million speakers worldwide. We present two datasets: (i) abusive and non-abusive language, and (ii) threatening and non-threatening language. The abusive dataset contains 1,187 tweets labeled Abusive and 1,213 labeled Non-Abusive; the threatening dataset contains 4,929 tweets labeled Non-Threatening and 1,071 labeled Threatening. Twenty-one teams from six countries (India, Pakistan, China, Malaysia, United Arab Emirates, Taiwan) registered for the shared task; 10 teams submitted runs for Subtask A (Abusive Language Detection), 9 teams submitted runs for Subtask B (Threatening Language Detection), and seven teams submitted technical reports. We provided one baseline system for Subtask A and three baseline systems for Subtask B. The best-performing system achieved an F-score of 0.88 on Subtask A and 0.545 on Subtask B. For both subtasks, transformer models based on multilingual BERT (mBERT) performed best.
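The label counts above imply quite different class balances for the two subtasks. A minimal Python sketch that derives each dataset's size and label proportions from those counts is given here; the `DATASETS` dictionary and the `class_shares` helper are illustrative names only, not part of any code released with the shared task.

```python
# Label counts for each UrduThreat dataset, as reported in the abstract.
DATASETS = {
    "Subtask A (abusive)": {"Abusive": 1187, "Non-Abusive": 1213},
    "Subtask B (threatening)": {"Threatening": 1071, "Non-Threatening": 4929},
}


def class_shares(counts: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Return the dataset size and the fraction of tweets carrying each label."""
    total = sum(counts.values())
    return total, {label: n / total for label, n in counts.items()}


# Print one summary line per dataset: total size and per-label percentage.
for name, counts in DATASETS.items():
    total, shares = class_shares(counts)
    summary = ", ".join(f"{label} {share:.1%}" for label, share in shares.items())
    print(f"{name}: {total} tweets; {summary}")
```

Running it shows that the Subtask A dataset (2,400 tweets) is close to an even split, whereas the Subtask B dataset (6,000 tweets) is imbalanced, with roughly 18% of tweets labeled Threatening.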

Original language	English
Host publication title	FIRE 2021 - Proceedings of the 13th Annual Meeting of the Forum for Information Retrieval Evaluation
Editors	Debasis Ganguly, Surupendu Gangopadhyay, Mandar Mitra, Prasenjit Majumder
Publisher	Association for Computing Machinery
Pages	9-11
Number of pages	3
ISBN (electronic)	9781450395960
DOI	https://doi.org/10.1145/3503162.3505241
Status	Published - 13 Dec 2021
Event	13th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2021 - Virtual, Online, India. Duration: 13 Dec 2021 → 17 Dec 2021

Publication series

Name	ACM International Conference Proceeding Series

Conference

Conference	13th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2021
Country/Territory	India
City	Virtual, Online
Period	13/12/21 → 17/12/21

Cite this

Amjad, M., Zhila, A., Sidorov, G., Labunets, A., Butt, S., Amjad, H. I., Vitman, O., & Gelbukh, A. (2021). UrduThreat@ FIRE2021: Shared Track on Abusive Threat Identification in Urdu. In D. Ganguly, S. Gangopadhyay, M. Mitra, & P. Majumder (Eds.), FIRE 2021 - Proceedings of the 13th Annual Meeting of the Forum for Information Retrieval Evaluation (pp. 9-11). (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3503162.3505241

Amjad, Maaz ; Zhila, Alisa ; Sidorov, Grigori et al. / UrduThreat@ FIRE2021 : Shared Track on Abusive Threat Identification in Urdu. FIRE 2021 - Proceedings of the 13th Annual Meeting of the Forum for Information Retrieval Evaluation. editor / Debasis Ganguly ; Surupendu Gangopadhyay ; Mandar Mitra ; Prasenjit Majumder. Association for Computing Machinery, 2021. pp. 9-11 (ACM International Conference Proceeding Series).

@inproceedings{ea123d4f9b6a45fbb9154db55bda90c5,

title = "UrduThreat@ FIRE2021: Shared Track on Abusive Threat Identification in Urdu",

keywords = "Abusive languages detection, Urdu language, low resource languages, threatening languages detection",

author = "Maaz Amjad and Alisa Zhila and Grigori Sidorov and Andrey Labunets and Sabur Butt and Amjad, {Hamza Imam} and Oxana Vitman and Alexander Gelbukh",

note = "Publisher Copyright: {\textcopyright} 2021 Owner/Author.; 13th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2021 ; Conference date: 13-12-2021 Through 17-12-2021",

year = "2021",

month = dec,

day = "13",

doi = "10.1145/3503162.3505241",

language = "English",

series = "ACM International Conference Proceeding Series",

publisher = "Association for Computing Machinery",

pages = "9--11",

editor = "Debasis Ganguly and Surupendu Gangopadhyay and Mandar Mitra and Prasenjit Majumder",

booktitle = "FIRE 2021 - Proceedings of the 13th Annual Meeting of the Forum for Information Retrieval Evaluation",

}

Amjad, M, Zhila, A, Sidorov, G, Labunets, A, Butt, S, Amjad, HI, Vitman, O & Gelbukh, A 2021, UrduThreat@ FIRE2021: Shared Track on Abusive Threat Identification in Urdu. In D Ganguly, S Gangopadhyay, M Mitra & P Majumder (eds.), FIRE 2021 - Proceedings of the 13th Annual Meeting of the Forum for Information Retrieval Evaluation. ACM International Conference Proceeding Series, Association for Computing Machinery, pp. 9-11, 13th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2021, Virtual, Online, India, 13/12/21. https://doi.org/10.1145/3503162.3505241

UrduThreat@ FIRE2021: Shared Track on Abusive Threat Identification in Urdu. / Amjad, Maaz; Zhila, Alisa; Sidorov, Grigori et al.
FIRE 2021 - Proceedings of the 13th Annual Meeting of the Forum for Information Retrieval Evaluation. ed. / Debasis Ganguly; Surupendu Gangopadhyay; Mandar Mitra; Prasenjit Majumder. Association for Computing Machinery, 2021. p. 9-11 (ACM International Conference Proceeding Series).

TY - GEN

T1 - UrduThreat@ FIRE2021

T2 - 13th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2021

AU - Amjad, Maaz

AU - Zhila, Alisa

AU - Sidorov, Grigori

AU - Labunets, Andrey

AU - Butt, Sabur

AU - Amjad, Hamza Imam

AU - Vitman, Oxana

AU - Gelbukh, Alexander

PY - 2021/12/13

Y1 - 2021/12/13

KW - Abusive languages detection

KW - Urdu language

KW - low resource languages

KW - threatening languages detection

UR - http://www.scopus.com/inward/record.url?scp=85124343873&partnerID=8YFLogxK

U2 - 10.1145/3503162.3505241

DO - 10.1145/3503162.3505241

M3 - Conference contribution

AN - SCOPUS:85124343873

T3 - ACM International Conference Proceeding Series

SP - 9

EP - 11

BT - FIRE 2021 - Proceedings of the 13th Annual Meeting of the Forum for Information Retrieval Evaluation

A2 - Ganguly, Debasis

A2 - Gangopadhyay, Surupendu

A2 - Mitra, Mandar

A2 - Majumder, Prasenjit

PB - Association for Computing Machinery

Y2 - 13 December 2021 through 17 December 2021

ER -

Amjad M, Zhila A, Sidorov G, Labunets A, Butt S, Amjad HI et al. UrduThreat@ FIRE2021: Shared Track on Abusive Threat Identification in Urdu. In Ganguly D, Gangopadhyay S, Mitra M, Majumder P, editors, FIRE 2021 - Proceedings of the 13th Annual Meeting of the Forum for Information Retrieval Evaluation. Association for Computing Machinery. 2021. p. 9-11. (ACM International Conference Proceeding Series). doi: 10.1145/3503162.3505241