TY - GEN
T1 - UrduThreat@ FIRE2021
T2 - 13th Annual Meeting of the Forum for Information Retrieval Evaluation, FIRE 2021
AU - Amjad, Maaz
AU - Zhila, Alisa
AU - Sidorov, Grigori
AU - Labunets, Andrey
AU - Butt, Sabur
AU - Amjad, Hamza Imam
AU - Vitman, Oxana
AU - Gelbukh, Alexander
N1 - Publisher Copyright:
© 2021 Owner/Author.
PY - 2021/12/13
Y1 - 2021/12/13
N2 - As social media platforms have grown in reach and importance, the effects of their misuse have become increasingly severe. This shared task addresses abusive and threatening language detection in Urdu, a language with more than 230 million speakers worldwide. We presented two datasets: (i) Abusive and Non-Abusive language, (ii) Threatening and Non-Threatening language. The abusive dataset contains 1,187 tweets categorized as Abusive and 1,213 as Non-Abusive, and the threatening dataset contains 4,929 tweets categorized as Non-Threatening and 1,071 as Threatening. In this shared task, 21 teams from six countries (India, Pakistan, China, Malaysia, United Arab Emirates, Taiwan) registered for participation; 10 teams submitted runs for Subtask A (Abusive Language Detection), 9 teams submitted runs for Subtask B (Threatening Language Detection), and 7 teams submitted technical reports. We provided one baseline system for Subtask A and three baseline systems for Subtask B. The best-performing system achieved an F-score of 0.88 for Subtask A and 0.545 for Subtask B. For both subtasks, m-BERT-based transformer models showed the best performance.
AB - As social media platforms have grown in reach and importance, the effects of their misuse have become increasingly severe. This shared task addresses abusive and threatening language detection in Urdu, a language with more than 230 million speakers worldwide. We presented two datasets: (i) Abusive and Non-Abusive language, (ii) Threatening and Non-Threatening language. The abusive dataset contains 1,187 tweets categorized as Abusive and 1,213 as Non-Abusive, and the threatening dataset contains 4,929 tweets categorized as Non-Threatening and 1,071 as Threatening. In this shared task, 21 teams from six countries (India, Pakistan, China, Malaysia, United Arab Emirates, Taiwan) registered for participation; 10 teams submitted runs for Subtask A (Abusive Language Detection), 9 teams submitted runs for Subtask B (Threatening Language Detection), and 7 teams submitted technical reports. We provided one baseline system for Subtask A and three baseline systems for Subtask B. The best-performing system achieved an F-score of 0.88 for Subtask A and 0.545 for Subtask B. For both subtasks, m-BERT-based transformer models showed the best performance.
KW - Abusive languages detection
KW - Urdu language
KW - low resource languages
KW - threatening languages detection
UR - http://www.scopus.com/inward/record.url?scp=85124343873&partnerID=8YFLogxK
U2 - 10.1145/3503162.3505241
DO - 10.1145/3503162.3505241
M3 - Conference contribution
AN - SCOPUS:85124343873
T3 - ACM International Conference Proceeding Series
SP - 9
EP - 11
BT - FIRE 2021 - Proceedings of the 13th Annual Meeting of the Forum for Information Retrieval Evaluation
A2 - Ganguly, Debasis
A2 - Gangopadhyay, Surupendu
A2 - Mitra, Mandar
A2 - Majumder, Prasenjit
PB - Association for Computing Machinery
Y2 - 13 December 2021 through 17 December 2021
ER -