Abstract
Social media usually contains various forms of toxic content, such as Hate Speech (HS) and offensive and abusive language, alongside useful and relevant content. Offensive content on social media may target a religion, a community, an individual, or a group of people with specific thoughts and beliefs. A category of offensive content targeting women, termed Misogyny, is growing day by day, and a person or group who shares such content is called a Misogynist. Misogyny detection can be seen as a sub-category of the HS and Offensive Language Identification (OLI) tasks, in which women and issues concerning them, such as their rights, are targeted. Despite the many studies on HS and OLI tasks, Misogyny detection has rarely been studied, even for resource-rich languages. To promote Misogyny detection in the Arabic language, the Arabic Misogyny Identification (ArMI) shared task at the Forum for Information Retrieval Evaluation (FIRE) 2021 provides a dataset and invites researchers to develop models for Misogyny detection in the given text. The shared task consists of two subtasks, which can be modeled as binary and multiclass Text Classification (TC) tasks. This paper describes the models submitted by our team, MUCIC, to the ArMI shared task. The proposed methodology uses a combination of the most frequent character and word n-grams as features to train Machine Learning (ML) classifiers, obtaining an accuracy of 0.873 for Subtask A and an F1-score of 0.497 for Subtask B.
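The feature-extraction step described above (keeping only the most frequent character and word n-grams as features) can be sketched in plain Python. This is a minimal illustration, not the authors' actual pipeline; the n-gram ranges and `top_k` value are assumptions for demonstration only.

```python
from collections import Counter

def char_ngrams(text, n):
    """All contiguous character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_ngrams(text, n):
    """All contiguous word n-grams of a whitespace-tokenized string."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def build_vocab(texts, top_k):
    """Keep only the top_k most frequent char and word n-grams (assumed ranges)."""
    counts = Counter()
    for t in texts:
        for n in (2, 3):          # char bigrams and trigrams (assumption)
            counts.update(char_ngrams(t, n))
        for n in (1, 2):          # word unigrams and bigrams (assumption)
            counts.update(word_ngrams(t, n))
    return [gram for gram, _ in counts.most_common(top_k)]

def vectorize(text, vocab):
    """Count-based feature vector over the fixed top-frequency vocabulary."""
    grams = Counter()
    for n in (2, 3):
        grams.update(char_ngrams(text, n))
    for n in (1, 2):
        grams.update(word_ngrams(text, n))
    return [grams[g] for g in vocab]
```

The resulting fixed-length count vectors can then be fed to any standard ML classifier for the binary (Subtask A) or multiclass (Subtask B) setting.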
| Original language | English |
| --- | --- |
| Pages (from-to) | 839-846 |
| Number of pages | 8 |
| Journal | CEUR Workshop Proceedings |
| Volume | 3159 |
| State | Published - 2021 |
| Event | Working Notes of FIRE - 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021, Gandhinagar, India. Duration: 13 Dec 2021 → 17 Dec 2021 |
Keywords
- Hate Speech
- Machine Learning
- Misogyny Detection
- Offensive Language
- Social Media