Abstract
Social media usually contains various forms of toxic content, such as Hate Speech (HS) and offensive and abusive language, alongside useful and relevant content. Offensive content on social media may target a religion, a community, an individual, or a group of people with specific thoughts and beliefs. A category of offensive content targeting women, termed Misogyny, is growing day by day, and a person or group who shares such content is called a Misogynist. Misogyny detection can be seen as a sub-category of the HS and Offensive Language Identification (OLI) tasks, in which women and issues concerning them, such as their rights, are targeted. Despite the many studies on HS and OLI tasks, Misogyny detection has rarely been studied, even for resource-rich languages. To promote Misogyny detection in the Arabic language, the Arabic Misogyny Identification (ArMI) shared task at the Forum for Information Retrieval Evaluation (FIRE) 2021 provides a dataset and invites researchers to develop models for Misogyny detection in the given text. The shared task consists of two subtasks, which can be modeled as binary and multiclass Text Classification (TC) tasks. This paper describes the models submitted by our team, MUCIC, to the ArMI shared task. The proposed methodology uses a combination of the most frequent character and word n-grams as features to train Machine Learning (ML) classifiers, obtaining an accuracy of 0.873 for Subtask A and an F1-score of 0.497 for Subtask B.
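The feature-extraction step described above (keeping only the most frequent character and word n-grams as features) can be sketched in plain Python. This is a minimal illustration, not the authors' actual pipeline; the n-gram ranges and `top_k` value are assumptions for demonstration only.

```python
from collections import Counter

def char_ngrams(text, n):
    """All contiguous character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_ngrams(text, n):
    """All contiguous word n-grams of a whitespace-tokenized string."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def build_vocab(texts, top_k):
    """Keep only the top_k most frequent char and word n-grams (assumed ranges)."""
    counts = Counter()
    for t in texts:
        for n in (2, 3):          # char bigrams and trigrams (assumption)
            counts.update(char_ngrams(t, n))
        for n in (1, 2):          # word unigrams and bigrams (assumption)
            counts.update(word_ngrams(t, n))
    return [gram for gram, _ in counts.most_common(top_k)]

def vectorize(text, vocab):
    """Count-based feature vector over the fixed top-frequency vocabulary."""
    grams = Counter()
    for n in (2, 3):
        grams.update(char_ngrams(text, n))
    for n in (1, 2):
        grams.update(word_ngrams(text, n))
    return [grams[g] for g in vocab]
```

The resulting fixed-length count vectors can then be fed to any standard ML classifier for the binary (Subtask A) or multiclass (Subtask B) setting.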
| Original language | English |
| --- | --- |
| Pages (from-to) | 839-846 |
| Number of pages | 8 |
| Journal | CEUR Workshop Proceedings |
| Volume | 3159 |
| State | Published - 2021 |
| Event | Working Notes of FIRE - 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021, Gandhinagar, India. Duration: 13 Dec 2021 → 17 Dec 2021 |
Keywords
- Hate Speech
- Machine Learning
- Misogyny Detection
- Offensive Language
- Social Media