HSSD: Hate speech spreader detection using N-Grams and voting classifier

Fazlourrahman Balouchzahi; Hosahalli Lakshmaiah Shashirekha; Grigori Sidorov

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › peer-review

10 Citations (Scopus)

Abstract

Profane or abusive speech intended to humiliate and target individuals, a specific community, or groups of people is called Hate Speech (HS). Identifying and blocking HS content is only a temporary solution. Instead, developing systems that can detect and profile the content polluters who share HS is a better option. In this paper, we, team MUCIC, present the Voting Classifier (VC) submitted to the Hate Speech Spreader Detection shared task organized by PAN 2021. The task involves profiling HS spreaders in two languages, namely English and Spanish, from text collected from Twitter. This task can be modeled as a binary text classification problem that classifies an author (Twitter user), based on his/her tweets, as 'Hate speech spreader' or 'Not'. The proposed models utilize a combination of traditional char and word n-grams with syntactic n-grams as features extracted from the training set. These features are fed to a VC that employs three Machine Learning (ML) classifiers, namely Support Vector Machine (SVM), Logistic Regression (LR), and Random Forest (RF), with hard and soft voting. The proposed models, with accuracies of 73% and 83% for English and Spanish respectively, obtained second rank in the shared task.
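The feature and voting ideas named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, the example text, and the per-classifier probabilities are hypothetical, and syntactic n-grams are left out because they would require a dependency parser. The three probability rows merely stand in for the outputs that classifiers such as SVM, LR, and RF might produce for one author.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams in a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count overlapping word n-grams over lowercased whitespace tokens."""
    tokens = text.lower().split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def hard_vote(labels):
    """Hard voting: return the class label predicted by most classifiers."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probas):
    """Soft voting: average class probabilities, then return the argmax class."""
    n_classes = len(probas[0])
    mean = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


# Hypothetical class probabilities [P(not), P(spreader)] for one author,
# one row per classifier. These numbers are made up for illustration.
probas = [[0.55, 0.45], [0.60, 0.40], [0.10, 0.90]]

# Each classifier's own label is the argmax of its probability row.
labels = [max(range(2), key=lambda c: p[c]) for p in probas]

features = char_ngrams("hate", 2) + word_ngrams("You are not welcome here", 2)
hard_label = hard_vote(labels)
soft_label = soft_vote(probas)
```

In this made-up case two of the three classifiers lean weakly toward class 0 while the third is strongly confident in class 1, so hard voting returns 0 and soft voting returns 1. That difference is why the two voting schemes can be worth comparing on the same set of base classifiers.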

Original language	English
Pages (from-to)	1829-1836
Number of pages	8
Journal	CEUR Workshop Proceedings
Volume	2936
Status	Published - 2021
Event	2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021 - Virtual, Bucharest, Romania. Duration: 21 Sep 2021 → 24 Sep 2021

Cite this

@article{a33f2b5a411c48b796d2d96e78b8359a,

title = "HSSD: Hate speech spreader detection using N-Grams and voting classifier",

abstract = "Profane or abusive speech intended to humiliate and target individuals, a specific community, or groups of people is called Hate Speech (HS). Identifying and blocking HS content is only a temporary solution. Instead, developing systems that can detect and profile the content polluters who share HS is a better option. In this paper, we, team MUCIC, present the Voting Classifier (VC) submitted to the Hate Speech Spreader Detection shared task organized by PAN 2021. The task involves profiling HS spreaders in two languages, namely English and Spanish, from text collected from Twitter. This task can be modeled as a binary text classification problem that classifies an author (Twitter user), based on his/her tweets, as 'Hate speech spreader' or 'Not'. The proposed models utilize a combination of traditional char and word n-grams with syntactic n-grams as features extracted from the training set. These features are fed to a VC that employs three Machine Learning (ML) classifiers, namely Support Vector Machine (SVM), Logistic Regression (LR), and Random Forest (RF), with hard and soft voting. The proposed models, with accuracies of 73% and 83% for English and Spanish respectively, obtained second rank in the shared task.",

keywords = "Hate speech spreader, Machine learning, N-grams, Voting classifier",

author = "Fazlourrahman Balouchzahi and Shashirekha, {Hosahalli Lakshmaiah} and Grigori Sidorov",

note = "Publisher Copyright: {\textcopyright} 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).; 2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021 ; Conference date: 21-09-2021 Through 24-09-2021",

year = "2021",

language = "English",

volume = "2936",

pages = "1829--1836",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

}

TY - JOUR

T1 - HSSD

T2 - 2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021

AU - Balouchzahi, Fazlourrahman

AU - Shashirekha, Hosahalli Lakshmaiah

AU - Sidorov, Grigori

PY - 2021

Y1 - 2021

N2 - Profane or abusive speech intended to humiliate and target individuals, a specific community, or groups of people is called Hate Speech (HS). Identifying and blocking HS content is only a temporary solution. Instead, developing systems that can detect and profile the content polluters who share HS is a better option. In this paper, we, team MUCIC, present the Voting Classifier (VC) submitted to the Hate Speech Spreader Detection shared task organized by PAN 2021. The task involves profiling HS spreaders in two languages, namely English and Spanish, from text collected from Twitter. This task can be modeled as a binary text classification problem that classifies an author (Twitter user), based on his/her tweets, as 'Hate speech spreader' or 'Not'. The proposed models utilize a combination of traditional char and word n-grams with syntactic n-grams as features extracted from the training set. These features are fed to a VC that employs three Machine Learning (ML) classifiers, namely Support Vector Machine (SVM), Logistic Regression (LR), and Random Forest (RF), with hard and soft voting. The proposed models, with accuracies of 73% and 83% for English and Spanish respectively, obtained second rank in the shared task.


KW - Hate speech spreader

KW - Machine learning

KW - N-grams

KW - Voting classifier

UR - http://www.scopus.com/inward/record.url?scp=85113504096&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85113504096

SN - 1613-0073

VL - 2936

SP - 1829

EP - 1836

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

Y2 - 21 September 2021 through 24 September 2021

ER -
