HSSD: Hate speech spreader detection using N-Grams and voting classifier

Fazlourrahman Balouchzahi, Hosahalli Lakshmaiah Shashirekha, Grigori Sidorov

Research output: Contribution to journal › Conference article › peer-review

10 Scopus citations

Abstract

Profane or abusive speech intended to humiliate and target individuals, a specific community, or groups of people is called Hate Speech (HS). Identifying and blocking HS content is only a temporary solution; developing systems that can detect and profile the content polluters who share HS is a better option. In this paper, we, team MUCIC, present the Voting Classifier (VC) submitted to the Hate Speech Spreader Detection shared task organized by PAN 2021. The task involves profiling HS spreaders in two languages, English and Spanish, from text collected from Twitter. It can be modeled as a binary text classification problem: classify an author (Twitter user), based on their tweets, as 'Hate speech spreader' or 'Not'. The proposed models use a combination of traditional character and word n-grams with syntactic n-grams as features extracted from the training set. These features are fed to a VC that employs three Machine Learning (ML) classifiers, namely Support Vector Machine (SVM), Logistic Regression (LR), and Random Forest (RF), with hard and soft voting. The proposed models, with accuracies of 73% and 83% for English and Spanish respectively, ranked second in the shared task.
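The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the n-gram sizes (3 for characters, 2 for words), the 0/1 label encoding, and the example probabilities are all assumptions made for illustration. Syntactic n-grams are left out because they require a dependency parser, and the per-classifier probabilities are hard-coded stand-ins for the outputs of trained SVM, LR, and RF models.

```python
from collections import Counter

def char_ngrams(text, n):
    """Return all contiguous character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_ngrams(text, n):
    """Return all contiguous word n-grams of lowercased, whitespace-split `text`."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def author_features(tweets, char_n=3, word_n=2):
    """Count char and word n-grams over all tweets of one author.

    The n-gram sizes are illustrative assumptions, not values from the paper.
    """
    feats = Counter()
    for tweet in tweets:
        feats.update("c:" + g for g in char_ngrams(tweet.lower(), char_n))
        feats.update("w:" + g for g in word_ngrams(tweet, word_n))
    return feats

def hard_vote(labels):
    """Hard voting: return the label predicted by the most classifiers."""
    return Counter(labels).most_common(1)[0][0]

def soft_vote(probas):
    """Soft voting: average class probabilities, return the most probable class."""
    classes = list(probas[0].keys())
    avg = {c: sum(p[c] for p in probas) / len(probas) for c in classes}
    return max(avg, key=avg.get)

# Hypothetical probabilities from three classifiers for one author
# (1 = 'Hate speech spreader', 0 = 'Not'); the values are made up.
probas = [
    {0: 0.55, 1: 0.45},  # stand-in for SVM
    {0: 0.60, 1: 0.40},  # stand-in for LR
    {0: 0.05, 1: 0.95},  # stand-in for RF
]
labels = [max(p, key=p.get) for p in probas]  # each classifier's own prediction

print(hard_vote(labels))   # majority of the three predicted labels
print(soft_vote(probas))   # argmax of the averaged probabilities
```

In this made-up example the two schemes disagree: two of the three classifiers predict 0, so hard voting returns 0, while the one confident classifier pulls the averaged probability of class 1 to 0.6, so soft voting returns 1. Soft voting needs probability estimates from every classifier, which an SVM provides only if it is configured or calibrated to do so.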

Original language: English
Pages (from-to): 1829-1836
Number of pages: 8
Journal: CEUR Workshop Proceedings
Volume: 2936
State: Published - 2021
Event: 2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021 - Virtual, Bucharest, Romania
Duration: 21 Sep 2021 to 24 Sep 2021

Keywords

  • Hate speech spreader
  • Machine learning
  • N-grams
  • Voting classifier
