Automatic Hate Speech Detection Using Deep Neural Networks and Word Embedding

Olumide Ebenezer Ojo; Thang Hoang Ta; Alexander Gelbukh; Hiram Calvo; Grigori Sidorov; Olaronke Oluwayemisi Adebanji

doi:10.13053/CyS-26-2-4107

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Hatred spreading through the use of language on social media platforms and in online groups is becoming a well-known phenomenon. By comparing two text representations: bag of words (BoW) and pre-trained word embedding using GloVe, we used a binary classification approach to automatically process user contents to detect hate speech. The Naive Bayes Algorithm (NBA), Logistic Regression Model (LRM), Support Vector Machines (SVM), Random Forest Classifier (RFC) and the one-dimensional Convolutional Neural Networks (1D-CNN) are the models proposed. With a weighted macro-F1 score of 0.66 and a 0.90 accuracy, the performance of the 1D-CNN and GloVe embeddings was best among all the models.
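
The gap between the reported accuracy (0.90) and F1 score (0.66) is typical of class-imbalanced data. The sketch below is a minimal illustration of how the two metrics can diverge. It assumes plain (unweighted) macro-averaging over two classes, and the labels are hypothetical toy values, not the dataset or the predictions from the article.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Hypothetical toy data: 100 posts, 10 hateful (1) and 90 not hateful (0).
y_true = [1] * 10 + [0] * 90
# Hypothetical predictions: 3 of the 10 hateful posts are caught,
# and 2 of the 90 non-hateful posts are flagged by mistake.
y_pred = [1] * 3 + [0] * 7 + [1] * 2 + [0] * 88

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.2f}")
```

Because the majority class dominates accuracy, a classifier that misses most minority-class posts can still score high on accuracy while the macro-averaged F1 stays much lower.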

Original language	English
Pages (from-to)	1007-1013
Number of pages	7
Journal	Computacion y Sistemas
Volume	26
Issue number	2
DOIs	https://doi.org/10.13053/CyS-26-2-4107
State	Published - 2022

Keywords

1D-CNN
Hate speech
GloVe

Cite this

@article{dbca781b7c234fc696bda01fd82126cc,
  title = "Automatic Hate Speech Detection Using Deep Neural Networks and Word Embedding",
  abstract = "Hatred spreading through the use of language on social media platforms and in online groups is becoming a well-known phenomenon. By comparing two text representations: bag of words (BoW) and pre-trained word embedding using GloVe, we used a binary classification approach to automatically process user contents to detect hate speech. The Naive Bayes Algorithm (NBA), Logistic Regression Model (LRM), Support Vector Machines (SVM), Random Forest Classifier (RFC) and the one-dimensional Convolutional Neural Networks (1D-CNN) are the models proposed. With a weighted macro-F1 score of 0.66 and a 0.90 accuracy, the performance of the 1D-CNN and GloVe embeddings was best among all the models.",
  keywords = "1D-CNN, Hate speech, GloVe",
  author = "Ojo, {Olumide Ebenezer} and Ta, {Thang Hoang} and Gelbukh, Alexander and Calvo, Hiram and Sidorov, Grigori and Adebanji, {Olaronke Oluwayemisi}",
  year = "2022",
  doi = "10.13053/CyS-26-2-4107",
  language = "English",
  volume = "26",
  pages = "1007--1013",
  journal = "Computacion y Sistemas",
  issn = "1405-5546",
  number = "2",
}

TY  - JOUR
T1  - Automatic Hate Speech Detection Using Deep Neural Networks and Word Embedding
AU  - Ojo, Olumide Ebenezer
AU  - Ta, Thang Hoang
AU  - Gelbukh, Alexander
AU  - Calvo, Hiram
AU  - Sidorov, Grigori
AU  - Adebanji, Olaronke Oluwayemisi
PY  - 2022
Y1  - 2022
AB  - Hatred spreading through the use of language on social media platforms and in online groups is becoming a well-known phenomenon. By comparing two text representations: bag of words (BoW) and pre-trained word embedding using GloVe, we used a binary classification approach to automatically process user contents to detect hate speech. The Naive Bayes Algorithm (NBA), Logistic Regression Model (LRM), Support Vector Machines (SVM), Random Forest Classifier (RFC) and the one-dimensional Convolutional Neural Networks (1D-CNN) are the models proposed. With a weighted macro-F1 score of 0.66 and a 0.90 accuracy, the performance of the 1D-CNN and GloVe embeddings was best among all the models.
KW  - 1D-CNN
KW  - Hate speech
KW  - GloVe
UR  - http://www.scopus.com/inward/record.url?scp=85129210092&partnerID=8YFLogxK
U2  - 10.13053/CyS-26-2-4107
DO  - 10.13053/CyS-26-2-4107
M3  - Article
AN  - SCOPUS:85129210092
SN  - 1405-5546
VL  - 26
SP  - 1007
EP  - 1013
JO  - Computacion y Sistemas
JF  - Computacion y Sistemas
IS  - 2
ER  -
