TY - JOUR
T1 - Abusive language detection in YouTube comments leveraging replies as conversational context
AU - Ashraf, Noman
AU - Zubiaga, Arkaitz
AU - Gelbukh, Alexander
N1 - Publisher Copyright: © Copyright 2021. Ashraf et al.
PY - 2021
Y1 - 2021
AB - Nowadays, social media experience an increase in hostility, which leads to many people suffering from online abusive behavior and harassment. We introduce a new publicly available annotated dataset for abusive language detection in short texts. The dataset includes comments from YouTube, along with contextual information: replies, video, video title, and the original description. The comments in the dataset are labeled as abusive or not and are classified by topic: politics, religion, and other. In particular, we discuss our refined annotation guidelines for such classification. We report a number of strong baselines on this dataset for the tasks of abusive language detection and topic classification, using a number of classifiers and text representations. We show that taking into account the conversational context, namely, replies, greatly improves the classification results as compared with using only linguistic features of the comments. We also study how the classification accuracy depends on the topic of the comment.
KW - Abusive language detection
KW - Context aware abusive language detection
KW - Corpus
KW - Deep learning
KW - Natural language processing
KW - YouTube
UR - http://www.scopus.com/inward/record.url?scp=85124363956&partnerID=8YFLogxK
U2 - 10.7717/peerj-cs.742
DO - 10.7717/peerj-cs.742
M3 - Article
C2 - 34712802
AN - SCOPUS:85124363956
SN - 2376-5992
VL - 7
JO - PeerJ Computer Science
JF - PeerJ Computer Science
M1 - e742
ER -