TY - GEN
T1 - Individual vs. Group Violent Threats Classification in Online Discussions
AU - Ashraf, Noman
AU - Mustafa, Rabia
AU - Sidorov, Grigori
AU - Gelbukh, Alexander
N1 - Publisher Copyright: © 2020 ACM.
PY - 2020/4/20
Y1 - 2020/4/20
N2 - Violent threats are a serious crime affecting the targeted individuals or groups. It is essential for media providers to block users who post such threats. In this paper, we focused on detecting violent threat language in YouTube comments. We categorized the threatening comments into those targeting an individual and those targeting a group. We started from an existing dataset in which violent threat language was identified but not categorized by target. We adopted a binary classification approach to predict individual- vs. group-targeting threats. We compared two text representations: bag of words (BOW) and pre-trained word embeddings such as GloVe and fastText. We used deep-learning classifiers such as 1D-CNN, LSTM, and bidirectional LSTM (BiLSTM). GloVe embeddings gave the worst results, fastText performed much better, and BiLSTM on BOW with term frequency-inverse document frequency (TF-IDF) weighting gave the best results, achieving a ROC-AUC of 0.94 and a Macro-F1 score of 0.85.
AB - Violent threats are a serious crime affecting the targeted individuals or groups. It is essential for media providers to block users who post such threats. In this paper, we focused on detecting violent threat language in YouTube comments. We categorized the threatening comments into those targeting an individual and those targeting a group. We started from an existing dataset in which violent threat language was identified but not categorized by target. We adopted a binary classification approach to predict individual- vs. group-targeting threats. We compared two text representations: bag of words (BOW) and pre-trained word embeddings such as GloVe and fastText. We used deep-learning classifiers such as 1D-CNN, LSTM, and bidirectional LSTM (BiLSTM). GloVe embeddings gave the worst results, fastText performed much better, and BiLSTM on BOW with term frequency-inverse document frequency (TF-IDF) weighting gave the best results, achieving a ROC-AUC of 0.94 and a Macro-F1 score of 0.85.
KW - NLP
KW - Violent threat
KW - deep learning
KW - individual and group threats
KW - social media
UR - http://www.scopus.com/inward/record.url?scp=85091693888&partnerID=8YFLogxK
U2 - 10.1145/3366424.3385778
DO - 10.1145/3366424.3385778
M3 - Conference contribution
AN - SCOPUS:85091693888
T3 - The Web Conference 2020 - Companion of the World Wide Web Conference, WWW 2020
SP - 629
EP - 633
BT - The Web Conference 2020 - Companion of the World Wide Web Conference, WWW 2020
PB - Association for Computing Machinery
T2 - 29th International World Wide Web Conference, WWW 2020
Y2 - 20 April 2020 through 24 April 2020
ER -