TY - JOUR
T1 - Evaluation of intermediate pre-training for the detection of offensive language
AU - Aroyehun, Segun Taofeek
AU - Gelbukh, Alexander
N1 - Publisher Copyright:
© 2021 CEUR-WS. All rights reserved.
PY - 2021
Y1 - 2021
N2 - This paper presents an evaluation of intermediate pre-training for the task of offensive language identification. We leverage recent advances in multilingual contextual representations and the fine-tuning of pre-trained language models. We compare the performance of a pre-trained language model adapted to the social media domain with that of another that was further trained on multilingual sentiment analysis data. We find that the intermediate pre-training steps prior to fine-tuning on the target task yield performance gains. The best submissions by our team, NLP-CIC, achieved first and second place on the non-contextual Spanish (Subtask 1) and Mexican Spanish (Subtask 3) subtasks of the MeOffendEs-IberLEF 2021 shared task, respectively.
KW - Mexican Spanish
KW - Offensive Language Identification
KW - Sentiment Analysis
KW - Social Media
KW - Spanish
KW - XLM-RoBERTa
UR - http://www.scopus.com/inward/record.url?scp=85115318768&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85115318768
SN - 1613-0073
VL - 2943
SP - 313
EP - 320
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 2021 Iberian Languages Evaluation Forum, IberLEF 2021
Y2 - 21 September 2021
ER -