Evaluation of intermediate pre-training for the detection of offensive language

Segun Taofeek Aroyehun, Alexander Gelbukh

Research output: Contribution to journal, Conference article, peer-reviewed

6 Citations (Scopus)

Abstract

This paper presents an evaluation of intermediate pre-training for the task of offensive language identification. We leverage recent advances in multilingual contextual representations and the fine-tuning of pre-trained language models. We compare the performance of a pre-trained language model adapted to the social media domain with that of another model that was further trained on multilingual sentiment analysis data. We found that the intermediate pre-training steps prior to fine-tuning on the target task yield performance gains. The best submissions by our team, NLP-CIC, achieved first and second place on the non-contextual Spanish (Subtask 1) and Mexican Spanish (Subtask 3) subtasks of the MeOffendEs-IberLEF 2021 shared task, respectively.

Original language: English
Pages (from-to): 313-320
Number of pages: 8
Publication: CEUR Workshop Proceedings
Volume: 2943
Status: Published - 2021
Event: 2021 Iberian Languages Evaluation Forum, IberLEF 2021 - Virtual, Malaga, Spain
Duration: 21 Sep 2021 → …
