NLP-CIC at HASOC 2020: Multilingual offensive language detection using all-in-one model

Segun Taofeek Aroyehun; Alexander Gelbukh

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › peer-review

3 Scopus citations

Abstract

We describe our deep learning model submitted to the HASOC 2020 shared task on detection of offensive language in social media in three Indo-European languages: English, German, and Hindi. We fine-tune a pre-trained multilingual encoder on the combination of data provided for the competition. Our submission received a competitive macro-average F1 score of 0.4980 on the English Subtask A as well as comparatively strong performance on the German data.

Original language	English
Pages (from-to)	331-335
Number of pages	5
Journal	CEUR Workshop Proceedings
Volume	2826
State	Published - 2020
Event	Working Notes of FIRE - 12th Forum for Information Retrieval Evaluation, FIRE-WN 2020 - Hyderabad, India
Duration	16 Dec 2020 → 20 Dec 2020

Keywords

Deep learning
Multilingual
Offensive content identification
Text classification

Cite this

@article{8e65337e31594ae8b72a7ca7760afdfc,

title = "NLP-CIC at HASOC 2020: Multilingual offensive language detection using all-in-one model",

abstract = "We describe our deep learning model submitted to the HASOC 2020 shared task on detection of offensive language in social media in three Indo-European languages: English, German, and Hindi. We fine-tune a pre-trained multilingual encoder on the combination of data provided for the competition. Our submission received a competitive macro-average F1 score of 0.4980 on the English Subtask A as well as comparatively strong performance on the German data.",

keywords = "Deep learning, Multilingual, Offensive content identification, Text classification",

author = "Aroyehun, {Segun Taofeek} and Alexander Gelbukh",

note = "Publisher Copyright: {\textcopyright} 2020 Copyright for this paper by its authors.; Working Notes of FIRE - 12th Forum for Information Retrieval Evaluation, FIRE-WN 2020 ; Conference date: 16-12-2020 Through 20-12-2020",

year = "2020",

language = "English",

volume = "2826",

pages = "331--335",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

}

TY - JOUR

T1 - NLP-CIC at HASOC 2020: Multilingual offensive language detection using all-in-one model

T2 - Working Notes of FIRE - 12th Forum for Information Retrieval Evaluation, FIRE-WN 2020

AU - Aroyehun, Segun Taofeek

AU - Gelbukh, Alexander

PY - 2020

Y1 - 2020

N2 - We describe our deep learning model submitted to the HASOC 2020 shared task on detection of offensive language in social media in three Indo-European languages: English, German, and Hindi. We fine-tune a pre-trained multilingual encoder on the combination of data provided for the competition. Our submission received a competitive macro-average F1 score of 0.4980 on the English Subtask A as well as comparatively strong performance on the German data.

AB - We describe our deep learning model submitted to the HASOC 2020 shared task on detection of offensive language in social media in three Indo-European languages: English, German, and Hindi. We fine-tune a pre-trained multilingual encoder on the combination of data provided for the competition. Our submission received a competitive macro-average F1 score of 0.4980 on the English Subtask A as well as comparatively strong performance on the German data.

KW - Deep learning

KW - Multilingual

KW - Offensive content identification

KW - Text classification

UR - http://www.scopus.com/inward/record.url?scp=85102960037&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85102960037

SN - 1613-0073

VL - 2826

SP - 331

EP - 335

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

Y2 - 16 December 2020 through 20 December 2020

ER -
