Sexism identification using BERT and data augmentation - Exist2021

Sabur Butt, Noman Ashraf, Grigori Sidorov, Alexander Gelbukh

Research output: Contribution to journal, conference article, peer-reviewed

13 Scopus citations


Sexism is defined as discrimination against females of all ages. We have seen a rise of sexism on social media platforms, manifesting itself in many forms. The paper presents the best-performing machine learning and deep learning algorithms, as well as BERT results, on the "sEXism Identification in Social neTworks (EXIST 2021)" shared task. The task incorporates a multilingual dataset containing both Spanish and English tweets. The multilingual nature of the dataset and the inconsistencies of social media text make it a challenging problem. Considering these challenges, the paper focuses on pre-processing techniques and data augmentation to boost results on various machine learning and deep learning methods. We achieved an F1 score of 78.02% on the sexism identification task (task 1) and an F1 score of 49.08% on the sexism categorization task (task 2).

Original language: English
Pages (from-to): 381-389
Number of pages: 9
Journal: CEUR Workshop Proceedings
State: Published - 2021
Event: 2021 Iberian Languages Evaluation Forum, IberLEF 2021 - Virtual, Malaga, Spain
Duration: 21 Sep 2021 → …


  • BERT
  • data augmentation
  • deep learning
  • machine learning
  • sexism detection

