GAN-BERT, an Adversarial Learning Architecture for Paraphrase Identification

Hoang Thang Ta, Abu Bakar Siddiqur Rahman, Lotfollah Najjar, Alexander Gelbukh

Research output: Contribution to journal › Conference article › peer-review


Abstract

In this paper, we address the task of Paraphrase Identification in Mexican Spanish (PAR-MEX) at the sentence level. We introduce our method, which uses text embeddings from pre-trained transformer models for training with GAN-BERT, an adversarial learning architecture. We modify the noise fed to the generator so that it has a random rate and the same size as the hidden layer of the transformers. To improve model performance, a rule of thumb based on pair similarity is used to remove possibly wrong sentence pairs from the positive examples, in parallel with the addition of unlabelled data from the same domain. Our best F1 score is 90.22%, which ranked third in the final results table and also outperformed the organizers' baseline.
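
The abstract describes two concrete ingredients: generator noise drawn with a random rate and the same dimensionality as the transformer hidden layer, and a similarity-based rule of thumb for discarding dubious positive pairs. The Python sketch below is an illustration only, not the authors' implementation; the hidden size, the embedding function, and the similarity threshold are assumptions.

    # Minimal sketch (not the authors' code) of the two ideas from the abstract:
    # (1) generator noise with a random rate and the transformer hidden-layer size,
    # (2) a cosine-similarity rule of thumb that drops suspicious positive pairs.
    import torch
    import torch.nn.functional as F

    HIDDEN_SIZE = 768  # hidden size of a BERT-base-like transformer (assumption)

    def generator_noise(batch_size: int, hidden_size: int = HIDDEN_SIZE) -> torch.Tensor:
        """Noise for the GAN-BERT generator: uniform noise scaled by a random
        rate, with the same dimensionality as the transformer hidden layer."""
        rate = torch.rand(1).item()              # random rate in [0, 1)
        return rate * torch.rand(batch_size, hidden_size)

    def filter_positive_pairs(pairs, embed, threshold: float = 0.5):
        """Keep only positive sentence pairs whose embedding cosine similarity
        is at least `threshold` (threshold and `embed` are assumptions)."""
        kept = []
        for s1, s2 in pairs:
            sim = F.cosine_similarity(embed(s1), embed(s2), dim=-1).item()
            if sim >= threshold:
                kept.append((s1, s2))
        return kept

Here `embed` stands for any sentence-embedding function returning a vector per sentence; the filtered positive pairs and the noise generator would then feed the GAN-BERT training loop.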

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 3202
State: Published - 2022
Event: 2022 Iberian Languages Evaluation Forum, IberLEF 2022 - A Coruña, Spain
Duration: 20 Sep 2022 → …

Keywords

  • GAN-BERT
  • IberLEF
  • PAR-MEX
  • Paraphrase Identification
  • Text Classification
