GAN-BERT, an Adversarial Learning Architecture for Paraphrase Identification

Hoang Thang Ta, Abu Bakar Siddiqur Rahman, Lotfollah Najjar, Alexander Gelbukh

Research output: Contribution to journal › Conference article › peer-review


Abstract

In this paper, we address the task of Paraphrase Identification in Mexican Spanish (PAR-MEX) at the sentence level. We introduce our method, which uses text embeddings from pre-trained transformer models for training with GAN-BERT, an adversarial learning architecture. We modify the noise fed to the generator so that it has a random rate and the same size as the hidden layer of the transformers. To improve model performance, a rule of thumb based on pair similarity is used to remove possibly wrong sentence pairs from the positive examples, in parallel with the addition of unlabelled data from the same domain. Our best F1 score is 90.22%, which ranked third in the final results table and also outperformed the organizers' baseline.
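
The abstract describes two concrete ingredients: generator noise drawn with a random rate and the same dimensionality as the transformer hidden layer, and a similarity-based rule of thumb for discarding dubious positive pairs. The Python sketch below is an illustration only, not the authors' implementation; the hidden size, the embedding function, and the similarity threshold are assumptions.

    # Minimal sketch (not the authors' code) of the two ideas from the abstract:
    # (1) generator noise with a random rate and the transformer hidden-layer size,
    # (2) a cosine-similarity rule of thumb that drops suspicious positive pairs.
    import torch
    import torch.nn.functional as F

    HIDDEN_SIZE = 768  # hidden size of a BERT-base-like transformer (assumption)

    def generator_noise(batch_size: int, hidden_size: int = HIDDEN_SIZE) -> torch.Tensor:
        """Noise for the GAN-BERT generator: uniform noise scaled by a random
        rate, with the same dimensionality as the transformer hidden layer."""
        rate = torch.rand(1).item()              # random rate in [0, 1)
        return rate * torch.rand(batch_size, hidden_size)

    def filter_positive_pairs(pairs, embed, threshold: float = 0.5):
        """Keep only positive sentence pairs whose embedding cosine similarity
        is at least `threshold` (threshold and `embed` are assumptions)."""
        kept = []
        for s1, s2 in pairs:
            sim = F.cosine_similarity(embed(s1), embed(s2), dim=-1).item()
            if sim >= threshold:
                kept.append((s1, s2))
        return kept

Here `embed` stands for any sentence-embedding function returning a vector per sentence; the filtered positive pairs and the noise generator would then feed the GAN-BERT training loop.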

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 3202
State: Published - 2022
Event: 2022 Iberian Languages Evaluation Forum, IberLEF 2022 - A Coruña, Spain
Duration: 20 Sep 2022 → …

Keywords

  • GAN-BERT
  • IberLEF
  • PAR-MEX
  • Paraphrase Identification
  • Text Classification
