Paraphrase Identification: Lightweight Effective Methods Based Features from Pre-trained Models

Abu Bakar Siddiqur Rahman, Hoang Thang Ta, Lotfollah Najjar, Alexander Gelbukh

Research output: Contribution to journal › Conference article › peer-review

Abstract

In this paper, we address Paraphrase Identification in Mexican Spanish (PAR-MEX) at the sentence level. We introduce two lightweight methods, linear regression and a multilayer perceptron, both trained on features extracted from pre-trained models. A rule of thumb based on pair similarity is used to filter noise from the positive examples. We obtained a best F1 of 88.67%, which demonstrates the effectiveness of traditional methods when supported by pre-trained models. In the challenge, our result ranked fourth in the organizers' results table.
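The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not state which pre-trained models, features, similarity measure, or threshold were used, so everything below is an assumption for illustration only: the toy vectors stand in for sentence embeddings, cosine similarity stands in for the pair-similarity rule, the 0.5 threshold is hypothetical, and the helper names (`filter_positive_pairs`, `pair_features`, `fit_linear_regression`, `predict`) are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def filter_positive_pairs(examples, threshold=0.5):
    """Keep all negatives; keep positives only if similar enough.

    The threshold is a hypothetical value, not the one used in the paper.
    """
    return [(u, v, y) for u, v, y in examples
            if y == 0 or cosine(u, v) >= threshold]

def pair_features(u, v):
    """Hypothetical pair features: bias, cosine similarity, mean absolute difference."""
    mad = sum(abs(a - b) for a, b in zip(u, v)) / len(u)
    return [1.0, cosine(u, v), mad]

def fit_linear_regression(X, y, lr=0.1, epochs=2000):
    """Least-squares linear regression fitted by batch gradient descent."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j, xj in enumerate(xi):
                grad[j] += 2 * err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def predict(w, u, v):
    """Label a pair as a paraphrase (1) if the regression score reaches 0.5."""
    score = sum(wj * xj for wj, xj in zip(w, pair_features(u, v)))
    return 1 if score >= 0.5 else 0

# Toy stand-ins for sentence embeddings from a pre-trained model (assumed data).
examples = [
    ([1.0, 0.0, 0.2], [0.9, 0.1, 0.2], 1),  # similar pair, labeled paraphrase
    ([0.2, 1.0, 0.1], [0.1, 0.9, 0.2], 1),  # similar pair, labeled paraphrase
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 1),  # noisy positive: dissimilar vectors
    ([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], 0),  # dissimilar pair, not a paraphrase
    ([0.0, 1.0, 0.0], [1.0, 0.1, 0.0], 0),  # dissimilar pair, not a paraphrase
]

clean = filter_positive_pairs(examples)
X = [pair_features(u, v) for u, v, _ in clean]
y = [label for _, _, label in clean]
weights = fit_linear_regression(X, y)
```

In practice, a library implementation of linear regression or a multilayer perceptron would replace the hand-written gradient descent; it is written out here only to keep the sketch dependency-free.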

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 3202
State: Published - 2022
Event: 2022 Iberian Languages Evaluation Forum, IberLEF 2022 - A Coruña, Spain
Duration: 20 Sep 2022 → …

Keywords

  • IberLEF
  • Linear Regression
  • MultiLayer Perceptron
  • PAR-MEX
  • Paraphrase Identification
  • Text Classification
