Paraphrase Identification: Lightweight Effective Methods Based Features from Pre-trained Models

Abu Bakar Siddiqur Rahman, Hoang Thang Ta, Lotfollah Najjar, Alexander Gelbukh

Research output: Contribution to journal › Conference article › peer-review

Abstract

In this paper, we address Paraphrase Identification in Mexican Spanish (PAR-MEX) at the sentence level. We introduce two lightweight methods, linear regression and a multilayer perceptron, both trained on features extracted from pre-trained models. A rule of thumb based on pair similarity is used to filter noise from the positive examples. We obtained a best F1 of 88.67%, which indicates the effectiveness of traditional methods when supported by pre-trained models. In the challenge, our result ranked fourth in the organizers' results table.
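
The pair-similarity filter described in the abstract could be sketched roughly as follows. The abstract does not state which similarity measure or cutoff was used, so the cosine similarity, the 0.5 threshold, the function names, and the toy vectors (standing in for features from a pre-trained model) are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_positive_pairs(pairs, threshold=0.5):
    """Keep every negative pair; keep a positive pair only if it is similar enough.

    pairs: list of (vec_a, vec_b, label), where label 1 means "paraphrase".
    threshold: hypothetical cutoff, not a value taken from the paper.
    """
    kept = []
    for vec_a, vec_b, label in pairs:
        if label == 1 and cosine_similarity(vec_a, vec_b) < threshold:
            continue  # treat a dissimilar "positive" pair as label noise
        kept.append((vec_a, vec_b, label))
    return kept


# Toy vectors standing in for sentence features from a pre-trained model.
toy_pairs = [
    ([1.0, 0.0], [0.9, 0.1], 1),  # similar positive pair
    ([1.0, 0.0], [0.0, 1.0], 1),  # dissimilar positive pair
    ([1.0, 0.0], [0.0, 1.0], 0),  # negative pair
]
filtered = filter_positive_pairs(toy_pairs)
```

In this sketch only positive examples are ever discarded, which mirrors the abstract's statement that the rule targets noise in the positive examples. The filtered pairs would then be passed to the downstream classifier.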

Original language: English
Publication: CEUR Workshop Proceedings
Volume: 3202
Status: Published - 2022
Event: 2022 Iberian Languages Evaluation Forum, IberLEF 2022 - A Coruña, Spain
Duration: 20 Sep 2022 → …
