TY - JOUR
T1 - Using Transformers on Noisy vs. Clean Data for Paraphrase Identification in Mexican Spanish
AU - Tamayo, Antonio
AU - Burgos, Diego A.
AU - Gelbukh, Alexander
N1 - Publisher Copyright:
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
PY - 2022
Y1 - 2022
N2 - Paraphrase identification is relevant for plagiarism detection, question answering, and machine translation, among other tasks. In this work, we report a transfer learning approach using transformers to tackle paraphrase identification on noisy vs. clean data in Spanish as our contribution to the PAR-MEX 2022 shared task. We carried out fine-tuning as well as hyperparameter tuning on BERTIN, a model pre-trained on the Spanish portion of a massive multilingual web corpus. We achieved the best performance in the competition (F1 = 0.94) by fine-tuning BERTIN on noisy data and using it to identify paraphrases in clean data.
AB - Paraphrase identification is relevant for plagiarism detection, question answering, and machine translation, among other tasks. In this work, we report a transfer learning approach using transformers to tackle paraphrase identification on noisy vs. clean data in Spanish as our contribution to the PAR-MEX 2022 shared task. We carried out fine-tuning as well as hyperparameter tuning on BERTIN, a model pre-trained on the Spanish portion of a massive multilingual web corpus. We achieved the best performance in the competition (F1 = 0.94) by fine-tuning BERTIN on noisy data and using it to identify paraphrases in clean data.
KW - Language models
KW - Paraphrase identification
KW - Transfer learning
KW - Transformers
UR - http://www.scopus.com/inward/record.url?scp=85137370281&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85137370281
SN - 1613-0073
VL - 3202
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 2022 Iberian Languages Evaluation Forum, IberLEF 2022
Y2 - 20 September 2022
ER -