TY - JOUR
T1 - Mexican Spanish Paraphrase Identification using Data Augmentation
AU - Meque, Abdul
AU - Balouchzahi, Fazlourrahman
AU - Sidorov, Grigori
AU - Gelbukh, Alexander
N1 - Publisher Copyright: © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
PY - 2022
Y1 - 2022
N2 - Paraphrasing is the rewording of a passage with synonyms and alternative phrasing without changing the main message of the original sentence. It is used for simplifying, clarifying, or quoting text, among other purposes. In this paper, we present a paraphrase identification model for Mexican Spanish text pairs. A data augmentation step was performed using the Google Translate API, and three similarity measures, namely Jaccard, cosine, and spaCy similarity, were then used to create a similarity vector for each text pair. The paraphrase identification task was modeled as binary classification of text pairs into two classes: Paraphrases and Not-Paraphrases. The proposed methodology, with a voting classifier over three machine learning classifiers, obtained an F1-score of 0.8754 for the Paraphrases category.
KW - Data Augmentation
KW - Paraphrase
KW - Similarity
KW - Spanish
UR - http://www.scopus.com/inward/record.url?scp=85137322604&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85137322604
SN - 1613-0073
VL - 3202
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 2022 Iberian Languages Evaluation Forum, IberLEF 2022
Y2 - 20 September 2022
ER -