Mexican Spanish Paraphrase Identification using Data Augmentation

Abdul Meque, Fazlourrahman Balouchzahi, Grigori Sidorov, Alexander Gelbukh

Research output: Contribution to journal, conference article, peer-reviewed

Abstract

Paraphrasing is the reorganization of the words in a passage, using synonyms and alternative wording, without changing the main message of the original sentence. It is used for simplification, clarification, quotation, and similar purposes. In this paper, we present a paraphrase identification model for Mexican Spanish text pairs. Data augmentation was performed using the Google Translate API, and three similarity measures (Jaccard, cosine, and spaCy similarity) were then used to create a similarity vector for each text pair. The paraphrase identification task was modeled as binary classification of text pairs into two classes: Paraphrases and Not-Paraphrases. The proposed methodology, using a voting classifier over three machine learning classifiers, obtained an F1-score of 0.8754 for the paraphrase category.
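To make the similarity-vector idea concrete, the following is a minimal sketch and not the authors' implementation. It computes Jaccard and bag-of-words cosine similarity for a text pair using only the Python standard library; the spaCy similarity component is omitted because it requires an installed language model. The tokenizer, the function names, and the use of hard (majority) voting are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (a simplistic, assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def jaccard_similarity(a, b):
    """Jaccard similarity: size of token-set intersection over size of union."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words count vectors."""
    counts_a, counts_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_vector(a, b):
    """Feature vector for one text pair (hypothetical helper, two of the three measures)."""
    return [jaccard_similarity(a, b), cosine_similarity(a, b)]


def majority_vote(predictions):
    """Hard voting (an assumption): return the label chosen by most classifiers."""
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    pair = ("el gato duerme en la casa", "el gato duerme en el sofá")
    print(similarity_vector(*pair))
    print(majority_vote(["Paraphrase", "Not-Paraphrase", "Paraphrase"]))
```

In a setup like the one described, each such vector would be the input to the individual classifiers, and their predicted labels would then be combined by the voting step.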

Original language: English
Publication: CEUR Workshop Proceedings
Volume: 3202
Status: Published - 2022
Event: 2022 Iberian Languages Evaluation Forum, IberLEF 2022 - A Coruña, Spain
Duration: 20 Sep 2022 → …
