TY - GEN
T1 - Exploratory Data Analysis for the Automatic Detection of Question Paraphrasing in Collaborative Environments
AU - Alcantara, Tania
AU - Calvo, Hiram
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Internet searches are a daily occurrence, and different people often search for the same topic using different words; this is called paraphrasing. Paraphrasing involves syntactic changes and word overlap, governed by the rules of the language being used. Paraphrase identification is an important problem in natural language processing (NLP), especially for questions that share the same intent. In addition, studies of similarity have been found to overlook some features, which lowers identification performance. In this paper, we address automatic paraphrase identification on the Quora Question Pair (QQP) dataset, paying special attention to the shape of the data through exploratory data analysis (EDA). The goal is to obtain better results on the identification task and to compare different classifiers in collaborative environments where resources are limited.
AB - Internet searches are a daily occurrence, and different people often search for the same topic using different words; this is called paraphrasing. Paraphrasing involves syntactic changes and word overlap, governed by the rules of the language being used. Paraphrase identification is an important problem in natural language processing (NLP), especially for questions that share the same intent. In addition, studies of similarity have been found to overlook some features, which lowers identification performance. In this paper, we address automatic paraphrase identification on the Quora Question Pair (QQP) dataset, paying special attention to the shape of the data through exploratory data analysis (EDA). The goal is to obtain better results on the identification task and to compare different classifiers in collaborative environments where resources are limited.
UR - http://www.scopus.com/inward/record.url?scp=85142810457&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-19496-2_15
DO - 10.1007/978-3-031-19496-2_15
M3 - Conference contribution
AN - SCOPUS:85142810457
SN - 9783031194955
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 193
EP - 211
BT - Advances in Computational Intelligence - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022, Proceedings
A2 - Pichardo Lagunas, Obdulia
A2 - Martínez Seis, Bella
A2 - Martínez-Miranda, Juan
PB - Springer Science and Business Media Deutschland GmbH
T2 - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022
Y2 - 24 October 2022 through 29 October 2022
ER -