TY - JOUR
T1 - Dependency vs. constituent based syntactic N-grams in text similarity measures for paraphrase recognition
AU - Calvo, Hiram
AU - Segura-Olivares, Andrea
AU - García, Alejandro
PY - 2014/7/1
Y1 - 2014/7/1
N2 - Paraphrase recognition consists of detecting whether an expression restated as another expression contains the same information. Traditionally, several lexical, syntactic, and semantic techniques are used to solve this problem. To measure word overlap, most works use n-grams; however, syntactic n-grams have been scarcely explored. We propose using syntactic dependency and constituent n-grams combined with common NLP techniques such as stemming, synonym detection, similarity measures, and linear combination, together with a similarity matrix built in turn from syntactic n-grams. We measure and compare the performance of our system using the Microsoft Research Paraphrase Corpus. We present an in-depth study of the strengths and weaknesses of each approach, as well as an analysis of common errors. Our main motivation was to determine which syntactic approach performs better for this task: syntactic dependency n-grams or syntactic constituent n-grams. We also compare both approaches with traditional n-grams and state-of-the-art systems.
AB - Paraphrase recognition consists of detecting whether an expression restated as another expression contains the same information. Traditionally, several lexical, syntactic, and semantic techniques are used to solve this problem. To measure word overlap, most works use n-grams; however, syntactic n-grams have been scarcely explored. We propose using syntactic dependency and constituent n-grams combined with common NLP techniques such as stemming, synonym detection, similarity measures, and linear combination, together with a similarity matrix built in turn from syntactic n-grams. We measure and compare the performance of our system using the Microsoft Research Paraphrase Corpus. We present an in-depth study of the strengths and weaknesses of each approach, as well as an analysis of common errors. Our main motivation was to determine which syntactic approach performs better for this task: syntactic dependency n-grams or syntactic constituent n-grams. We also compare both approaches with traditional n-grams and state-of-the-art systems.
KW - Constituent analysis
KW - Dependency analysis
KW - Microsoft Research paraphrase corpus
KW - Paraphrase recognition
KW - Similarity measures
KW - Syntactic n-grams
UR - http://www.scopus.com/inward/record.url?scp=84907507271&partnerID=8YFLogxK
U2 - 10.13053/CyS-18-3-2044
DO - 10.13053/CyS-18-3-2044
M3 - Article
SN - 1405-5546
VL - 18
SP - 517
EP - 554
JO - Computacion y Sistemas
JF - Computacion y Sistemas
IS - 3
ER -