TY - GEN
T1 - Adaptive algorithm for plagiarism detection
T2 - 6th International Conference on Labs of the Evaluation Forum, CLEF 2015
AU - Sanchez-Perez, Miguel A.
AU - Gelbukh, Alexander
AU - Sidorov, Grigori
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
AB - The task of (monolingual) text alignment consists in finding similar text fragments between two given documents. It has applications in plagiarism detection, detection of text reuse, author identification, authoring aid, and information retrieval, to mention only a few. We describe our approach to the text alignment subtask of the plagiarism detection competition at PAN 2014, which resulted in the best-performing system at the PAN 2014 competition and outperforms the best-performing system of the PAN 2013 competition by the cumulative evaluation measure Plagdet. Our method relies on a sentence similarity measure based on a tf-idf-like weighting scheme that permits us to consider stopwords without increasing the rate of false positives. We introduce a recursive algorithm to extend the ranges of matching sentences to maximal-length passages. We also introduce a novel filtering method to resolve overlapping plagiarism cases. Our system is available as open source.
UR - http://www.scopus.com/inward/record.url?scp=84945926770&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-24027-5_42
DO - 10.1007/978-3-319-24027-5_42
M3 - Conference contribution
SN - 9783319240268
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 402
EP - 413
BT - Experimental IR Meets Multilinguality, Multimodality, and Interaction - 6th International Conference of the CLEF Association, CLEF 2015, Proceedings
A2 - SanJuan, Eric
A2 - Savoy, Jacques
A2 - Mothe, Josiane
A2 - Kamps, Jaap
A2 - Jones, Gareth J.F.
A2 - Ferro, Nicola
A2 - Pinel-Sauvagnat, Karen
A2 - Cappellato, Linda
PB - Springer Verlag
Y2 - 8 September 2015 through 11 September 2015
ER -