Adaptive algorithm for plagiarism detection: The best-performing approach at PAN 2014 text alignment competition

Miguel A. Sanchez-Perez, Alexander Gelbukh, Grigori Sidorov

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

17 Citations (Scopus)

Abstract

The task of (monolingual) text alignment consists of finding similar text fragments between two given documents. It has applications in plagiarism detection, detection of text reuse, author identification, authoring aid, and information retrieval, to mention only a few. We describe our approach to the text alignment subtask of the plagiarism detection competition at PAN 2014. Our system was the best-performing system at PAN 2014, and it also outperforms the best-performing system of PAN 2013 according to the cumulative evaluation measure Plagdet. Our method relies on a sentence similarity measure based on a tf-idf-like weighting scheme that lets us keep stopwords without increasing the rate of false positives. We introduce a recursive algorithm that extends the ranges of matching sentences to maximal-length passages. We also introduce a novel filtering method to resolve overlapping plagiarism cases. Our system is available as open source.
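The abstract names a tf-idf-like weighting but does not define it. The following sketch assumes one common variant, tf-isf (term frequency times inverse sentence frequency, where each sentence plays the role of a document), combined with cosine similarity. The function names, the whitespace tokenization, and the 0.25 threshold are hypothetical illustrations and are not taken from the authors' released system. The sketch shows why this kind of weighting can keep stopwords in the text: a term that occurs in every sentence gets weight log(1) = 0. Terms that are merely frequent get low weight, so they contribute little to the score.

```python
import math
from collections import Counter


def tf_isf_vectors(sentences):
    """Weight each token by (count in sentence) * log(N / sentence frequency).

    Assumed tf-isf variant: a token that occurs in every sentence
    (typically a stopword) gets weight 0, so it can stay in the text
    without inflating similarity scores.
    """
    n = len(sentences)
    sentence_freq = Counter()
    for tokens in sentences:
        sentence_freq.update(set(tokens))
    vectors = []
    for tokens in sentences:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n / sentence_freq[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_sentences(suspicious, source, threshold=0.25):
    """Return (i, j, score) for each suspicious/source sentence pair above threshold.

    Hypothetical helper: `threshold` is an illustrative value, and
    tokenization is plain lowercase whitespace splitting.
    """
    susp_tokens = [s.lower().split() for s in suspicious]
    src_tokens = [s.lower().split() for s in source]
    # Weight both documents together so a term's weight is comparable across them.
    vectors = tf_isf_vectors(susp_tokens + src_tokens)
    susp_vecs = vectors[:len(susp_tokens)]
    src_vecs = vectors[len(susp_tokens):]
    matches = []
    for i, u in enumerate(susp_vecs):
        for j, v in enumerate(src_vecs):
            score = cosine(u, v)
            if score > threshold:
                matches.append((i, j, score))
    return matches


suspicious = ["the cat sat on the mat", "it rained all day"]
source = ["a cat sat on a mat", "the sun was shining"]
for i, j, score in match_sentences(suspicious, source):
    print(i, j, round(score, 3))  # prints "0 0 0.316"
```

Only the first sentence pair is reported, with a score of 1/sqrt(10) ≈ 0.316. The pair that shares only "the" scores about 0.196 and falls below the illustrative threshold. This sketch covers only the sentence-level similarity step. It does not show the recursive passage extension or the overlap filtering described in the abstract.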

Original language: English
Host publication title: Experimental IR Meets Multilinguality, Multimodality, and Interaction - 6th International Conference of the CLEF Association, CLEF 2015, Proceedings
Editors: Eric San Juan, Jacques Savoy, Josiane Mothe, Jaap Kamps, Gareth J.F. Jones, Nicola Ferro, Karen Pinel-Sauvagnat, Linda Cappellato
Publisher: Springer Verlag
Pages: 402-413
Number of pages: 12
ISBN (Print): 9783319240268
DOI
Status: Published - 2015
Event: 6th International Conference on Labs of the Evaluation Forum, CLEF 2015 - Toulouse, France
Duration: 8 Sep 2015 to 11 Sep 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9283
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 6th International Conference on Labs of the Evaluation Forum, CLEF 2015
Country/Territory: France
City: Toulouse
Period: 8/09/15 to 11/09/15
