Dynamically adjustable approach through obfuscation type recognition

Miguel A. Sanchez-Perez, Alexander Gelbukh, Grigori Sidorov

Research output: Contribution to journal; Conference article; peer-reviewed

2 Citations (Scopus)

Abstract

The task of (monolingual) text alignment consists of finding similar text fragments between two given documents. It has applications in plagiarism detection, detection of text reuse, author identification, authoring aid, and information retrieval, to mention only a few. We describe our approach to the text alignment subtask of the plagiarism detection competition at PAN 2015. Our method relies on a sentence similarity measure based on a tf-idf-like weighting scheme and on the cosine and Dice similarity measures. We reused and extended our previous clustering algorithm, introduced a new verbatim detection method, and extended the decision making regarding which approach or output to use. We significantly improve on the performance of our previous PAN 2014 approach, and hence our approach outperforms the best-performing system of PAN 2014. Our system is available as open source.
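The sentence similarity step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the smoothed idf formula, the set-based Dice coefficient, and the threshold values `th_cos` and `th_dice` are all assumptions chosen for illustration. Each sentence is treated as a small "document" for the tf-idf-like weights, and a sentence pair is kept only if both its cosine and Dice scores exceed their thresholds.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_vectors(sentences):
    # Treat every sentence as one "document" and weight terms by tf * idf.
    bags = [Counter(tokenize(s)) for s in sentences]
    n = len(bags)
    df = Counter(term for bag in bags for term in bag)
    # Smoothed idf (an assumption of this sketch); it is always positive.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in bag.items()} for bag in bags]


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def dice(u, v):
    # Dice coefficient over the sets of terms present in each vector.
    if not u and not v:
        return 0.0
    shared = len(u.keys() & v.keys())
    return 2.0 * shared / (len(u) + len(v))


def similar_pairs(source, suspicious, th_cos=0.3, th_dice=0.33):
    # Return (i, j) index pairs whose similarity passes both thresholds.
    # The threshold defaults are illustrative, not tuned values.
    vecs = tfidf_vectors(source + suspicious)
    src_vecs, susp_vecs = vecs[:len(source)], vecs[len(source):]
    return [
        (i, j)
        for i, u in enumerate(src_vecs)
        for j, v in enumerate(susp_vecs)
        if cosine(u, v) > th_cos and dice(u, v) > th_dice
    ]
```

Requiring both measures to pass is one simple way to combine them: cosine rewards agreement on heavily weighted terms, while Dice checks plain vocabulary overlap. The resulting index pairs would then serve as seeds for a later clustering or extension step.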

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 1391
Status: Published - 2015
Event: 16th Conference and Labs of the Evaluation Forum, CLEF 2015 - Toulouse, France
Duration: 8 Sep 2015 to 11 Sep 2015
