TY - JOUR
T1 - Dynamically adjustable approach through obfuscation type recognition
AU - Sanchez-Perez, Miguel A.
AU - Gelbukh, Alexander
AU - Sidorov, Grigori
N1 - Funding Information:
Work done under partial support of FP7-PEOPLE-2010-IRSES: Web Information Quality - Evaluation Initiative (WIQ-EI) European Commission project 269180, Government of Mexico (SNI, CONACYT), and Instituto Politécnico Nacional, Mexico (SIP 20144274, 20150028, BEIFI, COFAA).
PY - 2015
Y1 - 2015
N2 - The task of (monolingual) text alignment consists in finding similar text fragments between two given documents. It has applications in plagiarism detection, detection of text reuse, author identification, authoring aid, and information retrieval, to mention only a few. We describe our approach to the text alignment subtask of the plagiarism detection competition at PAN 2015. Our method relies on a sentence similarity measure based on a tf-idf-like weighting scheme and on the cosine and Dice similarity measures. We extended our previous clustering algorithm, introduced a new verbatim detection method, and extended the decision making regarding which approach or output to use. We significantly improve on the performance of our previous PAN 2014 approach; as a result, our approach outperforms the best-performing system of PAN 2014. Our system is available as open source.
UR - http://www.scopus.com/inward/record.url?scp=84982830543&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84982830543
SN - 1613-0073
VL - 1391
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 16th Conference and Labs of the Evaluation Forum, CLEF 2015
Y2 - 8 September 2015 through 11 September 2015
ER -