Authorship attribution through punctuation n-grams and averaged combination of SVM notebook for PAN at CLEF 2019

Carolina Martín-Del-Campo-Rodríguez, Daniel Alejandro Pérez Alvarez, Christian Efraín Maldonado Sifuentes, Grigori Sidorov, Ildar Batyrshin, Alexander Gelbukh

Research output: Contribution to conference › Paper


Abstract

© 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2019, 9-12 September 2019, Lugano, Switzerland.

This work explores the use of pre-processing, feature extraction and the averaged combination of Support Vector Machine (SVM) outputs for the open-set Cross-Domain Authorship Attribution task. Punctuation n-grams are introduced as a document feature representation for Authorship Attribution, in combination with traditional character n-grams. Starting from different feature representations of a document, several SVMs are trained, each estimating the probability that the document belongs to a given author, and the outputs of all SVMs are then averaged. This approach obtained a Macro F1-score of 0.642 in the PAN 2019 open-set Cross-Domain Authorship Attribution contest.
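The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a punctuation n-gram is an n-gram over the sequence of punctuation marks in a text, that each SVM has already been trained elsewhere and yields a per-author probability dictionary, and that the open-set decision is a simple threshold on the averaged probability. The function names, the `threshold` value and the `<UNK>` label are hypothetical.

```python
import string
from collections import Counter


def punctuation_ngrams(text, n=2):
    """Count n-grams over the sequence of punctuation marks in `text`.

    Assumption: only ASCII punctuation is kept, in order of appearance;
    the paper's exact definition may differ.
    """
    marks = [ch for ch in text if ch in string.punctuation]
    return Counter("".join(marks[i:i + n]) for i in range(len(marks) - n + 1))


def average_probabilities(per_model_probs):
    """Average the author-probability dicts produced by several classifiers.

    Each dict maps author -> probability and stands in for the output of
    one SVM trained on one feature representation (hypothetical inputs).
    """
    authors = per_model_probs[0].keys()
    k = len(per_model_probs)
    return {a: sum(p[a] for p in per_model_probs) / k for a in authors}


def attribute(per_model_probs, threshold=0.5, unknown="<UNK>"):
    """Pick the author with the highest averaged probability.

    Assumption: a document is assigned to `unknown` when the best averaged
    probability falls below `threshold` (an illustrative open-set rule).
    """
    avg = average_probabilities(per_model_probs)
    best = max(avg, key=avg.get)
    return best if avg[best] >= threshold else unknown
```

For instance, `punctuation_ngrams("Hi, there! Really?...")` counts bigrams over the marks `,!?...`, and `attribute([{"A": 0.8, "B": 0.2}, {"A": 0.6, "B": 0.4}])` averages the two dictionaries before applying the threshold. In a fuller version the counts would be vectorized alongside character n-grams and fed to the SVMs.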
Original language: American English
State: Published - 1 Jan 2019
Event: CEUR Workshop Proceedings
Duration: 1 Jan 2019 → …

