Authorship attribution through punctuation n-grams and averaged combination of SVM notebook for PAN at CLEF 2019

Carolina Martín-Del-Campo-Rodríguez, Daniel Alejandro Pérez Alvarez, Christian Efraín Maldonado Sifuentes, Grigori Sidorov, Ildar Batyrshin, Alexander Gelbukh

Research output: Contribution to conference, Paper

1 Citation (Scopus)

Abstract

© 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2019, 9-12 September 2019, Lugano, Switzerland. This work explores pre-processing, feature extraction, and the averaged combination of Support Vector Machine (SVM) outputs for the open-set Cross-Domain Authorship Attribution task. Punctuation n-grams are introduced as a document feature representation for Authorship Attribution, used in combination with traditional character n-grams. Starting from different feature representations of a document, several SVMs are trained, each estimating the probability that the document belongs to a given author, and their outputs are then averaged. This approach obtained a Macro F1-score of 0.642 in the PAN 2019 open-set Cross-Domain Authorship Attribution contest.
Original language: American English
Status: Published - 1 Jan 2019
Event: CEUR Workshop Proceedings
Duration: 1 Jan 2019 → …
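
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the n-gram orders, the `min_margin` threshold, and the `<UNK>` label are all assumptions made for illustration. SVM training is omitted: the per-model probability dictionaries stand in for the outputs of SVMs trained on different feature representations. The abstract does not say how the open-set decision is made, so the margin rule below is only one plausible choice.

```python
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)  # ASCII punctuation only (assumption)


def punctuation_ngrams(text, n=2):
    """Count n-grams over the sequence of punctuation marks found in `text`."""
    marks = [ch for ch in text if ch in PUNCTUATION]
    return Counter("".join(marks[i:i + n]) for i in range(len(marks) - n + 1))


def char_ngrams(text, n=3):
    """Count traditional character n-grams over the raw text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def average_probabilities(per_model_probs):
    """Average per-author probabilities produced by several classifiers.

    `per_model_probs` is a list of dicts mapping author -> probability,
    one dict per model (hypothetical stand-ins for SVM outputs).
    """
    authors = per_model_probs[0].keys()
    k = len(per_model_probs)
    return {a: sum(p[a] for p in per_model_probs) / k for a in authors}


def attribute(avg_probs, min_margin=0.1, unknown="<UNK>"):
    """Pick the most probable author, or `unknown` when the gap between
    the two best candidates is below `min_margin` (assumed open-set rule)."""
    ranked = sorted(avg_probs.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < min_margin:
        return unknown
    return ranked[0][0]
```

In a fuller version, each feature representation (for example the counts from `punctuation_ngrams` and `char_ngrams`) would be vectorized and passed to its own SVM, and the per-author probabilities from those SVMs would be combined with `average_probabilities` before calling `attribute`.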

Conference

Conference: CEUR Workshop Proceedings
Period: 1/01/19 → …
