Authorship attribution through punctuation n-grams and averaged combination of SVM notebook for PAN at CLEF 2019

Carolina Martín-Del-Campo-Rodríguez, Daniel Alejandro Pérez Alvarez, Christian Efraín Maldonado Sifuentes, Grigori Sidorov, Ildar Batyrshin, Alexander Gelbukh

Research output: Contribution to journal, Conference article, peer-reviewed

1 Citation (Scopus)

Abstract

This work explores pre-processing, feature extraction, and the averaged combination of Support Vector Machine (SVM) outputs for the open-set Cross-Domain Authorship Attribution task. Punctuation n-grams are introduced as a document feature representation for Authorship Attribution, used in combination with traditional character n-grams. Starting from different feature representations of a document, several SVMs are trained to estimate the probability that the document belongs to a given author, and the results of all SVMs are then averaged. This approach obtained a Macro F1-score of 0.642 in the PAN 2019 open-set Cross-Domain Authorship Attribution contest.
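The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the n-gram sizes, and the rejection rule for unknown authors (a margin between the two highest averaged probabilities, with a hypothetical threshold of 0.1) are all assumptions made for illustration. The SVM training step is omitted, and each row of probabilities simply stands in for the output of one SVM trained on one feature representation.

```python
import string
from collections import Counter

PUNCT = set(string.punctuation)  # ASCII punctuation only (an assumption)


def punctuation_ngrams(text, n=2):
    """Count n-grams over the sequence of punctuation marks in the text,
    ignoring every other character."""
    marks = [ch for ch in text if ch in PUNCT]
    return Counter("".join(marks[i:i + n]) for i in range(len(marks) - n + 1))


def char_ngrams(text, n=3):
    """Count traditional character n-grams over the raw text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def average_probabilities(prob_rows):
    """Average per-author probabilities across classifiers.
    Each row holds one classifier's probabilities, in the same author order."""
    k = len(prob_rows)
    return [sum(col) / k for col in zip(*prob_rows)]


def attribute(prob_rows, authors, threshold=0.1, unknown="<UNK>"):
    """Pick the author with the highest averaged probability.
    Hypothetical open-set rule: answer `unknown` when the top two
    averaged probabilities differ by less than `threshold`."""
    avg = average_probabilities(prob_rows)
    ranked = sorted(zip(avg, authors), reverse=True)
    best_p, best_author = ranked[0]
    second_p = ranked[1][0] if len(ranked) > 1 else 0.0
    return unknown if best_p - second_p < threshold else best_author


if __name__ == "__main__":
    print(punctuation_ngrams("Hi, there! Well... ok?"))
    # Two stand-in classifier outputs for candidate authors A and B.
    print(attribute([[0.6, 0.4], [0.8, 0.2]], ["A", "B"]))
```

In a fuller version, the counters from `punctuation_ngrams` and `char_ngrams` would be turned into feature vectors, one SVM with probability outputs would be trained per representation, and the rows passed to `attribute` would come from those models.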

Original language: English
Publication: CEUR Workshop Proceedings
Volume: 2380
Status: Published - 2019
Event: 20th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2019 - Lugano, Switzerland
Duration: 9 Sep 2019 to 12 Sep 2019
