Authorship attribution through punctuation n-grams and averaged combination of SVM notebook for PAN at CLEF 2019

Carolina Martín-Del-Campo-Rodríguez; Daniel Alejandro Pérez Alvarez; Christian Efraín Maldonado Sifuentes; Grigori Sidorov; Ildar Batyrshin; Alexander Gelbukh

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › peer-review

1 Citation (Scopus)

Abstract

This work explores pre-processing, feature extraction and the averaged combination of Support Vector Machine (SVM) outputs for the open-set Cross-Domain Authorship Attribution task. Punctuation n-grams are introduced as a feature representation of a document for Authorship Attribution, in combination with traditional character n-grams. Starting from different feature representations of a document, several SVMs are trained to represent the probability of membership for a certain author, and the outputs of all SVMs are then averaged. This approach obtained a Macro F1-score of 0.642 in the PAN 2019 contest of open-set Cross-Domain Authorship Attribution.
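The two ideas named in the abstract, punctuation n-grams and averaging the per-author outputs of several classifiers, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define how punctuation n-grams are built, so the sketch assumes that every non-punctuation character is dropped and n-grams are taken over the remaining marks. The function names (`punctuation_ngrams`, `character_ngrams`, `average_probabilities`, `attribute`) and the probability values are hypothetical, and the SVM training and open-set rejection steps are left out.

```python
import string
from collections import Counter

# Assumption: "punctuation" means the ASCII punctuation set from the stdlib.
PUNCTUATION = set(string.punctuation)


def punctuation_ngrams(text, n=2):
    """Count n-grams over the sequence of punctuation marks in `text`.

    Assumption: all non-punctuation characters are removed first, so the
    n-grams describe only the order in which punctuation marks occur.
    """
    marks = [ch for ch in text if ch in PUNCTUATION]
    return Counter("".join(marks[i:i + n]) for i in range(len(marks) - n + 1))


def character_ngrams(text, n=3):
    """Count traditional character n-grams over the raw text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def average_probabilities(per_model_probs):
    """Average per-author probabilities produced by several classifiers.

    `per_model_probs` is a list of dicts mapping author -> probability,
    one dict per classifier (for example, one per feature representation).
    """
    authors = per_model_probs[0].keys()
    k = len(per_model_probs)
    return {a: sum(p[a] for p in per_model_probs) / k for a in authors}


def attribute(per_model_probs):
    """Return the author with the highest averaged probability."""
    avg = average_probabilities(per_model_probs)
    return max(avg, key=avg.get)


if __name__ == "__main__":
    print(punctuation_ngrams("Well, yes... no?!", n=2))
    # Hypothetical outputs of two classifiers for candidate authors A and B.
    outputs = [{"A": 0.6, "B": 0.4}, {"A": 0.2, "B": 0.8}]
    print(average_probabilities(outputs))
    print(attribute(outputs))
```

In a full pipeline, each feature representation would feed its own probabilistic classifier, and the dictionaries passed to `average_probabilities` would come from those classifiers instead of being written by hand.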

Original language	English
Publication	CEUR Workshop Proceedings
Volume	2380
Status	Published - 2019
Event	20th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2019 - Lugano, Switzerland; Duration: 9 Sep 2019 → 12 Sep 2019

Cite this

@article{7311a781046d48739fb274d7085083a7,

title = "Authorship attribution through punctuation n-grams and averaged combination of SVM notebook for PAN at CLEF 2019",

abstract = "This work explores pre-processing, feature extraction and the averaged combination of Support Vector Machine (SVM) outputs for the open-set Cross-Domain Authorship Attribution task. Punctuation n-grams are introduced as a feature representation of a document for Authorship Attribution, in combination with traditional character n-grams. Starting from different feature representations of a document, several SVMs are trained to represent the probability of membership for a certain author, and the outputs of all SVMs are then averaged. This approach obtained a Macro F1-score of 0.642 in the PAN 2019 contest of open-set Cross-Domain Authorship Attribution.",

author = "Carolina Mart{\'i}n-Del-Campo-Rodr{\'i}guez and {P{\'e}rez Alvarez}, {Daniel Alejandro} and {Maldonado Sifuentes}, {Christian Efra{\'i}n} and Grigori Sidorov and Ildar Batyrshin and Alexander Gelbukh",

note = "Publisher Copyright: {\textcopyright} 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2019, 9-12 September 2019, Lugano, Switzerland.; 20th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2019 ; Conference date: 09-09-2019 Through 12-09-2019",

year = "2019",

language = "English",

volume = "2380",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

}

Authorship attribution through punctuation n-grams and averaged combination of SVM notebook for PAN at CLEF 2019. / Martín-Del-Campo-Rodríguez, Carolina; Pérez Alvarez, Daniel Alejandro; Maldonado Sifuentes, Christian Efraín et al.
In: CEUR Workshop Proceedings, Vol. 2380, 2019.

TY - JOUR

T1 - Authorship attribution through punctuation n-grams and averaged combination of SVM notebook for PAN at CLEF 2019

AU - Martín-Del-Campo-Rodríguez, Carolina

AU - Pérez Alvarez, Daniel Alejandro

AU - Maldonado Sifuentes, Christian Efraín

AU - Sidorov, Grigori

AU - Batyrshin, Ildar

AU - Gelbukh, Alexander

N1 - Publisher Copyright: © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2019, 9-12 September 2019, Lugano, Switzerland.

PY - 2019

Y1 - 2019

N2 - This work explores pre-processing, feature extraction and the averaged combination of Support Vector Machine (SVM) outputs for the open-set Cross-Domain Authorship Attribution task. Punctuation n-grams are introduced as a feature representation of a document for Authorship Attribution, in combination with traditional character n-grams. Starting from different feature representations of a document, several SVMs are trained to represent the probability of membership for a certain author, and the outputs of all SVMs are then averaged. This approach obtained a Macro F1-score of 0.642 in the PAN 2019 contest of open-set Cross-Domain Authorship Attribution.

AB - This work explores pre-processing, feature extraction and the averaged combination of Support Vector Machine (SVM) outputs for the open-set Cross-Domain Authorship Attribution task. Punctuation n-grams are introduced as a feature representation of a document for Authorship Attribution, in combination with traditional character n-grams. Starting from different feature representations of a document, several SVMs are trained to represent the probability of membership for a certain author, and the outputs of all SVMs are then averaged. This approach obtained a Macro F1-score of 0.642 in the PAN 2019 contest of open-set Cross-Domain Authorship Attribution.

UR - http://www.scopus.com/inward/record.url?scp=85070524753&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85070524753

SN - 1613-0073

VL - 2380

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

T2 - 20th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2019

Y2 - 9 September 2019 through 12 September 2019

ER -
