Authorship attribution through punctuation n-grams and averaged combination of SVM notebook for PAN at CLEF 2019

Carolina Martín-Del-Campo-Rodríguez; Daniel Alejandro Pérez Alvarez; Christian Efraín Maldonado Sifuentes; Grigori Sidorov; Ildar Batyrshin; Alexander Gelbukh

Centro de Investigación en Computación (CIC)

Research output: Contribution to conference › Paper

1 Scopus citation

Abstract

© 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2019, 9-12 September 2019, Lugano, Switzerland. This work explores pre-processing, feature extraction, and the averaged combination of Support Vector Machine (SVM) outputs for the open-set Cross-Domain Authorship Attribution task. Punctuation n-grams are introduced as a feature representation of a document for Authorship Attribution, in combination with traditional character n-grams. Starting from different feature representations of a document, several SVMs are trained, each estimating the probability that the document belongs to a given author, and their outputs are then averaged. This approach obtained a Macro F1-score of 0.642 in the PAN 2019 open-set Cross-Domain Authorship Attribution contest.
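As a rough illustration of the two ideas named in the abstract, the sketch below counts n-grams over the sequence of punctuation marks in a text and averages per-author probabilities produced by several classifiers. It is not the authors' implementation: the function names, the use of `string.punctuation` as the punctuation inventory, the n-gram length, and in particular the margin-based rule for answering "unknown" in the open-set case are all assumptions made for this example. The probability dictionaries stand in for the outputs of SVMs trained on different feature representations.

```python
import string
from collections import Counter

# Assumption: ASCII punctuation is the inventory of marks of interest.
PUNCT = set(string.punctuation)


def punctuation_ngrams(text, n=2):
    """Count n-grams over the sequence of punctuation marks in `text`.

    All non-punctuation characters are dropped first, so adjacent
    entries in the sequence may be far apart in the original text.
    """
    marks = [ch for ch in text if ch in PUNCT]
    return Counter("".join(marks[i:i + n]) for i in range(len(marks) - n + 1))


def average_probabilities(per_model_probs):
    """Average author-membership probabilities across models.

    `per_model_probs` is a list of dicts mapping author -> probability,
    one dict per model (hypothetical stand-ins for SVM outputs).
    """
    authors = per_model_probs[0].keys()
    k = len(per_model_probs)
    return {a: sum(p[a] for p in per_model_probs) / k for a in authors}


def attribute(avg_probs, min_margin=0.1, unknown="<UNK>"):
    """Pick the most probable author, or `unknown` if the top two are close.

    The margin rule is an assumed open-set heuristic, not taken from the paper.
    """
    ranked = sorted(avg_probs.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < min_margin:
        return unknown
    return ranked[0][0]
```

In use, each feature representation (for example character n-grams and punctuation n-grams) would feed its own classifier, and only the averaged probabilities would be passed to `attribute`. Averaging keeps any single representation from dominating the decision, which is one plausible motivation for the combination described above.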

Original language	American English
State	Published - 1 Jan 2019
Event	CEUR Workshop Proceedings - Duration: 1 Jan 2019 → …

Conference

Conference	CEUR Workshop Proceedings
Period	1/01/19 → …

Cite this

@conference{4f5f16fd50b7430abea7cb8b173e903e,

title = "Authorship attribution through punctuation n-grams and averaged combination of SVM notebook for PAN at CLEF 2019",

abstract = "{\textcopyright} 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2019, 9-12 September 2019, Lugano, Switzerland. This work explores pre-processing, feature extraction, and the averaged combination of Support Vector Machine (SVM) outputs for the open-set Cross-Domain Authorship Attribution task. Punctuation n-grams are introduced as a feature representation of a document for Authorship Attribution, in combination with traditional character n-grams. Starting from different feature representations of a document, several SVMs are trained, each estimating the probability that the document belongs to a given author, and their outputs are then averaged. This approach obtained a Macro F1-score of 0.642 in the PAN 2019 open-set Cross-Domain Authorship Attribution contest.",

author = "Carolina Mart{\'i}n-Del-Campo-Rodr{\'i}guez and {P{\'e}rez Alvarez}, {Daniel Alejandro} and {Maldonado Sifuentes}, {Christian Efra{\'i}n} and Grigori Sidorov and Ildar Batyrshin and Alexander Gelbukh",

year = "2019",

month = jan,

day = "1",

language = "American English",

note = "CEUR Workshop Proceedings ; Conference date: 01-01-2019",

}

TY - CONF

T1 - Authorship attribution through punctuation n-grams and averaged combination of SVM notebook for PAN at CLEF 2019

AU - Martín-Del-Campo-Rodríguez, Carolina

AU - Pérez Alvarez, Daniel Alejandro

AU - Maldonado Sifuentes, Christian Efraín

AU - Sidorov, Grigori

AU - Batyrshin, Ildar

AU - Gelbukh, Alexander

PY - 2019/1/1

Y1 - 2019/1/1

N2 - © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2019, 9-12 September 2019, Lugano, Switzerland. This work explores pre-processing, feature extraction, and the averaged combination of Support Vector Machine (SVM) outputs for the open-set Cross-Domain Authorship Attribution task. Punctuation n-grams are introduced as a feature representation of a document for Authorship Attribution, in combination with traditional character n-grams. Starting from different feature representations of a document, several SVMs are trained, each estimating the probability that the document belongs to a given author, and their outputs are then averaged. This approach obtained a Macro F1-score of 0.642 in the PAN 2019 open-set Cross-Domain Authorship Attribution contest.

UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85070524753&origin=inward

UR - https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85070524753&origin=inward

M3 - Paper

T2 - CEUR Workshop Proceedings

Y2 - 1 January 2019

ER -
