TY - JOUR
T1 - CIC-GIL approach to cross-domain authorship attribution
T2 - 19th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2018
AU - Martín-Del-Campo-Rodríguez, Carolina
AU - Gómez-Adorno, Helena
AU - Sidorov, Grigori
AU - Batyrshin, Ildar
N1 - Funding Information:
This work was partially supported by the Mexican Government (CONACYT projects 240844, SNI, COFAA-IPN, SIP-IPN 20181849, 20171813) and Honeywell Grant.
PY - 2018
Y1 - 2018
N2 - We present the CIC-GIL approach to the cross-domain authorship attribution task at PAN 2018. This year's evaluation lab focuses on the closed-set attribution task applied to a fanfiction corpus in five languages: English, French, Italian, Polish, and Spanish. We followed a traditional machine learning approach and selected different feature sets depending on the language. We evaluated document features such as typed and untyped character n-grams, word n-grams, and function word n-grams. Our final system uses the log-entropy weighting scheme and an SVM as the classifier.
AB - We present the CIC-GIL approach to the cross-domain authorship attribution task at PAN 2018. This year's evaluation lab focuses on the closed-set attribution task applied to a fanfiction corpus in five languages: English, French, Italian, Polish, and Spanish. We followed a traditional machine learning approach and selected different feature sets depending on the language. We evaluated document features such as typed and untyped character n-grams, word n-grams, and function word n-grams. Our final system uses the log-entropy weighting scheme and an SVM as the classifier.
UR - http://www.scopus.com/inward/record.url?scp=85051074358&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85051074358
SN - 1613-0073
VL - 2125
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
Y2 - 10 September 2018 through 14 September 2018
ER -