We present the CIC-GIL approach to the cross-domain authorship attribution task at PAN 2018. This year's evaluation lab focuses on the closed-set attribution task applied to a fanfiction corpus in five languages: English, French, Italian, Polish, and Spanish. We followed a traditional machine learning approach and selected a different feature set for each language. We evaluated document features such as typed and untyped character n-grams, word n-grams, and function word n-grams. Our final system uses the log-entropy weighting scheme and an SVM as the classifier.
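As a rough illustration of the weighting step named in the abstract, the sketch below applies log-entropy weighting to untyped character n-gram counts. The abstract does not give the formula, so this assumes the commonly used definition: local weight log(1 + tf) multiplied by a global weight of 1 + Σ p·log(p) / log(N), where p is an n-gram's share of its collection-wide count in each document and N is the number of documents. The function names, the n-gram length, and the use of natural logarithms are illustrative assumptions, not details taken from the paper, and the SVM classification step is left out.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count untyped character n-grams in `text` (hypothetical helper)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def log_entropy_weights(docs, n=3):
    """Return one {n-gram: weight} dict per document (assumed standard formula)."""
    counts = [char_ngrams(doc, n) for doc in docs]
    num_docs = len(counts)
    if num_docs < 2:
        raise ValueError("log-entropy weighting needs at least two documents")

    # Collection-wide frequency of every n-gram.
    global_freq = Counter()
    for doc_counts in counts:
        global_freq.update(doc_counts)

    # Global weight: 1 + sum_j p_ij * log(p_ij) / log(N), with p_ij = tf_ij / gf_i.
    global_weight = {}
    for gram, gf in global_freq.items():
        entropy = sum(
            (c[gram] / gf) * math.log(c[gram] / gf) for c in counts if gram in c
        )
        global_weight[gram] = 1.0 + entropy / math.log(num_docs)

    # Local weight log(1 + tf) scaled by the global weight.
    return [
        {gram: math.log(1 + tf) * global_weight[gram] for gram, tf in c.items()}
        for c in counts
    ]


if __name__ == "__main__":
    for weights in log_entropy_weights(["abab", "abcd"], n=2):
        print(weights)
```

Under this definition, an n-gram that is spread evenly over all documents gets a global weight of 0, while one that occurs in a single document keeps its full local weight. The resulting vectors would then be passed to a classifier such as an SVM.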
|Original language||American English|
|State||Published - 1 Jan 2018|
|Event||CEUR Workshop Proceedings, Duration: 1 Jan 2018 → …|
|Conference||CEUR Workshop Proceedings|
|Period||1/01/18 → …|
Martín-Del-Campo-Rodríguez, C., Gómez-Adorno, H., Sidorov, G., & Batyrshin, I. (2018). CIC-GIL approach to cross-domain authorship attribution: Notebook for PAN at CLEF 2018. Paper presented at CEUR Workshop Proceedings.