Improving cross-topic authorship attribution: The role of pre-processing

Ilia Markov, Efstathios Stamatatos, Grigori Sidorov

Research output: Chapter in book/report/conference proceeding; conference contribution; peer-reviewed

15 Citations (Scopus)

Abstract

The effectiveness of character n-gram features for representing the stylistic properties of a text has been demonstrated in various independent Authorship Attribution (AA) studies. Moreover, it has been shown that some categories of character n-grams perform better than others under both single-topic and cross-topic AA conditions. In this work, we present an improved algorithm for cross-topic AA. We demonstrate that the effectiveness of character n-gram representations can be significantly enhanced by performing simple pre-processing steps and appropriately tuning the number of features, especially in cross-topic conditions.

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 18th International Conference, CICLing 2017, Revised Selected Papers
Editors: Alexander Gelbukh
Publisher: Springer Verlag
Pages: 289-302
Number of pages: 14
ISBN (print): 9783319771151
DOI
Status: Published - 2018
Event: 18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017 - Budapest, Hungary
Duration: 17 Apr 2017 to 23 Apr 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10762 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017
Country/Territory: Hungary
City: Budapest
Period: 17/04/17 to 23/04/17
