Improving cross-topic authorship attribution: The role of pre-processing

Ilia Markov, Efstathios Stamatatos, Grigori Sidorov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

15 Scopus citations

Abstract

The effectiveness of character n-gram features for representing the stylistic properties of a text has been demonstrated in various independent Authorship Attribution (AA) studies. Moreover, it has been shown that some categories of character n-grams perform better than others under both single-topic and cross-topic AA conditions. In this work, we present an improved algorithm for cross-topic AA. We demonstrate that the effectiveness of the character n-gram representation can be significantly enhanced by performing simple pre-processing steps and appropriately tuning the number of features, especially under cross-topic conditions.
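For illustration only, the sketch below shows the general kind of pipeline the abstract describes: character n-gram features computed over lightly pre-processed text, with a capped number of features, fed to a linear classifier. The digit-masking step, the choice of n = 3, the scikit-learn components, the toy texts, and the max_features value are assumptions made for this example and are not taken from the paper.

import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def preprocess(text):
    # Mask digits so topic-specific numbers do not leak into the
    # stylistic representation (an illustrative pre-processing step,
    # not necessarily the one used in the paper).
    return re.sub(r"\d", "#", text)

# Toy corpus: two hypothetical authors, two short training texts each.
train_texts = [
    "Well, I suppose the meeting in 1998 could have gone better, all things considered.",
    "Well, I suppose we shall see; the 3 reports were not as dire as feared.",
    "RESULTS ARE FINAL!!! The 2017 budget is closed & no further edits are allowed.",
    "FINAL NOTICE!!! Submit the 12 forms today & do not ask for extensions.",
]
train_authors = ["author_a", "author_a", "author_b", "author_b"]

# Character 3-grams over pre-processed text, with the vocabulary capped
# (the abstract stresses tuning the number of features).
vectorizer = TfidfVectorizer(
    preprocessor=preprocess,
    analyzer="char",
    ngram_range=(3, 3),
    max_features=5000,
)
model = make_pipeline(vectorizer, LinearSVC())
model.fit(train_texts, train_authors)

# Attribute a test text on a different topic (cross-topic setting).
print(model.predict(["Well, I suppose the 2 gardens will survive the frost."]))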

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 18th International Conference, CICLing 2017, Revised Selected Papers
Editors: Alexander Gelbukh
Publisher: Springer Verlag
Pages: 289-302
Number of pages: 14
ISBN (Print): 9783319771151
DOIs
State: Published - 2018
Event: 18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017 - Budapest, Hungary
Duration: 17 Apr 2017 - 23 Apr 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10762 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017
Country/Territory: Hungary
City: Budapest
Period: 17/04/17 - 23/04/17

Keywords

  • Authorship attribution
  • Character n-grams
  • Cross-topic
  • Machine learning
  • Pre-processing
