Complete syntactic N-grams as style markers for authorship attribution

Research output: Contribution to journal › Article › peer-review

18 Scopus citations

Abstract

In this paper, we present an authorship attribution method based on the use of complete (non-continuous, with bifurcations) syntactic n-grams as style markers. Syntactic n-grams are obtained by following paths in subtrees of a syntactic tree. We work with relatively short text fragments and build authors’ profiles of various sizes using the tf-idf scheme. We train an SVM classifier to perform the task. We compare the method with the application of character n-grams and show that the accuracy increases when using complete syntactic n-grams.
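
To make the notion of a complete syntactic n-gram concrete, the following is a minimal sketch, not the authors' implementation. It assumes a dependency tree given as a plain dictionary mapping each head word to its list of dependents, with every word unique. The function names (`rooted_subtrees`, `_distribute`, `complete_sn_grams`) and the bracket encoding `head[child1,child2]` are illustrative assumptions. The sketch enumerates every connected subtree with exactly n nodes, so both simple paths and bifurcations are included; it does not cover the tf-idf profiles or the SVM step.

```python
from itertools import combinations


def rooted_subtrees(tree, node, size):
    """Encode every connected subtree rooted at `node` that has exactly `size` nodes."""
    if size == 1:
        return [node]
    children = tree.get(node, [])
    results = []
    # Pick which dependents take part; each picked dependent gets at least one node.
    for r in range(1, min(len(children), size - 1) + 1):
        for chosen in combinations(children, r):
            for parts in _distribute(tree, chosen, size - 1):
                results.append(f"{node}[{','.join(parts)}]")
    return results


def _distribute(tree, chosen, budget):
    """Yield one encoding per chosen dependent so that the subtree sizes sum to `budget`."""
    if not chosen:
        if budget == 0:
            yield []
        return
    first, rest = chosen[0], chosen[1:]
    # Leave at least one node for each remaining dependent.
    for k in range(1, budget - len(rest) + 1):
        for encoding in rooted_subtrees(tree, first, k):
            for tail in _distribute(tree, rest, budget - k):
                yield [encoding] + tail


def complete_sn_grams(tree, n):
    """Collect the n-node subtrees rooted at every word of the tree."""
    nodes = set(tree) | {child for kids in tree.values() for child in kids}
    grams = []
    for node in sorted(nodes):
        grams.extend(rooted_subtrees(tree, node, n))
    return grams


# Hypothetical toy parse of "John likes red apples" (head -> dependents).
toy_tree = {"likes": ["John", "apples"], "apples": ["red"]}
print(complete_sn_grams(toy_tree, 3))
# ['likes[apples[red]]', 'likes[John,apples]']
```

In the toy output, `likes[apples[red]]` is a continuous path, while `likes[John,apples]` contains a bifurcation and would be missed by a path-only extraction. In a fuller pipeline the strings could serve as the terms that are weighted with tf-idf, and words could be replaced by POS tags or dependency labels; both are assumptions that go beyond what the abstract states.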

Keywords

  • Authorship attribution
  • SVM
  • Style markers
  • Syntactic markers
  • Syntactic n-grams
  • Syntactic paths
