Abstract
In this paper we present an authorship attribution method based on the use of complete (non-continuous, with bifurcations) syntactic n-grams as style markers. Syntactic n-grams are obtained by following paths in subtrees of a syntactic tree. We work with relatively short text fragments and build author profiles of various sizes using the tf-idf scheme. We train an SVM classifier to perform the task. We compare the method with the use of character n-grams and show that accuracy increases when complete syntactic n-grams are used.
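As a rough illustration of what "following paths in a syntactic tree" can mean, the sketch below enumerates downward head-to-dependent paths of a fixed length in a toy dependency tree. It is a minimal sketch under stated assumptions, not the authors' implementation: the tree, the `TREE` mapping, and the `path_ngrams` helper are hypothetical, and only the simpler continuous (path-only) case is covered. The complete n-grams with bifurcations, the tf-idf profiles, and the SVM classifier described in the abstract are not reproduced here.

```python
from collections import Counter

# Hypothetical toy dependency tree for "the cat chased a mouse":
# each head word maps to the list of its direct dependents.
# (Assumption: words are unique within the sentence, so they can serve as keys.)
TREE = {
    "chased": ["cat", "mouse"],
    "cat": ["the"],
    "mouse": ["a"],
}


def path_ngrams(tree, n):
    """Return all downward head-to-dependent paths containing exactly n words."""

    def extend(path):
        # A path of the requested length is one syntactic n-gram.
        if len(path) == n:
            yield tuple(path)
            return
        # Otherwise continue from the last word to each of its dependents.
        for child in tree.get(path[-1], []):
            yield from extend(path + [child])

    # Every word in the tree is a possible starting point for a path.
    nodes = set(tree) | {c for children in tree.values() for c in children}
    result = []
    for node in sorted(nodes):
        result.extend(extend([node]))
    return result


bigrams = path_ngrams(TREE, 2)
trigrams = path_ngrams(TREE, 3)

# Raw n-gram counts; a profile in the sense of the abstract would weight
# such counts with tf-idf across documents, which this sketch omits.
profile = Counter(bigrams)
```

Unlike word n-grams taken from the surface order, the pair `("chased", "mouse")` appears here even though the two words are not adjacent in the sentence, because the n-gram follows the tree structure rather than the linear text.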
Original language | English |
---|---|
Pages (from-to) | 9-17 |
Number of pages | 9 |
Journal | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Volume | 8856 |
State | Published - 2014 |
Keywords
- Authorship attribution
- SVM
- Style markers
- Syntactic markers
- Syntactic n-grams
- Syntactic paths