Example of application of n-grams: Authorship attribution using syllables

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

1 Scopus citations

Abstract

As described in the previous chapters, the mainstream of modern computational linguistics is based on the application of machine learning methods. We cast our task as a classification task, represent our objects formally using features and their values (constructing a vector space model), and then apply well-known classification algorithms. In this pipeline, the crucial question is how to select the features. For example, we can use words, word n-grams (sequences of words), or character n-grams (sequences of characters) as features. An interesting question arises: can we use syllables as features? This is very rarely done in computational linguistics, but there is a certain linguistic reality behind syllables. This chapter explores this possibility for the authorship attribution task; it follows our research paper [99]. Note that syllables are somewhat similar to character n-grams in the sense that they are composed of several characters and are not too long.
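The pipeline outlined in the abstract (features, vector space model, classifier) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: `naive_syllables` is a crude vowel-group heuristic rather than a real syllabifier, the n-grams run across word boundaries, and a nearest-profile cosine comparison stands in for the "well-known classification algorithms" the abstract mentions. The function names and the toy texts are hypothetical.

```python
import math
import re
from collections import Counter


def naive_syllables(word):
    """Rough syllable-like split (an assumption, not a linguistic syllabifier).

    Each chunk is a run of consonants followed by a run of vowels;
    trailing consonants are attached to the last chunk.
    """
    chunks = re.findall(r"[^aeiouy]*[aeiouy]+", word)
    if not chunks:
        return [word]  # no vowel at all: keep the word as one chunk
    tail = word[len("".join(chunks)):]
    chunks[-1] += tail
    return chunks


def syllable_ngrams(text, n=2):
    """Count n-grams over the stream of syllable-like chunks in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    sylls = [s for w in words for s in naive_syllables(w)]
    return Counter(tuple(sylls[i:i + n]) for i in range(len(sylls) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(text, profiles, n=2):
    """Return the author whose profile is most similar to the text."""
    feats = syllable_ngrams(text, n)
    return max(profiles, key=lambda author: cosine(feats, profiles[author]))


# Hypothetical toy data: one known text per candidate author.
profiles = {
    "author_a": syllable_ngrams("banana banana", n=1),
    "author_b": syllable_ngrams("kiwi kiwi", n=1),
}
predicted = attribute("banana", profiles, n=1)
```

In a realistic setting the profiles would be built from much longer texts, and the feature vectors would typically be passed to a standard classifier instead of the similarity comparison used here.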

Original language: English
Title of host publication: SpringerBriefs in Computer Science
Publisher: Springer
Pages: 27-39
Number of pages: 13
State: Published - 2019

Publication series

Name: SpringerBriefs in Computer Science
ISSN (Print): 2191-5768
ISSN (Electronic): 2191-5776
