TY - CHAP
T1 - Example of application of n-grams
T2 - Authorship attribution using syllables
AU - Sidorov, Grigori
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2019.
PY - 2019
Y1 - 2019
AB - As described in the previous chapters, the mainstream of modern computational linguistics is based on the application of machine learning methods. We frame our task as a classification task, represent our objects formally using features and their values (constructing a vector space model), and then apply well-known classification algorithms. In this pipeline, the crucial question is how to select the features. For example, we can use as features words, word n-grams (sequences of words), character n-grams (sequences of characters), etc. An interesting question arises: can we use syllables as features? This is very rarely done in computational linguistics, but there is a certain linguistic reality behind syllables. This chapter explores this possibility for the authorship attribution task; it follows our research paper [99]. Note that syllables are somewhat similar to character n-grams in that they consist of a few characters and are not too long.
UR - http://www.scopus.com/inward/record.url?scp=85064689807&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-14771-6_6
DO - 10.1007/978-3-030-14771-6_6
M3 - Chapter
AN - SCOPUS:85064689807
T3 - SpringerBriefs in Computer Science
SP - 27
EP - 39
BT - SpringerBriefs in Computer Science
PB - Springer
ER -