TY - GEN
T1 - Author identification using latent Dirichlet allocation
AU - Calvo, Hiram
AU - Hernández-Castañeda, Ángel
AU - García-Flores, Jorge
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2018.
PY - 2018
Y1 - 2018
AB - We tackle the PAN 2015 author identification task with a Latent Dirichlet Allocation (LDA) model. This method takes into account the vocabulary and the context of words at the same time and, through a statistical process, estimates to what extent the relations between words hold in each document; processing a set of documents with LDA returns a set of topic distributions. Each distribution can be seen as a feature vector and as a fingerprint of its document within the collection. We then applied a Naïve Bayes classifier to the resulting patterns, with varying performance. We obtained state-of-the-art performance for English, surpassing the best FS score reported at PAN 2015, while obtaining mixed results for other languages.
UR - http://www.scopus.com/inward/record.url?scp=85055692861&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-77116-8_22
DO - 10.1007/978-3-319-77116-8_22
M3 - Conference contribution
SN - 9783319771151
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 303
EP - 312
BT - Computational Linguistics and Intelligent Text Processing - 18th International Conference, CICLing 2017, Revised Selected Papers
A2 - Gelbukh, Alexander
PB - Springer Verlag
T2 - 18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017
Y2 - 17 April 2017 through 23 April 2017
ER -