Author identification using latent Dirichlet allocation

Hiram Calvo, Ángel Hernández-Castañeda, Jorge García-Flores

Research output: Chapter in book/report/conference proceeding; Conference contribution; peer-reviewed

Abstract

We tackle the PAN 2015 author identification task with a Latent Dirichlet Allocation (LDA) model. This method takes into account the vocabulary and the context of words at the same time and, through a statistical process, estimates to what extent relations between words are present in each document. Processing a set of documents with LDA returns a set of topic distributions; each distribution can be seen as a feature vector and as a fingerprint of its document within the collection. We then applied a Naïve Bayes classifier to the obtained patterns, with varying performance. We obtained state-of-the-art performance for English, surpassing the best FS score reported in PAN 2015, while obtaining mixed results for other languages.
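The pipeline outlined in the abstract (LDA topic distributions used as per-document feature vectors, then a Naïve Bayes classifier) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the collapsed Gibbs sampler, the Gaussian variant of Naïve Bayes, the hyperparameters `alpha` and `beta`, the function names, and the toy corpus with its author labels. LDA is run over the whole collection without labels, and the classifier is trained only on the documents whose author is known.

```python
import math
import random


def lda_topic_vectors(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic distribution per document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]              # topic counts per document
    topic_word = [[0] * n_words for _ in range(n_topics)]   # word counts per topic
    topic_total = [0] * n_topics                            # total tokens per topic
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wi = assign[d][i], word_id[w]
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wi] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1
    # Smoothed, normalised topic proportions: the document "fingerprint".
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]


def fit_gaussian_nb(X, y):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        variances = [
            sum((v - m) ** 2 for v in col) / len(rows) + 1e-3  # floor avoids zero variance
            for col, m in zip(cols, means)
        ]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
    return max(model, key=log_posterior)


# Toy corpus (hypothetical): four documents of known authorship, one unknown.
docs = [
    "the ship sailed the sea and the storm broke the mast".split(),
    "the sea was calm and the ship reached the harbour".split(),
    "the court heard the case and the judge read the verdict".split(),
    "the judge dismissed the case before the court".split(),
    "the storm hit the harbour and the ship sank".split(),
]
labels = ["A", "A", "B", "B"]

vectors = lda_topic_vectors(docs, n_topics=2)
model = fit_gaussian_nb(vectors[:4], labels)
predicted = predict_gaussian_nb(model, vectors[4])
print(vectors[4], predicted)
```

Each vector in `vectors` is a probability distribution over topics, so it sums to 1 and has a fixed length regardless of document size, which is what makes it usable as classifier input. On a corpus this small the topics are unstable; the sketch shows only the data flow, not achievable accuracy.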

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 18th International Conference, CICLing 2017, Revised Selected Papers
Editors: Alexander Gelbukh
Publisher: Springer Verlag
Pages: 303-312
Number of pages: 10
ISBN (print): 9783319771151
DOI
Status: Published - 2018
Event: 18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017 - Budapest, Hungary
Duration: 17 Apr 2017 to 23 Apr 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10762 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017
Country/Territory: Hungary
City: Budapest
Period: 17/04/17 to 23/04/17
