A convolutional neural network approach for gender and language variety identification

Helena Gómez-Adorno, Roddy Fuentes-Alba, Ilia Markov, Grigori Sidorov, Alexander Gelbukh

Research output: Contribution to journal, Article, peer-reviewed

4 citations (Scopus)

Abstract

© 2019 IOS Press and the authors. We present a method for gender and language variety identification using a convolutional neural network (CNN). We compare its performance with that of a traditional machine learning algorithm, support vector machines (SVM), trained on character n-grams (n = 3-8), lexical features (word unigrams and bigrams), and their combinations. We use a single multi-labeled corpus of news articles in different varieties of Spanish, developed specifically for these tasks. We present a CNN architecture, trained on word- and sentence-level embeddings, that can be successfully applied to gender and language variety identification on a relatively small corpus (fewer than 10,000 documents). Our experiments show that the deep learning approach outperforms the traditional machine learning approach on both tasks when named entities are present in the corpus. However, when all named entities are replaced with a single symbol, NE, to avoid topic-dependent features, the drop in accuracy is larger for the deep learning approach.
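The SVM baseline's feature sets (character n-grams with n = 3-8, word unigrams and bigrams) and the NE-masking condition described in the abstract can be illustrated with a short sketch. This is not the authors' code: the function names, the `c:`/`w:` feature prefixes, whitespace tokenization, and the assumption that entity spans arrive as token tuples from some external named-entity recognizer are all hypothetical choices made for illustration.

```python
from collections import Counter

NE = "NE"  # placeholder symbol that replaces each masked named entity


def mask_entities(tokens, entities):
    """Replace each known entity span (a tuple of tokens) with a single NE.

    Uses greedy longest match. `entities` is assumed to come from an
    external named-entity recognizer (hypothetical here).
    """
    max_len = max((len(e) for e in entities), default=0)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in entities:
                out.append(NE)
                i += n
                break
        else:  # no entity span starts at position i
            out.append(tokens[i])
            i += 1
    return out


def char_ngrams(text, n_min=3, n_max=8):
    """Count character n-grams for every n in [n_min, n_max]."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts["c:" + text[i:i + n]] += 1
    return counts


def word_ngrams(tokens, n_min=1, n_max=2):
    """Count word n-grams (unigrams and bigrams by default)."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts["w:" + " ".join(tokens[i:i + n])] += 1
    return counts


def extract_features(text, entities=frozenset()):
    """Combined character and word n-gram counts after optional NE masking."""
    tokens = mask_entities(text.split(), entities)  # naive whitespace tokenization
    feats = char_ngrams(" ".join(tokens))
    feats.update(word_ngrams(tokens))  # prefixes keep the two feature spaces apart
    return feats


if __name__ == "__main__":
    doc = "Vivo en Buenos Aires desde 2010"
    feats = extract_features(doc, {("Buenos", "Aires")})
    print(feats["w:en NE"])  # 1
```

The prefixes prevent a character n-gram such as `casa` from colliding with the word unigram `casa`. In a full pipeline, these count dictionaries could be turned into sparse vectors (for example with scikit-learn's `DictVectorizer`) and passed to a linear SVM such as `LinearSVC`; the weighting scheme and SVM settings the authors used are not stated here.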
Original language: English (US)
Pages (from-to): 4845-4855
Number of pages: 11
Journal: Journal of Intelligent and Fuzzy Systems
DOI
Status: Published, 1 Jan 2019
