Author profiling with doc2vec neural network-based document embeddings

Ilia Markov, Helena Gómez-Adorno, Juan Pablo Posadas-Durán, Grigori Sidorov, Alexander Gelbukh

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

34 Scopus citations

Abstract

To determine author demographics of texts in social media such as Twitter, blogs, and reviews, we use doc2vec document embeddings to train a logistic regression classifier. We experimented with age and gender identification on the PAN author profiling 2014–2016 corpora under both single- and cross-genre conditions. We show that under certain settings the neural network-based features outperform the traditional features when using the same classifier. Our method outperforms the existing state of the art under some settings, though the current state-of-the-art results on those tasks have been quite weak.

Original language: English
Title of host publication: Advances in Soft Computing - 15th Mexican International Conference on Artificial Intelligence, MICAI 2016, Proceedings
Editors: Obdulia Pichardo-Lagunas, Sabino Miranda-Jimenez
Publisher: Springer Verlag
Pages: 117-131
Number of pages: 15
ISBN (Print): 9783319624273
DOIs
State: Published - 2017
Event: 15th Mexican International Conference on Artificial Intelligence, MICAI 2016 - Cancun, Mexico
Duration: 23 Oct 2016 – 28 Oct 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10062 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th Mexican International Conference on Artificial Intelligence, MICAI 2016
Country/Territory: Mexico
City: Cancun
Period: 23/10/16 – 28/10/16

Keywords

  • Author profiling
  • Document embeddings
  • Machine learning
  • Neural networks
  • doc2vec
