Authorship Link Retrieval Between Documents

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper we propose a method for automatic author clustering called Document Authoring Link Retriever (DALIR). Documents are represented using Doc2Vec under several parameter settings; the resulting vectors are then clustered (or linked together) using K-means and Hierarchical Agglomerative Clustering. We experimented with different vector representation sizes, different fixed numbers of clusters, and both clustering methods. We evaluated our method on the author clustering task of PAN @ CLEF 2017 using the task's BCubed F-score evaluation scheme. Our method outperforms some of the results reported by the top-ranked systems in this challenge, although it requires the number of clusters to be set manually a priori.
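
The BCubed F-score mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard per-document definition of BCubed precision and recall, with the F-score taken as the harmonic mean of the two averages. The function name `bcubed` and the toy labels are hypothetical; this is not the official PAN evaluator and not the authors' code.

```python
from collections import defaultdict


def bcubed(predicted, truth):
    """Return (precision, recall, f_score) under the BCubed scheme.

    predicted: dict mapping document id -> predicted cluster label
    truth:     dict mapping document id -> true author label
    Both dicts must cover the same, non-empty set of documents.
    """
    if not predicted or predicted.keys() != truth.keys():
        raise ValueError("predicted and truth must share a non-empty key set")

    # Group documents by predicted cluster and by true author.
    clusters = defaultdict(set)
    authors = defaultdict(set)
    for doc, label in predicted.items():
        clusters[label].add(doc)
    for doc, label in truth.items():
        authors[label].add(doc)

    precision = 0.0
    recall = 0.0
    for doc in predicted:
        same_cluster = clusters[predicted[doc]]
        same_author = authors[truth[doc]]
        # Documents sharing both the cluster and the author with `doc`
        # (always includes `doc` itself, so this is at least 1).
        correct = len(same_cluster & same_author)
        precision += correct / len(same_cluster)
        recall += correct / len(same_author)

    n = len(predicted)
    precision /= n
    recall /= n
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy example (made-up labels): two authors, one document misplaced.
truth = {"d1": "A", "d2": "A", "d3": "B", "d4": "B"}
predicted = {"d1": 0, "d2": 0, "d3": 0, "d4": 1}
p, r, f = bcubed(predicted, truth)
print(f"precision={p:.4f} recall={r:.4f} f={f:.4f}")
```

In this toy case, merging `d3` into the wrong cluster lowers precision, while splitting author B across two clusters lowers recall. This is why the fixed number of clusters chosen a priori has a direct effect on the score.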

Original language: English
Title of host publication: Advances in Computational Intelligence - 19th Mexican International Conference on Artificial Intelligence, MICAI 2020, Proceedings
Editors: Lourdes Martínez-Villaseñor, Hiram Ponce, Oscar Herrera-Alcántara, Félix A. Castro-Espinoza
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 297-305
Number of pages: 9
ISBN (Print): 9783030608866
State: Published - 2020
Event: 19th Mexican International Conference on Artificial Intelligence, MICAI 2020 - Mexico City, Mexico
Duration: 12 Oct 2020 to 17 Oct 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12469 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 19th Mexican International Conference on Artificial Intelligence, MICAI 2020
Country/Territory: Mexico
City: Mexico City
Period: 12/10/20 to 17/10/20

Keywords

  • Author profiling
  • Clustering
  • Computational linguistics
  • Style analysis
