Enhancement of performance of document clustering in the authorship identification problem with a weighted cosine similarity

Carolina Martín-del-Campo-Rodríguez, Grigori Sidorov, Ildar Batyrshin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Distance and similarity measures are essential for solving many pattern recognition problems, such as classification, information retrieval, and clustering, where a specific distance can lead to better performance than others. A weighted cosine distance is proposed that varies the weights of the exclusive attributes of the input vectors. Agglomerative hierarchical clustering of documents was used to compare the traditional cosine similarity with the one proposed in this paper. The modified measure improved the formation of clusters.
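The abstract does not give the formula, so the following is only a minimal sketch of one way a cosine similarity could reweight exclusive attributes. It assumes that an "exclusive" attribute is one that is nonzero in exactly one of the two vectors, and that a single weight (the hypothetical parameter `w_exclusive`) scales the contribution of those attributes. The function name and the weighting scheme are illustrative assumptions, not the authors' definition.

```python
import math


def weighted_cosine(x, y, w_exclusive=1.0):
    """Cosine similarity with a separate weight for exclusive attributes.

    Assumption (not from the paper): an attribute is "exclusive" when it is
    nonzero in exactly one of the two vectors. Such attributes are scaled by
    w_exclusive; all other attributes keep weight 1.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")

    dot = norm_x = norm_y = 0.0
    for xi, yi in zip(x, y):
        exclusive = (xi != 0) != (yi != 0)
        w = w_exclusive if exclusive else 1.0
        dot += w * xi * yi      # zero for exclusive attributes
        norm_x += w * xi * xi
        norm_y += w * yi * yi

    # Undefined when either weighted norm vanishes; return 0.0 by convention.
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / math.sqrt(norm_x * norm_y)


# Two toy term-frequency vectors sharing only the first attribute.
doc_a = [1, 2, 0]
doc_b = [1, 0, 3]

standard = weighted_cosine(doc_a, doc_b, w_exclusive=1.0)  # plain cosine
damped = weighted_cosine(doc_a, doc_b, w_exclusive=0.5)    # exclusives count less
```

With `w_exclusive=1.0` the function reduces to the traditional cosine similarity. In this sketch, a weight below 1 shrinks the norms without changing the dot product, so documents that differ mainly in exclusive attributes are scored as more similar. A distance such as `1 - weighted_cosine(a, b, w)` could then be passed to an agglomerative hierarchical clustering routine in place of the standard cosine distance.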

Original language: English
Title of host publication: Advances in Computational Intelligence - 17th Mexican International Conference on Artificial Intelligence, MICAI 2018, Proceedings
Editors: Ildar Batyrshin, María de Lourdes Martínez-Villaseñor, Hiram Eredín Ponce Espinosa
Publisher: Springer Verlag
Pages: 49-56
Number of pages: 8
ISBN (Print): 9783030044961
State: Published - 2018
Event: 17th Mexican International Conference on Artificial Intelligence, MICAI 2018 - Guadalajara, Mexico
Duration: 22 Oct 2018 to 27 Oct 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11289 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th Mexican International Conference on Artificial Intelligence, MICAI 2018
Country/Territory: Mexico
City: Guadalajara
Period: 22/10/18 to 27/10/18
