Using soft similarity in multi-label classification for Reuters-21578 corpus

Victor Carrera Trejo, Grigori Sidorov, Marco Moreno Ibarra, Sabino Miranda Jiménez, Rodrigo Cadena Martínez

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In classification tasks, one of the main problems is choosing which features provide the best results, i.e., constructing a vector space model. In this paper, we show how to complement the traditional vector space model with the concept of soft similarity. We combine the traditional tf-idf model with latent Dirichlet allocation and apply the combination to multi-label classification. We use the multi-label files of the Reuters-21578 corpus as a case study. The methodology is evaluated using the multi-label algorithm RAkEL. We used the traditional tf-idf model as the baseline. We present the F1 measures for both models for various feature sets, preprocessing techniques, and vector sizes. The new model obtains better results than the baseline model.
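The abstract does not spell out how soft similarity is computed. As a rough illustration only, the sketch below assumes the soft cosine form, in which a feature-to-feature similarity matrix enters the dot product so that related but distinct features still contribute to the score. The function name `soft_cosine`, the toy vectors, and the similarity values are hypothetical and are not taken from the paper.

```python
import math


def soft_cosine(a, b, sim):
    """Soft cosine of vectors a and b under a feature-similarity matrix sim.

    sim[i][j] is the assumed similarity between feature i and feature j.
    With sim equal to the identity matrix this reduces to the ordinary cosine.
    """
    def bilinear(x, y):
        # Sum of sim[i][j] * x[i] * y[j] over all feature pairs.
        return sum(sim[i][j] * x[i] * y[j]
                   for i in range(len(x)) for j in range(len(y)))

    denom = math.sqrt(bilinear(a, a)) * math.sqrt(bilinear(b, b))
    return bilinear(a, b) / denom if denom else 0.0


# Hypothetical toy data: three features, where features 0 and 1 are
# assumed to be related (similarity 0.8) and feature 2 is unrelated.
sim_identity = [[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]]
sim_soft = [[1.0, 0.8, 0.0],
            [0.8, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

doc_a = [1.0, 0.0, 0.0]  # document that uses only feature 0
doc_b = [0.0, 1.0, 0.0]  # document that uses only feature 1

hard = soft_cosine(doc_a, doc_b, sim_identity)  # no shared features
soft = soft_cosine(doc_a, doc_b, sim_soft)      # related features count
```

With the identity matrix the two documents share no features and score zero, while the soft variant gives them a nonzero score because their features are treated as related. How the feature similarities are derived in the paper (for example, from the latent Dirichlet allocation topics) is not stated in the abstract, so the matrix here is purely a placeholder.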

Original language: English
Title of host publication: Proceedings of Special Session 2014 13th Mexican International Conference on Artificial Intelligence
Subtitle of host publication: Advances in Artificial Intelligence, MICAI 2014
Editors: Alexander Gelbukh, Sofia N. Galicia-Haro, Felix Castro Espinoza
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3-8
Number of pages: 6
ISBN (Electronic): 9781479999002
State: Published - 25 Aug 2015
Event: 13th Mexican International Conference on Artificial Intelligence, MICAI 2014 - Tuxtla Gutierrez, Mexico
Duration: 16 Nov 2014 to 22 Nov 2014

Publication series

Name: Proceedings of Special Session 2014 13th Mexican International Conference on Artificial Intelligence: Advances in Artificial Intelligence, MICAI 2014

Conference

Conference: 13th Mexican International Conference on Artificial Intelligence, MICAI 2014
Country/Territory: Mexico
City: Tuxtla Gutierrez
Period: 16/11/14 to 22/11/14

Keywords

  • Latent Dirichlet allocation
  • Multi-labeling
  • Reuters-21578
  • Semantics
  • Soft similarity
  • Tf-idf
  • Vector space model
