Using soft similarity in multi-label classification for Reuters-21578 corpus

Victor Carrera Trejo, Grigori Sidorov, Marco Moreno Ibarra, Sabino Miranda Jiménez, Rodrigo Cadena Martínez

Research output: Chapter in book/report/conference proceeding; Conference contribution; peer-reviewed

Abstract

In classification tasks, one of the main problems is choosing which features provide the best results, i.e., how to construct the vector space model. In this paper, we show how to complement the traditional vector space model with the concept of soft similarity. We combine the traditional tf-idf model with latent Dirichlet allocation and apply the combination to multi-label classification. We considered the multi-label files of the Reuters-21578 corpus as a case study. The methodology is evaluated using the multi-label algorithm Rakel1. We used the traditional tf-idf model as the baseline. We present the F1 measures of both models for various feature sets, preprocessing techniques, and vector sizes. The new model obtains better results than the baseline model.
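The abstract does not give a formula for soft similarity. As a minimal sketch, assuming the commonly used soft cosine definition, sim(a, b) = (a^T S b) / (sqrt(a^T S a) * sqrt(b^T S b)), the idea can be illustrated as follows. The matrix S holds pairwise feature similarities. How the paper derives S (for example, from latent Dirichlet allocation) is not stated in this record, so the vectors and the matrix below are hypothetical toy values, not data from the paper.

```python
import math


def soft_dot(a, b, s):
    """Return a^T S b for vectors a, b and feature-similarity matrix s."""
    return sum(
        s[i][j] * a[i] * b[j]
        for i in range(len(a))
        for j in range(len(b))
    )


def soft_cosine(a, b, s):
    """Soft cosine similarity; equals ordinary cosine when s is the identity."""
    denom = math.sqrt(soft_dot(a, a, s)) * math.sqrt(soft_dot(b, b, s))
    return soft_dot(a, b, s) / denom if denom else 0.0


# Hypothetical toy documents that share no feature.
doc_a = [1.0, 0.0, 0.0]
doc_b = [0.0, 1.0, 0.0]

# Identity matrix: features are treated as unrelated (traditional model).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Assumed similarity of 0.8 between feature 0 and feature 1.
related = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]

plain = soft_cosine(doc_a, doc_b, identity)
soft = soft_cosine(doc_a, doc_b, related)
print(plain, soft)
```

With the identity matrix the two documents are orthogonal, while the assumed feature similarity lets them register as similar even though they share no feature. That is the intuition behind complementing the traditional vector space model with soft similarity.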

Original language: English
Title of host publication: Proceedings of Special Session 2014 13th Mexican International Conference on Artificial Intelligence
Subtitle of host publication: Advances in Artificial Intelligence, MICAI 2014
Editors: Alexander Gelbukh, Sofia N. Galicia-Haro, Felix Castro Espinoza
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3-8
Number of pages: 6
ISBN (electronic): 9781479999002
Status: Published - 25 Aug 2015
Event: 13th Mexican International Conference on Artificial Intelligence, MICAI 2014 - Tuxtla Gutierrez, Mexico
Duration: 16 Nov 2014 to 22 Nov 2014

Publication series

Name: Proceedings of Special Session 2014 13th Mexican International Conference on Artificial Intelligence: Advances in Artificial Intelligence, MICAI 2014

Conference

Conference: 13th Mexican International Conference on Artificial Intelligence, MICAI 2014
Country/Territory: Mexico
City: Tuxtla Gutierrez
Period: 16/11/14 to 22/11/14
