Abstract
Machine learning gives systems the ability to learn from experience. This is achieved through the generation of machine learning models. One of the most widely used approaches is supervised learning, which employs classification models that allow a computer program to learn from input data in order to produce classifications. Input and output data are labelled for classification, providing a learning base for processing future data. The C4.5 algorithm builds classification models, called decision trees, from a database. This algorithm uses the entropy defined by Shannon to calculate the gain ratio. In this study, the Tsallis and Rényi entropies are used instead of Shannon's to construct a decision tree; in previous works, these entropies have shown better results than Shannon's. Both include an additional parameter q that modifies the weight given to the probability distribution. This research focuses on developing a method that obtains the value of q to be applied when computing the information gain ratio in the C4.5 algorithm using the Tsallis and Rényi entropies. The method obtains a network representation of the database; then, the box-covering algorithm is applied to obtain the minimum number of boxes needed to cover the network. The calculation of the parameter q depends on this minimum network coverage.
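The three entropies compared in the abstract can be sketched as follows. This is a minimal illustration of the formulas (in nats, i.e. natural log, with function names of our own choosing), not the authors' implementation: Shannon entropy is $-\sum_i p_i \ln p_i$, Tsallis entropy is $(1 - \sum_i p_i^q)/(q-1)$, and Rényi entropy is $\ln(\sum_i p_i^q)/(1-q)$; both generalized entropies recover Shannon's in the limit q → 1.

```python
import math

def shannon(p):
    """Shannon entropy in nats: -sum(p * ln p), ignoring zero-probability terms."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def tsallis(p, q):
    """Tsallis entropy with entropic index q (q != 1); q -> 1 recovers Shannon."""
    return (1.0 - sum(pi ** q for pi in p)) / (q - 1.0)

def renyi(p, q):
    """Rényi entropy with entropic index q (q != 1); q -> 1 recovers Shannon."""
    return math.log(sum(pi ** q for pi in p)) / (1.0 - q)

# Example: for a uniform distribution over 4 outcomes, Shannon and Rényi
# (any q) coincide at ln(4), while Tsallis with q = 2 gives 1 - 1/4 = 0.75.
uniform = [0.25, 0.25, 0.25, 0.25]
print(shannon(uniform), renyi(uniform, 2.0), tsallis(uniform, 2.0))
```

In a C4.5-style tree, one of these entropy functions would be evaluated on the class-label distribution of each candidate split to compute the information gain ratio; the study's contribution is choosing q from the network's minimum box coverage rather than by manual tuning.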
| Field | Value |
|---|---|
| Translated title of the contribution | Comparative Analysis of Entropic Index in Databases Using Tsallis and Renyi Entropy in C4.5 Classification Trees |
| Original language | Spanish |
| Publication | Revista Internacional de Investigación e Innovación Tecnológica |
| Volume | 10 |
| No. | 59 |
| Status | Published - 1 Nov 2022 |