TY - JOUR
T1 - Statistical Entropy Measures in C4.5 Trees
AU - Arellano, Aldo Ramirez
AU - Bory-Reyes, Juan
AU - Hernandez-Simon, Luis Manuel
N1 - Publisher Copyright:
Copyright © 2018, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
PY - 2018/1/1
Y1 - 2018/1/1
N2 - The main goal of this article is to present a statistical study of decision tree learning algorithms based on the measures of different parametric entropies. Partial empirical evidence is presented to support the conjecture that adjusting the parameters of different entropy measures might bias the classification. Here, receiver operating characteristic (ROC) curve analysis, specifically the area under the ROC curve (AURC), gives the best criterion to evaluate decision trees based on parametric entropies. The authors emphasize that the improvement of the AURC depends on the type of each dataset. The results support the hypothesis that parametric algorithms are useful for datasets with numeric and nominal, but not mixed, attributes; thus, four hybrid approaches are proposed. The hybrid algorithm based on Rényi entropy is suitable for nominal, numeric, and mixed datasets. Moreover, it requires less time because the number of nodes is reduced while the AURC is maintained or increased, making it preferable for large datasets.
AB - The main goal of this article is to present a statistical study of decision tree learning algorithms based on the measures of different parametric entropies. Partial empirical evidence is presented to support the conjecture that adjusting the parameters of different entropy measures might bias the classification. Here, receiver operating characteristic (ROC) curve analysis, specifically the area under the ROC curve (AURC), gives the best criterion to evaluate decision trees based on parametric entropies. The authors emphasize that the improvement of the AURC depends on the type of each dataset. The results support the hypothesis that parametric algorithms are useful for datasets with numeric and nominal, but not mixed, attributes; thus, four hybrid approaches are proposed. The hybrid algorithm based on Rényi entropy is suitable for nominal, numeric, and mixed datasets. Moreover, it requires less time because the number of nodes is reduced while the AURC is maintained or increased, making it preferable for large datasets.
KW - Classification
KW - Data Mining
KW - Decision Trees
KW - Entropy Measures
KW - Information Theory
UR - http://www.scopus.com/inward/record.url?scp=85042133770&partnerID=8YFLogxK
U2 - 10.4018/IJDWM.2018010101
DO - 10.4018/IJDWM.2018010101
M3 - Article
SN - 1548-3924
VL - 14
SP - 1
EP - 14
JO - International Journal of Data Warehousing and Mining
JF - International Journal of Data Warehousing and Mining
IS - 1
ER -