Statistical Entropy Measures in C4.5 Trees

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

The main goal of this article is to present a statistical study of decision tree learning algorithms based on measures of different parametric entropies. Partial empirical evidence is presented to support the conjecture that adjusting the parameter of different entropy measures may bias the classification. Here, receiver operating characteristic (ROC) curve analysis, specifically the area under the ROC curve (AURC), provides the best criterion for evaluating decision trees based on parametric entropies. The authors emphasize that the improvement in AURC depends on the type of each dataset. The results support the hypothesis that parametric algorithms are useful for datasets with numeric and nominal attributes, but not for mixed ones; thus, four hybrid approaches are proposed. The hybrid algorithm based on Rényi entropy is suitable for nominal, numeric, and mixed datasets. Moreover, it requires less time because the number of nodes is reduced while the AURC is maintained or increased, making it preferable for large datasets.
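The abstract describes replacing Shannon entropy with a parametric measure such as Rényi entropy in a C4.5-style split criterion. The following is a minimal sketch of that idea, not the authors' implementation: it uses the standard definition of Rényi entropy, H_α(p) = log₂(Σᵢ pᵢ^α)/(1−α), which recovers Shannon entropy as α → 1, and plugs it into an information-gain computation in place of Shannon entropy. The function names and the binary-split interface are illustrative assumptions.

```python
import numpy as np

def renyi_entropy(labels, alpha=2.0):
    """Rényi entropy (base 2) of a class-label sample; alpha -> 1 recovers Shannon."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log2(p)))  # Shannon limit
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))

def split_gain(parent, left, right, alpha=2.0):
    """Entropy reduction of a binary split, with Rényi entropy as the impurity.

    This mirrors the C4.5 information-gain computation, swapping in the
    parametric measure; it is a sketch of the general idea only.
    """
    n = len(parent)
    weighted = (len(left) / n) * renyi_entropy(left, alpha) \
             + (len(right) / n) * renyi_entropy(right, alpha)
    return renyi_entropy(parent, alpha) - weighted
```

A pure split of a balanced two-class node, e.g. `split_gain([0, 0, 1, 1], [0, 0], [1, 1])`, yields a gain of 1 bit for any α, while impure splits score lower; tuning α changes how the impurity weights rare versus common classes, which is the parameter-bias effect the article studies.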

Original language: English
Pages (from-to): 1-14
Number of pages: 14
Journal: International Journal of Data Warehousing and Mining
Volume: 14
Issue number: 1
DOIs
State: Published - 1 Jan 2018

Keywords

  • Classification
  • Data Mining
  • Decision Trees
  • Entropy Measures
  • Information Theory
