Statistical analysis of various splitting criteria for decision trees

IF 0.8 Q4 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Journal of Algorithms & Computational Technology Pub Date : 2023-01-01 DOI:10.1177/17483026231198181

Fadwa Aaboub, Hasna Chamlal, Tayeb Ouaderhman

{"title":"Statistical analysis of various splitting criteria for decision trees","authors":"Fadwa Aaboub, Hasna Chamlal, Tayeb Ouaderhman","doi":"10.1177/17483026231198181","DOIUrl":null,"url":null,"abstract":"Decision trees are frequently used to overcome classification problems in the fields of data mining and machine learning, owing to their many perks, including their clear and simple architecture, excellent quality, and resilience. Various decision tree algorithms are developed using a variety of attribute selection criteria, following the top-down partitioning strategy. However, their effectiveness is influenced by the choice of the splitting method. Therefore, in this work, six decision tree algorithms that are based on six different attribute evaluation metrics are gathered in order to compare their performances. The choice of the decision trees that will be compared is done based on four different categories of the splitting criteria that are criteria based on information theory, criteria based on distance, statistical-based criteria, and other splitting criteria. These approaches include iterative dichotomizer 3 (first category), C[Formula: see text] (first category), classification and regression trees (second category), Pearson’s correlation coefficient based decision tree (third category), dispersion ratio (third category), and feature weight based decision tree algorithm (last category). On eleven data sets, the six procedures are assessed in terms of classification accuracy, tree depth, leaf nodes, and tree construction time. Furthermore, the Friedman and post hoc Nemenyi tests are used to examine the results that were obtained. The results of these two tests indicate that the iterative dichotomizer 3 and classification and regression trees decision tree methods perform better than the other decision tree methodologies.","PeriodicalId":45079,"journal":{"name":"Journal of Algorithms & Computational Technology","volume":"112 1","pages":"0"},"PeriodicalIF":0.8000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Algorithms & Computational Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/17483026231198181","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}

引用次数: 0

Abstract

Decision trees are frequently used to overcome classification problems in the fields of data mining and machine learning, owing to their many perks, including their clear and simple architecture, excellent quality, and resilience. Various decision tree algorithms are developed using a variety of attribute selection criteria, following the top-down partitioning strategy. However, their effectiveness is influenced by the choice of the splitting method. Therefore, in this work, six decision tree algorithms that are based on six different attribute evaluation metrics are gathered in order to compare their performances. The choice of the decision trees that will be compared is done based on four different categories of the splitting criteria that are criteria based on information theory, criteria based on distance, statistical-based criteria, and other splitting criteria. These approaches include iterative dichotomizer 3 (first category), C[Formula: see text] (first category), classification and regression trees (second category), Pearson’s correlation coefficient based decision tree (third category), dispersion ratio (third category), and feature weight based decision tree algorithm (last category). On eleven data sets, the six procedures are assessed in terms of classification accuracy, tree depth, leaf nodes, and tree construction time. Furthermore, the Friedman and post hoc Nemenyi tests are used to examine the results that were obtained. The results of these two tests indicate that the iterative dichotomizer 3 and classification and regression trees decision tree methods perform better than the other decision tree methodologies.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

决策树各种分裂准则的统计分析

决策树由于其清晰简单的体系结构、优良的质量和弹性等优点，经常被用于克服数据挖掘和机器学习领域的分类问题。根据自顶向下的划分策略，使用各种属性选择标准开发了各种决策树算法。但是，分割方法的选择会影响其有效性。因此，在这项工作中，为了比较它们的性能，我们收集了基于六种不同属性评价指标的六种决策树算法。要比较的决策树的选择是基于四种不同类别的分割标准完成的，这四种标准是基于信息论的标准、基于距离的标准、基于统计的标准和其他分割标准。这些方法包括迭代二分类器3(第一类)、C[公式:见文本](第一类)、分类和回归树(第二类)、基于Pearson相关系数的决策树(第三类)、离散比(第三类)和基于特征权重的决策树算法(最后一类)。在11个数据集上，从分类精度、树深度、叶节点和树构建时间等方面对这6种方法进行了评估。此外，弗里德曼和事后Nemenyi测试被用来检查所获得的结果。这两个测试的结果表明，迭代二分类器3和分类回归树决策树方法的性能优于其他决策树方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊