决策树各种分裂准则的统计分析

IF 0.8 Q4 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Journal of Algorithms & Computational Technology Pub Date : 2023-01-01 DOI:10.1177/17483026231198181
Fadwa Aaboub, Hasna Chamlal, Tayeb Ouaderhman
{"title":"决策树各种分裂准则的统计分析","authors":"Fadwa Aaboub, Hasna Chamlal, Tayeb Ouaderhman","doi":"10.1177/17483026231198181","DOIUrl":null,"url":null,"abstract":"Decision trees are frequently used to overcome classification problems in the fields of data mining and machine learning, owing to their many perks, including their clear and simple architecture, excellent quality, and resilience. Various decision tree algorithms are developed using a variety of attribute selection criteria, following the top-down partitioning strategy. However, their effectiveness is influenced by the choice of the splitting method. Therefore, in this work, six decision tree algorithms that are based on six different attribute evaluation metrics are gathered in order to compare their performances. The choice of the decision trees that will be compared is done based on four different categories of the splitting criteria that are criteria based on information theory, criteria based on distance, statistical-based criteria, and other splitting criteria. These approaches include iterative dichotomizer 3 (first category), C[Formula: see text] (first category), classification and regression trees (second category), Pearson’s correlation coefficient based decision tree (third category), dispersion ratio (third category), and feature weight based decision tree algorithm (last category). On eleven data sets, the six procedures are assessed in terms of classification accuracy, tree depth, leaf nodes, and tree construction time. Furthermore, the Friedman and post hoc Nemenyi tests are used to examine the results that were obtained. The results of these two tests indicate that the iterative dichotomizer 3 and classification and regression trees decision tree methods perform better than the other decision tree methodologies.","PeriodicalId":45079,"journal":{"name":"Journal of Algorithms & Computational Technology","volume":"112 1","pages":"0"},"PeriodicalIF":0.8000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Statistical analysis of various splitting criteria for decision trees\",\"authors\":\"Fadwa Aaboub, Hasna Chamlal, Tayeb Ouaderhman\",\"doi\":\"10.1177/17483026231198181\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Decision trees are frequently used to overcome classification problems in the fields of data mining and machine learning, owing to their many perks, including their clear and simple architecture, excellent quality, and resilience. Various decision tree algorithms are developed using a variety of attribute selection criteria, following the top-down partitioning strategy. However, their effectiveness is influenced by the choice of the splitting method. Therefore, in this work, six decision tree algorithms that are based on six different attribute evaluation metrics are gathered in order to compare their performances. The choice of the decision trees that will be compared is done based on four different categories of the splitting criteria that are criteria based on information theory, criteria based on distance, statistical-based criteria, and other splitting criteria. These approaches include iterative dichotomizer 3 (first category), C[Formula: see text] (first category), classification and regression trees (second category), Pearson’s correlation coefficient based decision tree (third category), dispersion ratio (third category), and feature weight based decision tree algorithm (last category). On eleven data sets, the six procedures are assessed in terms of classification accuracy, tree depth, leaf nodes, and tree construction time. Furthermore, the Friedman and post hoc Nemenyi tests are used to examine the results that were obtained. The results of these two tests indicate that the iterative dichotomizer 3 and classification and regression trees decision tree methods perform better than the other decision tree methodologies.\",\"PeriodicalId\":45079,\"journal\":{\"name\":\"Journal of Algorithms & Computational Technology\",\"volume\":\"112 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.8000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Algorithms & Computational Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1177/17483026231198181\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Algorithms & Computational Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/17483026231198181","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0

摘要

决策树由于其清晰简单的体系结构、优良的质量和弹性等优点,经常被用于克服数据挖掘和机器学习领域的分类问题。根据自顶向下的划分策略,使用各种属性选择标准开发了各种决策树算法。但是,分割方法的选择会影响其有效性。因此,在这项工作中,为了比较它们的性能,我们收集了基于六种不同属性评价指标的六种决策树算法。要比较的决策树的选择是基于四种不同类别的分割标准完成的,这四种标准是基于信息论的标准、基于距离的标准、基于统计的标准和其他分割标准。这些方法包括迭代二分类器3(第一类)、C[公式:见文本](第一类)、分类和回归树(第二类)、基于Pearson相关系数的决策树(第三类)、离散比(第三类)和基于特征权重的决策树算法(最后一类)。在11个数据集上,从分类精度、树深度、叶节点和树构建时间等方面对这6种方法进行了评估。此外,弗里德曼和事后Nemenyi测试被用来检查所获得的结果。这两个测试的结果表明,迭代二分类器3和分类回归树决策树方法的性能优于其他决策树方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Statistical analysis of various splitting criteria for decision trees
Decision trees are frequently used to overcome classification problems in the fields of data mining and machine learning, owing to their many perks, including their clear and simple architecture, excellent quality, and resilience. Various decision tree algorithms are developed using a variety of attribute selection criteria, following the top-down partitioning strategy. However, their effectiveness is influenced by the choice of the splitting method. Therefore, in this work, six decision tree algorithms that are based on six different attribute evaluation metrics are gathered in order to compare their performances. The choice of the decision trees that will be compared is done based on four different categories of the splitting criteria that are criteria based on information theory, criteria based on distance, statistical-based criteria, and other splitting criteria. These approaches include iterative dichotomizer 3 (first category), C[Formula: see text] (first category), classification and regression trees (second category), Pearson’s correlation coefficient based decision tree (third category), dispersion ratio (third category), and feature weight based decision tree algorithm (last category). On eleven data sets, the six procedures are assessed in terms of classification accuracy, tree depth, leaf nodes, and tree construction time. Furthermore, the Friedman and post hoc Nemenyi tests are used to examine the results that were obtained. The results of these two tests indicate that the iterative dichotomizer 3 and classification and regression trees decision tree methods perform better than the other decision tree methodologies.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Algorithms & Computational Technology
Journal of Algorithms & Computational Technology COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS-
CiteScore
1.70
自引率
0.00%
发文量
8
审稿时长
15 weeks
期刊最新文献
Statistical analysis of various splitting criteria for decision trees Inexact fixed-point iteration method for nonlinear complementarity problems Pretrained back propagation based adaptive resonance theory network for adaptive learning Modelling influenza and SARS-CoV-2 interaction: Analysis for Catalonia region DataSifter II: Partially synthetic data sharing of sensitive information containing time-varying correlated observations.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1