Adaptive fuzzy neighborhood decision tree

IF 7.2 | Region 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Applied Soft Computing | Pub Date: 2024-11-05 | DOI: 10.1016/j.asoc.2024.112435

Xinyu Cui, Changzhong Wang, Shuang An, Yuhua Qian

Xinyu Cui, Changzhong Wang, Shuang An, Yuhua Qian. Adaptive fuzzy neighborhood decision tree. Applied Soft Computing, volume 167, Article 112435 (2024-11-05). Journal Article; not open access. https://www.sciencedirect.com/science/article/pii/S1568494624012092

Citations: 0

Abstract


Decision tree algorithms have gained widespread acceptance in machine learning, with the central challenge lying in devising an optimal splitting strategy for node sample subspaces. In the context of continuous data, conventional approaches typically involve fuzzifying data or adopting a dichotomous scheme akin to the CART tree. Nevertheless, fuzzifying continuous features often entails information loss, whereas the dichotomous approach can generate an excessive number of classification rules, potentially leading to overfitting. To address these limitations, this study introduces an adaptive growth decision tree framework, termed the fuzzy neighborhood decision tree (FNDT). Initially, we establish a fuzzy neighborhood decision model by leveraging the concept of fuzzy inclusion degree. Furthermore, we delve into the topological structure of misclassified samples under the proposed decision model, providing a theoretical foundation for the construction of FNDT. Subsequently, we utilize conditional information entropy to sift through original features, prioritizing those that offer the maximum information gain for decision tree nodes. By leveraging the conditional decision partitions derived from the fuzzy neighborhood decision model, we achieve an adaptive splitting method for optimal features, culminating in an adaptive growth decision tree algorithm that relies solely on the inherent structure of real-valued data. Experimental evaluations reveal that, compared with advanced decision tree algorithms, FNDT exhibits a simple tree structure, stronger generalization capabilities, and superior performance in classifying continuous data.
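The feature-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' FNDT algorithm: the abstract gives no formulas, so the triangular similarity kernel, the radius `delta`, the min/sigma-count inclusion degree, and the fuzzy-rough style conditional entropy below are all assumptions, and every function name is hypothetical. Because the entropy of the class labels is the same for every feature, picking the feature with the lowest conditional entropy is equivalent to picking the one with the highest information gain.

```python
import numpy as np


def fuzzy_neighborhoods(values, delta):
    """Row i holds the fuzzy neighborhood of sample i on one feature.

    Assumed kernel: similarity falls linearly from 1 (identical values)
    to 0 (values at least `delta` apart). `delta` must be positive.
    """
    diff = np.abs(values[:, None] - values[None, :])
    return np.clip(1.0 - diff / delta, 0.0, 1.0)


def inclusion_degrees(rel, labels):
    """Degree to which each fuzzy neighborhood lies inside its own class.

    Assumed definition: |N(x) ∩ D(x)| / |N(x)|, with min as intersection
    and the sum of memberships as cardinality.
    """
    same_class = (labels[:, None] == labels[None, :]).astype(float)
    return np.minimum(rel, same_class).sum(axis=1) / rel.sum(axis=1)


def conditional_entropy(values, labels, delta):
    """Assumed fuzzy conditional entropy: -mean(log2(inclusion degree)).

    Each sample is fully similar to itself, so every degree is positive
    and the logarithm is always defined.
    """
    rel = fuzzy_neighborhoods(values, delta)
    return float(-np.log2(inclusion_degrees(rel, labels)).mean())


def rank_features(X, y, delta=0.3):
    """Return the index of the lowest-entropy feature and all scores."""
    scores = [conditional_entropy(X[:, j], y, delta) for j in range(X.shape[1])]
    return int(np.argmin(scores)), scores


# Toy data: feature 0 separates the two classes, feature 1 mixes them.
X = np.array([
    [0.0, 0.5],
    [0.1, 0.1],
    [0.2, 0.9],
    [0.8, 0.2],
    [0.9, 0.8],
    [1.0, 0.4],
])
y = np.array([0, 0, 0, 1, 1, 1])
best, scores = rank_features(X, y)
print(best, scores)
```

On this toy data the first feature is chosen, since every neighborhood on it stays within a single class. The sketch assumes features are scaled to a comparable range so that one `delta` is meaningful for all of them.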


Source journal

Applied Soft Computing (Engineering and Technology; Computer Science: Interdisciplinary Applications)

CiteScore: 15.80
Self-citation rate: 6.90%
Articles published: 874
Review time: 10.9 months

About the journal: Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest quality research in application and convergence of the areas of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets and other similar techniques to address real-world complexities. Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. Therefore, the web site will continuously be updated with new articles and the publication time will be short.