European Journal of Operational Research (IF 6; Zone 2, Management; Q1, Operations Research & Management Science) Pub Date: 2025-03-04 DOI: 10.1016/j.ejor.2025.02.031
Xuecheng Tian, Shuaian Wang, Lu Zhen, Zuo-Jun (Max) Shen
k-Tree: Crossing sharp boundaries in regression trees to find neighbors
Traditional classification and regression trees (CARTs) utilize a top-down, greedy approach to split the feature space into sharply defined, axis-aligned sub-regions (leaves). Each leaf treats all of the samples therein uniformly during the prediction process, leading to a constant predictor. Although this approach is well known for its interpretability and efficiency, it overlooks the complex local distributions within and across leaves. As the number of features increases, this limitation becomes more pronounced, often resulting in a concentration of samples near the boundaries of the leaves. Such clustering suggests that there is potential in identifying closer neighbors in adjacent leaves, a phenomenon that is unexplored in the literature. Our study addresses this gap by introducing the k-Tree methodology, a novel method that extends the search for nearest neighbors beyond a single leaf to include adjacent leaves. This approach has two key innovations: (1) establishing an adjacency relationship between leaves across the tree space and (2) designing novel intra-leaf and inter-leaf distance metrics through an optimization lens, which are tailored to local data distributions within the tree. We explore three implementations of the k-Tree methodology: (1) the Post-hoc k-Tree (Pk-Tree), which integrates the k-Tree methodology into constructed decision trees, (2) the Advanced k-Tree, which seamlessly incorporates the k-Tree methodology during the tree construction process, and (3) the Pk-random forest, which integrates the Pk-Tree principles with the random forest framework. The results of empirical evaluations conducted on a variety of real-world and synthetic datasets demonstrate that the k-Tree methods have greater prediction accuracy over the traditional models. These results highlight the potential of the k-Tree methodology in enhancing predictive analytics by providing a deeper insight into the relationships between samples within the tree space.
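The idea of searching for neighbors beyond a single leaf can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the adjacency rule or the intra-leaf and inter-leaf distance metrics, so the code below assumes that two leaves are adjacent when their axis-aligned boxes share a face, and it substitutes plain Euclidean distance for the paper's metrics. The function names `leaf_boxes`, `adjacent`, and `k_tree_predict` are hypothetical. The sketch works post hoc on an already fitted scikit-learn regression tree and averages the targets of the `k` nearest training samples found in the query's leaf and its adjacent leaves.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def leaf_boxes(tree, n_features):
    """Return {leaf_id: (lower, upper)} axis-aligned bounds for every leaf."""
    t = tree.tree_
    boxes = {}
    stack = [(0, np.full(n_features, -np.inf), np.full(n_features, np.inf))]
    while stack:
        node, lo, hi = stack.pop()
        if t.children_left[node] == -1:  # -1 marks a leaf in scikit-learn
            boxes[node] = (lo, hi)
            continue
        f, thr = t.feature[node], t.threshold[node]
        left_hi = hi.copy()
        left_hi[f] = thr   # left child:  x[f] <= thr
        right_lo = lo.copy()
        right_lo[f] = thr  # right child: x[f] >  thr
        stack.append((t.children_left[node], lo, left_hi))
        stack.append((t.children_right[node], right_lo, hi))
    return boxes


def adjacent(box_a, box_b):
    """Assumed rule: boxes touch along one feature and overlap on all others."""
    (lo_a, hi_a), (lo_b, hi_b) = box_a, box_b
    touch = (hi_a == lo_b) | (hi_b == lo_a)
    overlap = np.minimum(hi_a, hi_b) > np.maximum(lo_a, lo_b)
    return any(touch[f] and np.delete(overlap, f).all() for f in range(len(lo_a)))


def k_tree_predict(tree, X_train, y_train, X_query, k=5):
    """Average the k nearest training targets from the query leaf and its neighbors."""
    boxes = leaf_boxes(tree, X_train.shape[1])
    pool = {
        a: [a] + [b for b in boxes if b != a and adjacent(boxes[a], boxes[b])]
        for a in boxes
    }
    train_leaf = tree.apply(X_train)
    preds = np.empty(len(X_query))
    for i, (x, leaf) in enumerate(zip(X_query, tree.apply(X_query))):
        mask = np.isin(train_leaf, pool[leaf])
        cand_X, cand_y = X_train[mask], y_train[mask]
        # Euclidean distance is a stand-in for the paper's tailored metrics.
        dist = np.linalg.norm(cand_X - x, axis=1)
        preds[i] = cand_y[np.argsort(dist)[:k]].mean()
    return preds


# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=400)
X_tr, y_tr, X_te, y_te = X[:300], y[:300], X[300:], y[300:]

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_tr, y_tr)
cart_pred = tree.predict(X_te)
k_pred = k_tree_predict(tree, X_tr, y_tr, X_te, k=5)
print("CART MSE:  ", np.mean((cart_pred - y_te) ** 2))
print("Sketch MSE:", np.mean((k_pred - y_te) ** 2))
```

The sketch recomputes the neighbor pool for every query, which keeps the code short but is slower than caching candidates per leaf. Whether it improves on the constant leaf prediction depends on the data and on `k`; it says nothing about the accuracy of the metrics and construction procedures proposed in the paper.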
Source journal
European Journal of Operational Research (Management Science: Operations Research & Management Science)
CiteScore: 11.90
Self-citation rate: 9.40%
Articles published: 786
Review time: 8.2 months
Journal description: The European Journal of Operational Research (EJOR) publishes high quality, original papers that contribute to the methodology of operational research (OR) and to the practice of decision making.