The influence of dimensions on the complexity of computing decision trees

Artificial Intelligence (IF 4.6; Zone 2, Computer Science; JCR Q1, Computer Science, Artificial Intelligence). Pub Date: 2025-06-01. Epub Date: 2025-03-07. DOI: 10.1016/j.artint.2025.104322
Stephen Kobourov, Maarten Löffler, Fabrizio Montecchiani, Marcin Pilipczuk, Ignaz Rutter, Raimund Seidel, Manuel Sorge, Jules Wulms
Volume 343, Article 104322. Full text: https://www.sciencedirect.com/science/article/pii/S0004370225000414

Abstract

A decision tree recursively splits a feature space $\mathbb{R}^d$ and then assigns class labels based on the resulting partition. Decision trees have been part of the basic machine-learning toolkit for decades. A large body of work considers heuristic algorithms that compute a decision tree from training data, usually aiming to minimize in particular the size of the resulting tree. In contrast, little is known about the complexity of the underlying computational problem of computing a minimum-size tree for the given training data. We study this problem with respect to the number $d$ of dimensions of the feature space $\mathbb{R}^d$, which contains $n$ training examples. We show that it can be solved in $O(n^{2d+1})$ time, but under reasonable complexity-theoretic assumptions it is not possible to achieve $f(d) \cdot n^{o(d/\log d)}$ running time. The problem is solvable in $(dR)^{O(dR)} \cdot n^{1+o(1)}$ time if there are exactly two classes and $R$ is an upper bound on the number of tree leaves labeled with the first class.
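To make the problem statement concrete, the following is a minimal sketch of an exhaustive search for a minimum-size decision tree on a tiny training set. It is not the algorithm from the paper and does not achieve any of the running times stated above; it only illustrates what is being minimized. Two things are assumptions of this sketch rather than statements from the abstract: every inner node tests a single coordinate against a threshold (x[dim] <= t), and the size of a tree is counted as its number of inner nodes. The function name min_tree_size is hypothetical.

```python
from functools import lru_cache


def min_tree_size(points, labels):
    """Minimum number of inner nodes of a tree that classifies every example.

    Assumptions of this sketch: each inner node is an axis-parallel test
    x[dim] <= t, and tree size means the number of inner nodes.
    Runs in exponential time; intended for tiny inputs only.
    """
    d = len(points[0])

    @lru_cache(maxsize=None)
    def solve(subset):
        # subset is a frozenset of example indices reaching this node.
        if len({labels[i] for i in subset}) <= 1:
            return 0  # a single leaf suffices, no cut needed
        best = None
        for dim in range(d):
            values = sorted({points[i][dim] for i in subset})
            # Cutting at the largest value would leave the right side empty,
            # so only the smaller values are useful thresholds.
            for t in values[:-1]:
                left = frozenset(i for i in subset if points[i][dim] <= t)
                right = subset - left
                cost = 1 + solve(left) + solve(right)
                if best is None or cost < best:
                    best = cost
        if best is None:
            # No threshold separates the examples: identical points,
            # different labels.
            raise ValueError("identical points carry different labels")
        return best

    return solve(frozenset(range(len(points))))


# XOR-like data in two dimensions: no single cut separates the classes.
xor_points = [(0, 0), (1, 1), (0, 1), (1, 0)]
xor_labels = [0, 0, 1, 1]
print(min_tree_size(xor_points, xor_labels))
```

On the XOR-like example, any first cut leaves both sides with mixed labels, so each side needs one more cut. The search explores every subset of examples reachable by cuts, which is why it is only usable for a handful of points.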
Source journal: Artificial Intelligence (Engineering and Technology, Computer Science: Artificial Intelligence)
CiteScore: 11.20
Self-citation rate: 1.40%
Articles published: 118
Review time: 8 months
Journal description: The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. Additionally, the journal accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ also accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.