Fast Interpretable Greedy-Tree Sums.

IF 9.5 1区 综合性期刊 Q1 MULTIDISCIPLINARY SCIENCES Proceedings of the National Academy of Sciences of the United States of America Pub Date : 2025-02-18 Epub Date: 2025-02-14 DOI:10.1073/pnas.2310151122
Yan Shuo Tan, Chandan Singh, Keyan Nasseri, Abhineet Agarwal, James Duncan, Omer Ronen, Matthew Epland, Aaron Kornblith, Bin Yu
{"title":"Fast Interpretable Greedy-Tree Sums.","authors":"Yan Shuo Tan, Chandan Singh, Keyan Nasseri, Abhineet Agarwal, James Duncan, Omer Ronen, Matthew Epland, Aaron Kornblith, Bin Yu","doi":"10.1073/pnas.2310151122","DOIUrl":null,"url":null,"abstract":"<p><p>Modern machine learning has achieved impressive prediction performance, but often sacrifices interpretability, a critical consideration in high-stakes domains such as medicine. In such settings, practitioners often use highly interpretable decision tree models, but these suffer from inductive bias against additive structure. To overcome this bias, we propose Fast Interpretable Greedy-Tree Sums (FIGS), which generalizes the Classification and Regression Trees (CART) algorithm to simultaneously grow a flexible number of trees in summation. By combining logical rules with addition, FIGS adapts to additive structure while remaining highly interpretable. Experiments on real-world datasets show FIGS achieves state-of-the-art prediction performance. To demonstrate the usefulness of FIGS in high-stakes domains, we adapt FIGS to learn clinical decision instruments (CDIs), which are tools for guiding decision-making. Specifically, we introduce a variant of FIGS known as Group Probability-Weighted Tree Sums (G-FIGS) that accounts for heterogeneity in medical data. G-FIGS derives CDIs that reflect domain knowledge and enjoy improved specificity (by up to 20% over CART) without sacrificing sensitivity or interpretability. Theoretically, we prove that FIGS learns components of additive models, a property we refer to as disentanglement. Further, we show (under oracle conditions) that tree-sum models leverage disentanglement to generalize more efficiently than single tree models when fitted to additive regression functions. Finally, to avoid overfitting with an unconstrained number of splits, we develop Bagging-FIGS, an ensemble version of FIGS that borrows the variance reduction techniques of random forests. Bagging-FIGS performs competitively with random forests and XGBoost on real-world datasets.</p>","PeriodicalId":20548,"journal":{"name":"Proceedings of the National Academy of Sciences of the United States of America","volume":"122 7","pages":"e2310151122"},"PeriodicalIF":9.5000,"publicationDate":"2025-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11848335/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the National Academy of Sciences of the United States of America","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1073/pnas.2310151122","RegionNum":1,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/2/14 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
引用次数: 0

Abstract

Modern machine learning has achieved impressive prediction performance, but often sacrifices interpretability, a critical consideration in high-stakes domains such as medicine. In such settings, practitioners often use highly interpretable decision tree models, but these suffer from inductive bias against additive structure. To overcome this bias, we propose Fast Interpretable Greedy-Tree Sums (FIGS), which generalizes the Classification and Regression Trees (CART) algorithm to simultaneously grow a flexible number of trees in summation. By combining logical rules with addition, FIGS adapts to additive structure while remaining highly interpretable. Experiments on real-world datasets show FIGS achieves state-of-the-art prediction performance. To demonstrate the usefulness of FIGS in high-stakes domains, we adapt FIGS to learn clinical decision instruments (CDIs), which are tools for guiding decision-making. Specifically, we introduce a variant of FIGS known as Group Probability-Weighted Tree Sums (G-FIGS) that accounts for heterogeneity in medical data. G-FIGS derives CDIs that reflect domain knowledge and enjoy improved specificity (by up to 20% over CART) without sacrificing sensitivity or interpretability. Theoretically, we prove that FIGS learns components of additive models, a property we refer to as disentanglement. Further, we show (under oracle conditions) that tree-sum models leverage disentanglement to generalize more efficiently than single tree models when fitted to additive regression functions. Finally, to avoid overfitting with an unconstrained number of splits, we develop Bagging-FIGS, an ensemble version of FIGS that borrows the variance reduction techniques of random forests. Bagging-FIGS performs competitively with random forests and XGBoost on real-world datasets.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
快速可解释的贪婪树和。
现代机器学习已经取得了令人印象深刻的预测性能,但往往牺牲了可解释性,这是医学等高风险领域的关键考虑因素。在这种情况下,从业者经常使用高度可解释的决策树模型,但是这些模型会受到对加法结构的归纳偏见的影响。为了克服这种偏见,我们提出了快速可解释贪婪树和(FIGS),它推广了分类和回归树(CART)算法,以同时在求和中生长灵活数量的树。通过将逻辑规则与加法相结合,FIGS在保持高度可解释性的同时适应了加法结构。在真实世界数据集上的实验表明,FIGS达到了最先进的预测性能。为了证明FIGS在高风险领域的有用性,我们采用FIGS来学习临床决策工具(cdi),这是指导决策的工具。具体来说,我们引入了一种称为组概率加权树和(G-FIGS)的FIGS变体,该变体解释了医疗数据的异质性。G-FIGS衍生的cdi反映了领域知识,并在不牺牲敏感性或可解释性的情况下提高了特异性(比CART提高了20%)。从理论上讲,我们证明了FIGS学习了加性模型的组成部分,我们称之为解纠缠。此外,我们表明(在oracle条件下),当拟合到可加性回归函数时,树和模型利用解纠缠比单树模型更有效地进行泛化。最后,为了避免过度拟合无约束的分裂数,我们开发了Bagging-FIGS,这是FIGS的集成版本,借鉴了随机森林的方差缩减技术。Bagging-FIGS在实际数据集上的表现与随机森林和XGBoost具有竞争力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
19.00
自引率
0.90%
发文量
3575
审稿时长
2.5 months
期刊介绍: The Proceedings of the National Academy of Sciences (PNAS), a peer-reviewed journal of the National Academy of Sciences (NAS), serves as an authoritative source for high-impact, original research across the biological, physical, and social sciences. With a global scope, the journal welcomes submissions from researchers worldwide, making it an inclusive platform for advancing scientific knowledge.
期刊最新文献
Reply to Borst et al.: Acupuncture and the placebo effect. High-grade endometrial stromal sarcoma heterogeneity: Fusion-driven, fusion-negative, and related undifferentiated uterine sarcomas. Allopatric coevolution: Migratory predators may facilitate mimicry between geographically nonoverlapping species. Reply to Hojný et al.: Fusion-driven vs. fusion-negative high-grade endometrial stromal sarcoma. Mycoelectronics: Bioprinted living fungal bioelectronics for artificial sensation.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1