Fast Interpretable Greedy-Tree Sums.

Proceedings of the National Academy of Sciences of the United States of America (IF 9.5; Zone 1, multidisciplinary journals; JCR Q1, Multidisciplinary Sciences). Pub Date: 2025-02-18. Epub Date: 2025-02-14. DOI: 10.1073/pnas.2310151122

Yan Shuo Tan, Chandan Singh, Keyan Nasseri, Abhineet Agarwal, James Duncan, Omer Ronen, Matthew Epland, Aaron Kornblith, Bin Yu

Volume 122, issue 7, article e2310151122. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11848335/pdf/

Abstract

Modern machine learning has achieved impressive prediction performance, but often sacrifices interpretability, a critical consideration in high-stakes domains such as medicine. In such settings, practitioners often use highly interpretable decision tree models, but these suffer from inductive bias against additive structure. To overcome this bias, we propose Fast Interpretable Greedy-Tree Sums (FIGS), which generalizes the Classification and Regression Trees (CART) algorithm to simultaneously grow a flexible number of trees in summation. By combining logical rules with addition, FIGS adapts to additive structure while remaining highly interpretable. Experiments on real-world datasets show FIGS achieves state-of-the-art prediction performance. To demonstrate the usefulness of FIGS in high-stakes domains, we adapt FIGS to learn clinical decision instruments (CDIs), which are tools for guiding decision-making. Specifically, we introduce a variant of FIGS known as Group Probability-Weighted Tree Sums (G-FIGS) that accounts for heterogeneity in medical data. G-FIGS derives CDIs that reflect domain knowledge and enjoy improved specificity (by up to 20% over CART) without sacrificing sensitivity or interpretability. Theoretically, we prove that FIGS learns components of additive models, a property we refer to as disentanglement. Further, we show (under oracle conditions) that tree-sum models leverage disentanglement to generalize more efficiently than single tree models when fitted to additive regression functions. Finally, to avoid overfitting with an unconstrained number of splits, we develop Bagging-FIGS, an ensemble version of FIGS that borrows the variance reduction techniques of random forests. Bagging-FIGS performs competitively with random forests and XGBoost on real-world datasets.
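The greedy tree-sum idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a simplified, pure-Python reading of "grow a flexible number of trees in summation", and every name in it (`Node`, `best_split`, `fit_tree_sum`, `predict`, `max_splits`) is hypothetical. It assumes squared-error regression, and it assumes that each candidate split is scored on the residuals left by the other trees, with a new one-split tree competing against splits of existing leaves at every step.

```python
# Simplified sketch of a greedy tree-sum fit (illustrative, not the paper's code).
class Node:
    def __init__(self, idxs, value):
        self.idxs, self.value = idxs, value      # training rows in this node, prediction
        self.feature = self.threshold = None     # set once the node is split
        self.left = self.right = None

def predict_tree(node, x):
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value

def predict(trees, x):
    return sum(predict_tree(t, x) for t in trees)  # the model is a sum of trees

def leaves(node):
    if node.feature is None:
        return [node]
    return leaves(node.left) + leaves(node.right)

def mean(v):
    return sum(v) / len(v)

def sse(v):
    m = mean(v)
    return sum((a - m) ** 2 for a in v)

def best_split(X, r, idxs):
    """Return (gain, feature, threshold) of the best split of rows idxs on residuals r."""
    best = (0.0, None, None)
    base = sse([r[i] for i in idxs])
    for j in range(len(X[0])):
        vals = sorted({X[i][j] for i in idxs})
        for lo, hi in zip(vals, vals[1:]):
            t = (lo + hi) / 2
            left = [r[i] for i in idxs if X[i][j] <= t]
            right = [r[i] for i in idxs if X[i][j] > t]
            gain = base - sse(left) - sse(right)
            if gain > best[0]:
                best = (gain, j, t)
    return best

def fit_tree_sum(X, y, max_splits):
    n, trees = len(X), []
    for _ in range(max_splits):
        total = [predict(trees, x) for x in X]
        candidates = []
        # Option 1: split a leaf of an existing tree, scored on residuals of the other trees.
        for t in trees:
            r = [y[i] - (total[i] - predict_tree(t, X[i])) for i in range(n)]
            for leaf in leaves(t):
                candidates.append((best_split(X, r, leaf.idxs), leaf, r))
        # Option 2: start a new tree, scored on residuals of the full current sum.
        r_new = [y[i] - total[i] for i in range(n)]
        new_root = Node(list(range(n)), 0.0)
        candidates.append((best_split(X, r_new, new_root.idxs), new_root, r_new))
        # Greedy choice: the single split with the largest reduction in squared error.
        (gain, j, thr), leaf, r = max(candidates, key=lambda c: c[0][0])
        if j is None:          # no split improves the fit
            break
        left = [i for i in leaf.idxs if X[i][j] <= thr]
        right = [i for i in leaf.idxs if X[i][j] > thr]
        leaf.feature, leaf.threshold = j, thr
        leaf.left = Node(left, mean([r[i] for i in left]))
        leaf.right = Node(right, mean([r[i] for i in right]))
        if leaf is new_root:
            trees.append(new_root)
    return trees

# Toy additive target: y = 2*[x0 > 0.5] + 3*[x1 > 0.5]
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0.0, 3.0, 2.0, 5.0]
trees = fit_tree_sum(X, y, max_splits=5)
print(len(trees), [predict(trees, x) for x in X])
```

On this toy additive target the sketch stops after two splits with two one-split trees, one per feature, whereas a single tree would need three splits to represent the same function. That is the inductive bias against additive structure the abstract refers to. The sketch only refreshes the values of the two newly created leaves after each split, which is a simplification.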

Source journal

Proceedings of the National Academy of Sciences of the United States of America (multidisciplinary journals)

CiteScore: 19.00
Self-citation rate: 0.90%
Articles published: 3575
Review time: 2.5 months

About the journal: The Proceedings of the National Academy of Sciences (PNAS), a peer-reviewed journal of the National Academy of Sciences (NAS), serves as an authoritative source for high-impact, original research across the biological, physical, and social sciences. With a global scope, the journal welcomes submissions from researchers worldwide, making it an inclusive platform for advancing scientific knowledge.