Balancing accuracy and Interpretability: An R package assessing complex relationships beyond the Cox model and applications to clinical prediction

IF 3.7; Zone 2 (Medicine); JCR Q2 (Computer Science, Information Systems); International Journal of Medical Informatics; Pub Date: 2024-11-10; DOI: 10.1016/j.ijmedinf.2024.105700
Diana Shamsutdinova, Daniel Stamate, Daniel Stahl
International Journal of Medical Informatics, volume 194, Article 105700. Source: https://www.sciencedirect.com/science/article/pii/S1386505624003630
Citation count: 0

Abstract

Background

Accurate and interpretable models are essential for clinical decision-making, where predictions can directly affect patient care. Machine learning (ML) survival methods can handle complex multidimensional data and achieve high accuracy, but they require post-hoc explanations. Traditional models such as the Cox proportional hazards model (Cox-PH) are less flexible but fast, stable, and intrinsically transparent. Moreover, ML does not always outperform Cox-PH in clinical settings, so careful model validation is warranted. We aimed to develop a set of R functions that help explore the limits of Cox-PH relative to tree-based and deep learning survival models for clinical prediction modelling, employing ensemble learning and nested cross-validation.
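
For reference, the transparency attributed to Cox-PH follows from its standard textbook form, sketched below. The notation (h_0 for the baseline hazard, x for the covariate vector, beta for the coefficients) is introduced here for illustration and is not taken from the paper.

```latex
% Cox proportional hazards model: hazard at time t for covariates x
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)

% A one-unit increase in covariate x_j multiplies the hazard by a
% constant factor (the hazard ratio), independent of t:
\frac{h(t \mid x_j + 1,\, x_{-j})}{h(t \mid x_j,\, x_{-j})} = \exp(\beta_j)
```

Each coefficient therefore has a direct reading as a log hazard ratio. The same log-linear, time-constant structure is what limits the model when the data contain non-linearities or interactions.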

Methods

We developed a set of R functions, publicly available as the package “survcompare”. It supports Cox-PH and Cox-Lasso as the traditional models, Survival Random Forest (SRF) and DeepHit as the ML alternatives, and ensemble methods that integrate Cox-PH with SRF or DeepHit and are designed to isolate the marginal value of ML. The package performs repeated nested cross-validation and tests the statistical significance of ML’s superiority using survival-specific performance metrics: the concordance index, time-dependent AUC-ROC, and calibration slope.
To gain practical insights, we applied this methodology to clinical and simulated datasets of varying complexity and size.
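
The idea of repeated nested cross-validation can be sketched in a few lines. The example below is a plain Python illustration of the index bookkeeping only, not the survcompare implementation. The function names, fold counts, and seed are hypothetical choices made for this sketch.

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle `indices` and split them into k near-equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def repeated_nested_cv(n, outer_k=3, inner_k=3, repeats=2, seed=42):
    """Yield (repeat, outer_fold, train, test, inner_splits) tuples.

    inner_splits is a list of (inner_train, inner_valid) pairs drawn only
    from the outer training set, so hyperparameter tuning never sees the
    outer test fold.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        # A fresh shuffle per repeat gives a different partition each time.
        outer_folds = k_fold_indices(range(n), outer_k, rng)
        for f, test in enumerate(outer_folds):
            train = [i for g, fold in enumerate(outer_folds)
                     if g != f for i in fold]
            inner_folds = k_fold_indices(train, inner_k, rng)
            inner_splits = []
            for v, valid in enumerate(inner_folds):
                inner_train = [i for g, fold in enumerate(inner_folds)
                               if g != v for i in fold]
                inner_splits.append((inner_train, valid))
            yield r, f, train, test, inner_splits


if __name__ == "__main__":
    for r, f, train, test, inner in repeated_nested_cv(12):
        print(r, f, len(train), len(test), len(inner))
```

In such a scheme, the inner splits would be used to tune a model and the outer test fold to score it; repeating the whole procedure with new partitions gives a distribution of scores per model that can then be compared.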

Results

In simulated data with non-linearities or interactions, ML models outperformed Cox-PH at sample sizes ≥ 500. ML superiority was also observed in imaging and high-dimensional clinical data. However, for tabular clinical data, the performance gains of ML were minimal; in some cases, regularised Cox-Lasso recovered much of ML’s performance advantage with significantly faster computation. Ensemble methods combining Cox-PH and ML predictions were instrumental in quantifying Cox-PH’s limits and improving ML calibration. Traditional models such as Cox-PH or Cox-Lasso should not be overlooked when developing clinical predictive models from tabular data or data of limited size.
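
Comparisons of this kind rest on discrimination metrics such as the concordance index. As a rough illustration, the sketch below computes Harrell's C for right-censored data in plain Python. It is not the package's code: the function name is hypothetical, and pairs with tied observed times are simply skipped as a simplification.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ordered correctly by risk.

    A pair is comparable when the subject with the shorter observed time
    had an event (events[i] truthy). Ties in risk score count as half
    concordant. Pairs with tied times are skipped (a simplification).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier subject was censored: pair not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data: higher risk score should mean earlier event.
    times = [2, 5, 7, 9, 12]
    events = [1, 1, 0, 1, 0]
    model_a = [0.9, 0.6, 0.5, 0.4, 0.1]   # hypothetical risk scores
    model_b = [0.2, 0.6, 0.5, 0.9, 0.1]   # hypothetical risk scores
    print(concordance_index(times, events, model_a))
    print(concordance_index(times, events, model_b))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. Computing the metric for two models on the same held-out folds yields paired differences whose significance can then be tested.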

Conclusion

Our package offers researchers a framework and a practical tool for evaluating the accuracy-interpretability trade-off, helping them make informed decisions about model selection.
Source journal
International Journal of Medical Informatics (Medicine; Computer Science, Information Systems)
CiteScore: 8.90
Self-citation rate: 4.10%
Articles published: 217
Review time: 42 days
About the journal: International Journal of Medical Informatics provides an international medium for the dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings. The scope of the journal covers: information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.; computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.; educational computer-based programs pertaining to medical informatics or medicine in general; and organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.