Balancing accuracy and Interpretability: An R package assessing complex relationships beyond the Cox model and applications to clinical prediction

International Journal of Medical Informatics (IF 3.7; CAS Zone 2, Medicine; JCR Q2, Computer Science, Information Systems). Pub Date: 2024-11-10. DOI: 10.1016/j.ijmedinf.2024.105700

Diana Shamsutdinova, Daniel Stamate, Daniel Stahl

International Journal of Medical Informatics, volume 194, Article 105700. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S1386505624003630


Abstract

Background

Accurate and interpretable models are essential for clinical decision-making, where predictions can directly impact patient care. Machine learning (ML) survival methods can handle complex multidimensional data and achieve high accuracy but require post-hoc explanations. Traditional models such as the Cox Proportional Hazards Model (Cox-PH) are less flexible, but fast, stable, and intrinsically transparent. Moreover, ML does not always outperform Cox-PH in clinical settings, which warrants diligent model validation. We aimed to develop a set of R functions to help explore the limits of Cox-PH compared with tree-based and deep learning survival models for clinical prediction modelling, employing ensemble learning and nested cross-validation.

Methods

We developed a set of R functions, publicly available as the package “survcompare”. It supports Cox-PH and Cox-Lasso, with Survival Random Forest (SRF) and DeepHit as the ML alternatives, along with ensemble methods that integrate Cox-PH with SRF or DeepHit and are designed to isolate the marginal value of ML. The package performs repeated nested cross-validation and tests the statistical significance of ML’s superiority using survival-specific performance metrics: the concordance index, time-dependent AUC-ROC, and calibration slope.
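The repeated nested cross-validation mentioned above can be illustrated with a minimal sketch. It is written in plain Python rather than R and is not the survcompare implementation; the function names, fold counts, and seed are assumptions chosen for illustration. The idea shown is that inner folds (used for tuning) are always drawn from the outer training data only, so the outer test fold stays untouched until final evaluation.

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle the indices and deal them into k roughly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def repeated_nested_cv(n, outer_k=5, inner_k=3, repeats=2, seed=0):
    """Yield (repeat, outer_fold, inner_train, inner_val, outer_test) splits.

    Hypothetical helper: the defaults are illustrative, not the package's.
    """
    rng = random.Random(seed)
    for rep in range(repeats):
        # Each repeat reshuffles the data into new outer folds.
        outer_folds = k_fold_indices(range(n), outer_k, rng)
        for o, outer_test in enumerate(outer_folds):
            # Everything outside the outer test fold is available for training.
            outer_train = [i for f, fold in enumerate(outer_folds)
                           if f != o for i in fold]
            # Inner folds are built from the outer training data only.
            inner_folds = k_fold_indices(outer_train, inner_k, rng)
            for v, inner_val in enumerate(inner_folds):
                inner_train = [i for f, fold in enumerate(inner_folds)
                               if f != v for i in fold]
                yield rep, o, inner_train, inner_val, outer_test


splits = list(repeated_nested_cv(20))
print(len(splits))  # repeats * outer_k * inner_k splits in total
```

In a full pipeline, hyperparameters would be chosen on the inner splits and the selected model scored once per outer test fold; repeating the procedure reduces the dependence of the estimate on any single random partition.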

To gain practical insights, we applied this methodology to clinical and simulated datasets of varying complexity and size.
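As a rough illustration of what a simulated dataset with an interaction might look like, the sketch below draws exponential survival times whose log-hazard optionally includes a cross-term between two covariates. This is not the simulation design used in the paper: the coefficients, baseline hazard, censoring rate, and function name are all assumptions made for the example.

```python
import math
import random


def simulate_survival(n, interaction=0.0, censor_rate=0.05, seed=0):
    """Simulate right-censored survival data (hypothetical helper).

    With interaction=0 the log-hazard is linear in x1 and x2, which a Cox
    model can represent exactly; a non-zero value adds an x1*x2 cross-term
    that a main-effects Cox model cannot capture.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        # Assumed coefficients, chosen only for illustration.
        log_hazard = 0.5 * x1 - 0.5 * x2 + interaction * x1 * x2
        # Exponential event time with an assumed baseline hazard of 0.1.
        event_time = rng.expovariate(0.1 * math.exp(log_hazard))
        # Independent exponential censoring time.
        censor_time = rng.expovariate(censor_rate)
        rows.append({
            "x1": x1,
            "x2": x2,
            "time": min(event_time, censor_time),
            "event": int(event_time <= censor_time),
        })
    return rows


linear_data = simulate_survival(500, interaction=0.0)
crossterm_data = simulate_survival(500, interaction=1.0)
print(sum(r["event"] for r in crossterm_data), "events out of", len(crossterm_data))
```

Comparing a model fitted on `linear_data` with one fitted on `crossterm_data` is one way to see when a flexible learner has something to gain over a linear log-hazard.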

Results

In simulated data with non-linearities or interactions, ML models outperformed Cox-PH at sample sizes ≥ 500. ML superiority was also observed in imaging and high-dimensional clinical data. However, for tabular clinical data, the performance gains of ML were minimal; in some cases, regularised Cox-Lasso recovered much of ML’s performance advantage with significantly faster computations. Ensemble methods combining Cox-PH and ML predictions were instrumental in quantifying Cox-PH’s limits and improving ML calibration. Traditional models like Cox-PH or Cox-Lasso should not be overlooked when developing clinical predictive models from tabular data or data of limited size.
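Comparisons such as these rest on survival-specific metrics like the concordance index named in Methods. The following is a minimal sketch of Harrell's concordance index in plain Python, under a simplifying assumption: pairs with tied observed times are skipped. It is meant to show the idea, not to reproduce the package's computation; the function and argument names are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Share of comparable pairs in which the higher-risk subject fails first.

    A pair is comparable when the subject with the shorter observed time
    had an event (not censoring). Tied times are skipped in this sketch.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher predicted risk failed earlier
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [2, 4, 6, 8]
events = [1, 1, 0, 1]            # 0 marks a censored observation
risk = [0.9, 0.7, 0.8, 0.1]
print(concordance_index(times, events, risk))  # 4 of 5 comparable pairs agree: 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why differences between models on this scale are often small and benefit from formal significance testing across cross-validation repeats.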

Conclusion

Our package offers researchers a framework and practical tool for evaluating the accuracy-interpretability trade-off, helping them make informed decisions about model selection.




Source journal: International Journal of Medical Informatics (Medicine; Computer Science, Information Systems)

CiteScore: 8.90

Self-citation rate: 4.10%

Articles published: 217

Review time: 42 days

About the journal: International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings. The scope of the journal covers: information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.; computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.; educational computer-based programs pertaining to medical informatics or medicine in general; organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.