Diana Shamsutdinova, Daniel Stamate, Daniel Stahl
Balancing accuracy and Interpretability: An R package assessing complex relationships beyond the Cox model and applications to clinical prediction
International Journal of Medical Informatics, volume 194, Article 105700. Published 2024-11-10. DOI: 10.1016/j.ijmedinf.2024.105700
https://www.sciencedirect.com/science/article/pii/S1386505624003630
Abstract
Background
Accurate and interpretable models are essential for clinical decision-making, where predictions can directly impact patient care. Machine learning (ML) survival methods can handle complex multidimensional data and achieve high accuracy but require post-hoc explanations. Traditional models such as the Cox Proportional Hazards Model (Cox-PH) are less flexible, but fast, stable, and intrinsically transparent. Moreover, ML does not always outperform Cox-PH in clinical settings, which warrants diligent model validation. We aimed to develop a set of R functions to help explore the limits of Cox-PH compared with tree-based and deep learning survival models for clinical prediction modelling, employing ensemble learning and nested cross-validation.
Methods
We developed a set of R functions, publicly available as the package “survcompare”. It supports Cox-PH and Cox-Lasso as the traditional models, Survival Random Forest (SRF) and DeepHit as the ML alternatives, and ensemble methods that integrate Cox-PH with SRF or DeepHit and are designed to isolate the marginal value of ML. The package performs repeated nested cross-validation and tests the statistical significance of ML’s superiority using survival-specific performance metrics: the concordance index, time-dependent AUC-ROC, and calibration slope.
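As an illustration of the repeated nested cross-validation mentioned above, the following Python sketch generates the index splits only. It is not the survcompare implementation: the function names, the fold counts, and the absence of stratification by event status are all assumptions made for the example. The point it shows is that tuning folds are drawn solely from each outer training part, so the outer test fold stays unseen until the final evaluation.

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle `indices` and split them into k nearly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def repeated_nested_cv_splits(n, outer_k=5, inner_k=3, repeats=2, seed=0):
    """Yield (repeat, outer_fold, train_idx, inner_splits, test_idx).

    Hypothetical helper: fold counts are illustrative defaults, not the
    package's settings. `inner_splits` is a list of
    (inner_train, inner_validation) pairs built from `train_idx` only.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        # A fresh shuffle per repetition gives different outer folds.
        outer_folds = k_fold_indices(range(n), outer_k, rng)
        for f, test_idx in enumerate(outer_folds):
            train_idx = [i for g, fold in enumerate(outer_folds)
                         if g != f for i in fold]
            # Inner folds for hyperparameter tuning exclude the test fold.
            inner_folds = k_fold_indices(train_idx, inner_k, rng)
            inner_splits = []
            for v, val_idx in enumerate(inner_folds):
                inner_train = [i for g, fold in enumerate(inner_folds)
                               if g != v for i in fold]
                inner_splits.append((inner_train, val_idx))
            yield r, f, train_idx, inner_splits, test_idx


if __name__ == "__main__":
    for r, f, train, inner, test in repeated_nested_cv_splits(20):
        print(f"repeat {r}, outer fold {f}: "
              f"{len(train)} train, {len(test)} test, {len(inner)} inner splits")
```

In a full pipeline, each model would be tuned on the inner splits, refitted on the outer training part, and scored on the outer test fold; the scores collected across folds and repetitions are what a comparison between Cox-PH and an ML model would be based on.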
To gain practical insights, we applied this methodology to clinical and simulated datasets of varying complexity and size.
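Of the metrics named in the Methods, the concordance index is the simplest to state in code. The sketch below is a plain Python version of Harrell's C for right-censored data, written for illustration rather than taken from the package. The function name and the handling of ties (pairs with equal observed times are skipped, tied risk scores count as half) are assumptions of this example.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ordered correctly by risk.

    times       -- observed follow-up times
    events      -- 1/True if the event was observed, 0/False if censored
    risk_scores -- model output where a higher score means higher risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the earlier
        # subject had an observed event (simplifying assumption: pairs
        # with tied times are skipped).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    print(concordance_index(times, events, [0.9, 0.1, 0.5, 0.7]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The quadratic pair loop is fine for small examples; larger datasets would need a more efficient algorithm.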
Results
In simulated data with non-linearities or interactions, ML models outperformed Cox-PH at sample sizes ≥ 500. ML superiority was also observed in imaging and high-dimensional clinical data. However, for tabular clinical data, the performance gains of ML were minimal; in some cases, regularised Cox-Lasso recovered much of ML’s performance advantage with significantly faster computations. Ensemble methods combining Cox-PH and ML predictions were instrumental in quantifying Cox-PH’s limits and improving ML calibration. Traditional models like Cox-PH or Cox-Lasso should not be overlooked when developing clinical predictive models from tabular data or data of limited size.
Conclusion
Our package offers researchers a framework and practical tool for evaluating the accuracy-interpretability trade-off, helping them make informed decisions about model selection.
About the journal:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of the journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.;
Educational computer based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.