Interpretable Machine Learning for Survival Analysis
Sophie Hanna Langbein, Mateusz Krzyziński, Mikołaj Spytek, Hubert Baniecki, Przemysław Biecek, Marvin N. Wright
Biometrical Journal, 67(6), 2025. https://doi.org/10.1002/bimj.70089
With the spread and rapid advancement of black-box machine learning (ML) models, the field of interpretable machine learning (IML), also known as explainable artificial intelligence (XAI), has become increasingly important over the last decade. This is particularly relevant for survival analysis, where the adoption of IML techniques promotes transparency, accountability, and fairness in sensitive areas such as clinical decision-making, the development of targeted therapies and interventions, and other medical or healthcare-related contexts. More specifically, explainability can uncover a survival model's potential biases and limitations and provide mathematically sound ways to understand which features are influential for prediction, and how, or which constitute risk factors. However, the lack of readily available IML methods may have deterred practitioners from leveraging the full potential of ML for predicting time-to-event data. We present a comprehensive review of existing work on IML methods for survival analysis within the context of the general IML taxonomy. In addition, we formally detail how commonly used IML methods, such as individual conditional expectation (ICE), partial dependence plots (PDP), accumulated local effects (ALE), various feature importance measures, and Friedman's H-statistic for interactions, can be adapted to survival outcomes. An application of several IML methods to data on breast cancer recurrence from the German Breast Cancer Study Group (GBSG2) serves as a tutorial and guide for researchers on how to use the techniques in practice to facilitate understanding of model decisions and predictions.
{"title":"Interpretable Machine Learning for Survival Analysis","authors":"Sophie Hanna Langbein, Mateusz Krzyziński, Mikołaj Spytek, Hubert Baniecki, Przemysław Biecek, Marvin N. Wright","doi":"10.1002/bimj.70089","DOIUrl":"https://doi.org/10.1002/bimj.70089","url":null,"abstract":"<p>With the spread and rapid advancement of black box machine learning (ML) models, the field of interpretable machine learning (IML) or explainable artificial intelligence (XAI) has become increasingly important over the last decade. This is particularly relevant for survival analysis, where the adoption of IML techniques promotes transparency, accountability, and fairness in sensitive areas, such as clinical decision-making processes, the development of targeted therapies, interventions, or in other medical or healthcare-related contexts. More specifically, explainability can uncover a survival model's potential biases and limitations and provide more mathematically sound ways to understand how and which features are influential for prediction or constitute risk factors. However, the lack of readily available IML methods may have deterred practitioners from leveraging the full potential of ML for predicting time-to-event data. We present a comprehensive review of the existing work on IML methods for survival analysis within the context of the general IML taxonomy. In addition, we formally detail how commonly used IML methods, such as individual conditional expectation (ICE), partial dependence plots (PDP), accumulated local effects (ALE), different feature importance measures, or Friedman's H-interaction statistics can be adapted to survival outcomes. An application of several IML methods to data on breast cancer recurrence in the German Breast Cancer Study Group (GBSG2) serves as a tutorial or guide for researchers, on how to utilize the techniques in practice to facilitate understanding of model decisions or predictions.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"67 6","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.70089","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145406972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Canonical correlation analysis (CCA) is a widely used multivariate method in omics research for integrating high-dimensional datasets. CCA identifies hidden links between datasets by deriving linear projections of the observed features that are maximally correlated. An important requirement of standard CCA is that observations are independent of each other; as a result, it cannot properly handle repeated measurements. Existing CCA extensions deal with this challenge either by performing CCA on summarized data or by estimating correlations separately for each measurement. While these techniques account for the correlation between measurements, they are suboptimal for high-dimensional analysis and do not fully exploit the data's longitudinal structure. We propose a novel extension of sparse CCA that incorporates time dynamics at the latent variable level through longitudinal models. This approach addresses the correlation of repeated measurements while tracing latent paths, focusing on dynamics in the correlation structures. To aid interpretability and computational efficiency, we implement an
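For readers unfamiliar with the baseline method, the following minimal sketch illustrates standard (non-sparse, cross-sectional) CCA on synthetic data with a shared latent signal. It uses scikit-learn and does not implement the sparse longitudinal extension the abstract proposes; all names and sizes are illustrative:

```python
# Minimal sketch: standard CCA on two synthetic "omics" blocks that share a
# hidden two-dimensional signal (illustrative baseline only).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n = 200
shared = rng.normal(size=(n, 2))                    # hidden link between blocks
X = shared @ rng.normal(size=(2, 20)) + 0.5 * rng.normal(size=(n, 20))
Y = shared @ rng.normal(size=(2, 15)) + 0.5 * rng.normal(size=(n, 15))

cca = CCA(n_components=2).fit(X, Y)
U, V = cca.transform(X, Y)                          # canonical variates
for k in range(2):
    r = np.corrcoef(U[:, k], V[:, k])[0, 1]
    print(f"canonical correlation {k + 1}: {r:.2f}")
```

Because the projections are linear, the canonical variates U and V recover the shared signal with high correlation; the independence assumption discussed above enters through treating the n rows as unrelated observations.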