Longitudinal data are often subject to irregular visiting times, with outcomes and visit times influenced by a latent variable. Semi-parametric joint models that account for this dependence have been proposed; among these, the Sun model is the most suitable for count data as it employs a multiplicative link function. Semi-parametric joint models define an intercept function as the mean outcome when all covariates are set to zero; this is differenced out in the course of estimation and is consequently not estimated. The Sun estimator thus provides estimates of relative covariate effects, but is unable to provide estimates of absolute effects or of longitudinal prognosis in the absence of covariates. We extend the Sun model by additionally estimating the intercept term, showing that our extended estimator is consistent and asymptotically Normal. In simulations, our estimator outperforms the original Sun estimator in terms of bias and standard error and is also more computationally efficient. We apply our estimator to a longitudinal study of tumor recurrence among bladder cancer patients. Provided the intercept term can be adequately captured using splines, we recommend that our extended Sun estimator be used in place of the original estimator, since it leads to smaller bias, smaller standard errors, and allows estimation of the mean outcome trajectories.
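To see why the intercept can drop out of estimation, consider a deliberately simplified multiplicative mean model that ignores the latent variable and the visit process. The notation ($\mu_0$, $\beta$, $Z$, $\gamma_k$, $B_k$) is assumed here for illustration and is not taken from the paper.

```latex
\begin{align*}
E\{Y(t) \mid Z\} &= \mu_0(t)\,\exp(\beta^\top Z), \\
\frac{E\{Y(t) \mid Z = z_1\}}{E\{Y(t) \mid Z = z_2\}} &= \exp\{\beta^\top (z_1 - z_2)\}, \\
\log \mu_0(t) &\approx \sum_{k=1}^{K} \gamma_k B_k(t).
\end{align*}
```

Setting $Z = 0$ in the first line gives the intercept function $\mu_0(t)$. The ratio in the second line is free of $\mu_0(t)$, so relative covariate effects can be estimated without it, whereas the absolute mean $E\{Y(t) \mid Z = z\}$ requires $\mu_0(t)$ itself. The third line is one way a spline basis $B_k$ could represent the intercept; the paper's actual parameterization may differ.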
Intercept Estimation of Semi-Parametric Joint Models in the Context of Longitudinal Data Subject to Irregular Observations. Luis Ledesma, Eleanor Pullenayegum. Biometrical Journal, vol. 67, no. 6. Published 2025-11-06. DOI: 10.1002/bimj.70088. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.70088
In longitudinal observational studies, marginal structural models (MSMs) are used to analyze the causal effect of an exposure on the (time-to-event) outcome of interest, while accounting for exposure-affected time-dependent confounding. In the applied literature, inverse probability of treatment weighting (IPTW) has been widely adopted to estimate MSMs. An essential assumption for IPTW-based MSMs is positivity, which requires that, for any combination of measured confounders among individuals, there is a nonzero probability of receiving each treatment strategy. Positivity is crucial for valid causal inference through IPTW-based MSMs, but is often overlooked compared to confounding bias. Near-positivity violations, where certain treatments are theoretically possible but rarely observed due to randomness, are common in practical applications, particularly when the sample size is small, and they pose significant challenges for causal inference. This study investigates the impact of near-positivity violations on estimates from IPTW-based MSMs in survival analysis. Two algorithms are proposed for simulating longitudinal data from hazard-MSMs, accommodating near-positivity violations, a time-varying binary exposure, and a time-to-event outcome. Cases of near-positivity violations, where remaining unexposed is rare within certain confounder levels, are analyzed across various scenarios and weight truncation (WT) strategies. Through comprehensive simulations, this study shows that even minor near-positivity violations in longitudinal survival analyses can substantially destabilize IPTW-based estimators, inflating variance and bias, especially under aggressive WT. This work aims to serve as a critical warning against overlooking the positivity assumption or naively applying WT in causal studies using longitudinal observational data and IPTW.
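The weighting and truncation steps mentioned above can be sketched as follows. This is a minimal illustration, not the paper's simulation algorithms: the function names, the toy data-generating setup, and the 1st/99th percentile cut-offs are all assumptions, and the exposure probabilities are taken as known rather than estimated from fitted models.

```python
import numpy as np

def stabilized_weights(exposure, p_denom, p_numer):
    """Cumulative stabilized IPTW for a binary time-varying exposure.

    All inputs have shape (n_subjects, n_times).
    p_denom[i, k]: P(exposed at time k | exposure and confounder history)
    p_numer[i, k]: P(exposed at time k | exposure history only)
    """
    exposure = np.asarray(exposure)
    p_denom = np.asarray(p_denom, dtype=float)
    p_numer = np.asarray(p_numer, dtype=float)
    # Probability of the exposure level each subject actually received.
    denom = np.where(exposure == 1, p_denom, 1.0 - p_denom)
    numer = np.where(exposure == 1, p_numer, 1.0 - p_numer)
    # Weights accumulate multiplicatively over follow-up times.
    return np.cumprod(numer / denom, axis=1)

def truncate_weights(weights, lower_pct=1.0, upper_pct=99.0):
    """Weight truncation: clip weights to the given percentiles."""
    lo, hi = np.percentile(weights, [lower_pct, upper_pct])
    return np.clip(weights, lo, hi)

# Toy near-positivity violation (hypothetical numbers): when the binary
# confounder equals 1, remaining unexposed has probability 0.02.
rng = np.random.default_rng(0)
n_subjects, n_times = 500, 3
confounder = rng.integers(0, 2, size=(n_subjects, n_times))
p_denom = np.where(confounder == 1, 0.98, 0.5)
p_numer = np.full((n_subjects, n_times), p_denom.mean())
exposure = rng.binomial(1, p_denom)

weights = stabilized_weights(exposure, p_denom, p_numer)
final_weights = weights[:, -1]            # weight at the last time point
truncated = truncate_weights(final_weights)
print("max weight before/after truncation:",
      final_weights.max(), truncated.max())
```

A subject who stays unexposed in a stratum where that is rare receives a large weight, which is what inflates variance; truncation caps such weights at the cost of possible bias, which is the trade-off the abstract warns about.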
Impact of Near-Positivity Violations on IPTW-Estimated Marginal Structural Survival Models With Time-Dependent Confounding. Marta Spreafico. Biometrical Journal, vol. 67, no. 6. Published 2025-11-03. DOI: 10.1002/bimj.70093. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12581517/pdf/
Sophie Hanna Langbein, Mateusz Krzyziński, Mikołaj Spytek, Hubert Baniecki, Przemysław Biecek, Marvin N. Wright
With the spread and rapid advancement of black box machine learning (ML) models, the field of interpretable machine learning (IML), or explainable artificial intelligence (XAI), has become increasingly important over the last decade. This is particularly relevant for survival analysis, where the adoption of IML techniques promotes transparency, accountability, and fairness in sensitive areas such as clinical decision-making, the development of targeted therapies and interventions, and other medical or healthcare-related contexts. More specifically, explainability can uncover a survival model's potential biases and limitations and provide more mathematically sound ways to understand which features are influential for prediction or constitute risk factors, and how. However, the lack of readily available IML methods may have deterred practitioners from leveraging the full potential of ML for predicting time-to-event data. We present a comprehensive review of the existing work on IML methods for survival analysis within the context of the general IML taxonomy. In addition, we formally detail how commonly used IML methods, such as individual conditional expectation (ICE), partial dependence plots (PDP), accumulated local effects (ALE), different feature importance measures, or Friedman's H-interaction statistics, can be adapted to survival outcomes. An application of several IML methods to data on breast cancer recurrence from the German Breast Cancer Study Group (GBSG2) serves as a tutorial for researchers on how to use the techniques in practice to facilitate understanding of model decisions or predictions.
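As a rough illustration of how ICE and PDP carry over to survival outcomes, the sketch below computes them for predicted survival probabilities on a time grid. The exponential proportional hazards model, its coefficients, and all function names are hypothetical stand-ins chosen to keep the example self-contained; they are not the models or software used in the paper, and any fitted survival model that returns survival probabilities could take the place of `survival_curve`.

```python
import numpy as np

def survival_curve(X, times, beta, baseline_rate=0.1):
    """Hypothetical model: S(t | x) = exp(-baseline_rate * t * exp(x @ beta)).

    Returns an array of shape (n_obs, len(times)).
    """
    risk = np.exp(X @ beta)
    return np.exp(-baseline_rate * np.outer(risk, times))

def ice_survival(predict, X, feature, grid, times):
    """ICE curves for a survival model.

    For each grid value, the chosen feature is set to that value for every
    observation and the survival function is re-predicted.
    Returns an array of shape (n_obs, len(grid), len(times)).
    """
    curves = np.empty((X.shape[0], len(grid), len(times)))
    for g, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value
        curves[:, g, :] = predict(X_mod, times)
    return curves

def pdp_survival(ice):
    """PDP: average the ICE curves over observations."""
    return ice.mean(axis=0)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
beta = np.array([0.8, -0.3])          # assumed coefficients
times = np.array([1.0, 5.0, 10.0])
grid = np.linspace(-2.0, 2.0, 5)

ice = ice_survival(lambda data, t: survival_curve(data, t, beta),
                   X, feature=0, grid=grid, times=times)
pdp = pdp_survival(ice)
print(pdp.round(3))                   # rows: grid values, columns: times
```

The main change from the standard setting is that each ICE or PDP value becomes a function of time rather than a single number, so the result gains a time axis.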
Interpretable Machine Learning for Survival Analysis. Biometrical Journal, vol. 67, no. 6. Published 2025-10-30. DOI: 10.1002/bimj.70089. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.70089
Canonical correlation analysis (CCA) is a widely used multivariate method in omics research for integrating high-dimensional datasets. CCA identifies hidden links by deriving linear projections of the observed features that are maximally correlated across datasets. An important requirement of standard CCA is that observations are independent of each other; as a result, it cannot properly handle repeated measurements. Current CCA extensions that address this limitation either perform CCA on summarized data or estimate correlations for each measurement. While these techniques factor in the correlation between measurements, they are suboptimal for high-dimensional analysis and for exploiting the longitudinal nature of the data. We propose a novel extension of sparse CCA that incorporates time dynamics at the latent variable level through longitudinal models. This approach addresses the correlation of repeated measurements while drawing latent paths, focusing on dynamics in the correlation structures. To aid interpretability and computational efficiency, we implement an