Biometrical Journal: Latest Articles

Interpretable Machine Learning for Survival Analysis
IF 1.8 | Zone 3 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2025-10-30 | DOI: 10.1002/bimj.70089
Sophie Hanna Langbein, Mateusz Krzyziński, Mikołaj Spytek, Hubert Baniecki, Przemysław Biecek, Marvin N. Wright

With the spread and rapid advancement of black box machine learning (ML) models, the field of interpretable machine learning (IML) or explainable artificial intelligence (XAI) has become increasingly important over the last decade. This is particularly relevant for survival analysis, where the adoption of IML techniques promotes transparency, accountability, and fairness in sensitive areas, such as clinical decision-making processes, the development of targeted therapies, interventions, or in other medical or healthcare-related contexts. More specifically, explainability can uncover a survival model's potential biases and limitations and provide more mathematically sound ways to understand how and which features are influential for prediction or constitute risk factors. However, the lack of readily available IML methods may have deterred practitioners from leveraging the full potential of ML for predicting time-to-event data. We present a comprehensive review of the existing work on IML methods for survival analysis within the context of the general IML taxonomy. In addition, we formally detail how commonly used IML methods, such as individual conditional expectation (ICE), partial dependence plots (PDP), accumulated local effects (ALE), different feature importance measures, or Friedman's H-interaction statistics can be adapted to survival outcomes. An application of several IML methods to data on breast cancer recurrence in the German Breast Cancer Study Group (GBSG2) serves as a tutorial or guide for researchers, on how to utilize the techniques in practice to facilitate understanding of model decisions or predictions.
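
To make the adaptation of ICE and PDP to survival outcomes more concrete, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical stand-in model `toy_survival` (an exponential survival function with made-up coefficients) and evaluates the curves at a single time point `t`; every function and parameter name below is illustrative.

```python
import math


def toy_survival(x, t, coefs=(0.5, -0.3), base_rate=0.1):
    """Hypothetical stand-in model: exponential survival S(t | x)."""
    risk = math.exp(sum(b * xi for b, xi in zip(coefs, x)))
    return math.exp(-base_rate * risk * t)


def ice_curves(predict, data, feature, grid, t):
    """One curve per observation: S(t | x) with x[feature] swept over grid."""
    curves = []
    for x in data:
        curve = []
        for g in grid:
            x_mod = list(x)       # copy so the original row stays untouched
            x_mod[feature] = g    # overwrite only the feature of interest
            curve.append(predict(x_mod, t))
        curves.append(curve)
    return curves


def partial_dependence(predict, data, feature, grid, t):
    """Pointwise average of the ICE curves over all observations."""
    curves = ice_curves(predict, data, feature, grid, t)
    return [sum(c[k] for c in curves) / len(curves) for k in range(len(grid))]


data = [[0.0, 1.0], [1.0, 2.0]]
grid = [0.0, 0.5, 1.0]
print(partial_dependence(toy_survival, data, feature=0, grid=grid, t=5.0))
```

Because survival predictions are functions of time, the same computation would be repeated over a grid of time points to obtain time-dependent curves.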

Citations: 0
Sparse Canonical Correlation Analysis for Multiple Measurements With Latent Trajectories
IF 1.8 | Zone 3 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2025-10-30 | DOI: 10.1002/bimj.70090
Nuria Senar, Aeilko H. Zwinderman, Michel H. Hof

Canonical correlation analysis (CCA) is a widely used multivariate method in omics research for integrating high-dimensional datasets. CCA identifies hidden links by deriving linear projections of observed features that maximally correlate datasets. An important requirement of standard CCA is that observations are independent of each other. As a result, it cannot properly deal with repeated measurements. Current CCA extensions dealing with these challenges either perform CCA on summarized data or estimate correlations for each measurement. While these techniques factor in the correlation between measurements, they are suboptimal for high-dimensional analysis and exploiting this data's longitudinal qualities. We propose a novel extension of sparse CCA that incorporates time dynamics at the latent variable level through longitudinal models. This approach addresses the correlation of repeated measurements while drawing latent paths, focusing on dynamics in the correlation structures. To aid interpretability and computational efficiency, we implement an $\ell_0$ penalty to enforce fixed sparsity levels. We estimate these trajectories fitting longitudinal models to the low-dimensional latent variables, leveraging the clustered structure of high-dimensional datasets, thus exploring shared longitudinal latent mechanisms. Furthermore, modeling time in the latent space significantly reduces computational burden. We validate our model's performance using simulated data and show its real-world applicability with data from the Human Microbiome Project. This application highlights the model's ability to handle high-dimensional, sparsely, and irregularly observed data. Our CCA method for repeated measurements enables efficient estimation of canonical correlations across measurements for clustered data. Compared to existing methods, ours substantially reduces computational time in high-dimensional analyses as well as provides longitudinal trajectories that yield interpretable and insightful results.
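
As an illustration of how a fixed sparsity level can be enforced, the sketch below shows hard thresholding, the usual projection of a weight vector onto the set of vectors with at most `k` nonzero entries. The abstract does not describe the authors' algorithm, so this is an assumption about one common building block, and the name `hard_threshold` is hypothetical.

```python
def hard_threshold(weights, k):
    """Keep the k largest-magnitude entries of weights and zero the rest."""
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(ranked[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


print(hard_threshold([0.2, -1.5, 0.7, 0.05], k=2))  # -> [0.0, -1.5, 0.7, 0.0]
```

In a sparse CCA iteration, a step like this would typically be applied to each canonical weight vector after its update, followed by renormalization.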

Citations: 0
Pseudo-Observation Approach for Length-Biased Cox Proportional Hazards Model
IF 1.8 | Zone 3 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2025-10-30 | DOI: 10.1002/bimj.70094
Mahboubeh Akbari, Najmeh Nakhaei Rad, Ding-Geng Chen

Pseudo-observations are used to estimate the expectation of a function of interest in a population when survival data are incomplete due to censoring or truncation. Length-biased sampling is a special case of a left-truncation model, in which the truncation variable follows a uniform distribution. This phenomenon is commonly encountered in various fields such as survival analysis and epidemiology, where the event of interest is related to the length or duration of an underlying process. In such settings, the probability of observing a data point is higher for longer lengths, leading to biased sampling. The goal of this paper is to apply pseudo-observations to estimate the regression coefficients in the Cox proportional hazards model under length-biased right-censored (LBRC) data. We assess the accuracy and efficiency of two approaches that differ in their generation of pseudo-observations, comparing them with two prominent standard methods in the presence of LBRC data. The results demonstrate that the two proposed pseudo-observation methods are comparable to the standard methods in terms of standard error, with advantages in providing confidence intervals that are closer to the nominal level in large sample sizes and specific scenarios. Additionally, although length-biased data are a special case of left-truncated data, they must be addressed separately by utilizing the information that the left-truncation variable follows a uniform distribution, as the simulation results show. We also establish the consistency and asymptotic normality of one of the proposed estimators. Finally, we applied the method to analyze a real dataset from LBRC.
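
For readers unfamiliar with pseudo-observations, the sketch below shows the classical jackknife construction, theta_i = n * theta_hat - (n - 1) * theta_hat_(-i), applied to the Kaplan-Meier estimate of S(t) under ordinary right censoring. It deliberately ignores length-biased sampling: the two generation schemes compared in the paper are not described in the abstract, so this is only the textbook baseline, and the function names are hypothetical.

```python
def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of S(t) = P(T > t) from right-censored data."""
    surv = 1.0
    event_times = sorted({ti for ti, ei in zip(times, events) if ei and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, ei in zip(times, events) if ei and ti == u)
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_observations(times, events, t):
    """Jackknife pseudo-observations for S(t), one per subject."""
    n = len(times)
    full = kaplan_meier(times, events, t)
    pseudo = []
    for i in range(n):
        # Leave subject i out and re-estimate S(t).
        loo_times = times[:i] + times[i + 1:]
        loo_events = events[:i] + events[i + 1:]
        pseudo.append(n * full - (n - 1) * kaplan_meier(loo_times, loo_events, t))
    return pseudo


times = [2.0, 5.0, 1.0, 7.0, 3.0]
events = [1, 0, 1, 1, 1]  # 1 = event observed, 0 = censored
print(pseudo_observations(times, events, t=3.5))
```

The resulting values can then be used as responses in a generalized estimating equation with a suitable link function, which is how regression coefficients are usually obtained from pseudo-observations. Without censoring, each pseudo-observation reduces to the indicator that the subject survived beyond `t`.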

Citations: 0
Non-Markov Nonparametric Estimation of Complex Multistate Outcomes After Hematopoietic Stem Cell Transplantation
IF 1.8 | Zone 3 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2025-10-29 | DOI: 10.1002/bimj.70082
Judith Vilsmeier, Sandra Schmeller, Daniel Fürst, Jan Beyersmann

Often probabilities of nonstandard time-to-event endpoints are of interest, which are more complex than overall survival. One such probability is chronic graft-versus-host disease (GvHD-) and relapse-free survival, the probability of being alive, in remission, and not suffering from chronic GvHD after stem cell transplantation, with chronic GvHD being a recurrent event. Because the probabilities for endpoints with recurrent events may not fall monotonically, one should not use the Kaplan–Meier estimator for estimation, but the Aalen–Johansen estimator. The Aalen–Johansen is a consistent estimator even in non-Markov scenarios if state occupation probabilities are being estimated and censoring is random. In some multistate models, it is also possible to use linear combinations of Kaplan–Meier estimators, which do not depend on the Markov assumption but can estimate probabilities to be out of bounds. For these linear combinations, we propose a wild bootstrap procedure for inference and compare it with the wild bootstrap for the Aalen–Johansen estimator in non-Markov scenarios. In the proposed procedure, the limiting distribution of the Nelson–Aalen estimator is approximated using the wild bootstrap and transformed via the functional delta method. This approach is adaptable to different multistate models. Using real data, confidence bands are generated using the wild bootstrap for chronic GvHD- and relapse-free survival. Additionally, coverage probabilities of confidence intervals and confidence bands generated by Efron's bootstrap and the wild bootstrap are examined with simulations.
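
The point that such a probability need not fall monotonically can be seen in a toy, fully observed (uncensored) example, where the state occupation probability is simply the proportion of subjects in the state at time `t`. The data, the record layout, and the function name below are all made up for illustration; with censoring, an estimator such as Aalen-Johansen would be needed instead.

```python
def occupation_probability(subjects, t):
    """Fraction of subjects alive, in remission and free of chronic GvHD at time t.

    Each subject is (failure_time, gvhd_episodes), where failure_time is the
    time of relapse or death and gvhd_episodes is a list of (start, end) intervals.
    """
    in_state = 0
    for failure_time, gvhd_episodes in subjects:
        failed = t >= failure_time
        in_gvhd = any(start <= t < end for start, end in gvhd_episodes)
        if not failed and not in_gvhd:
            in_state += 1
    return in_state / len(subjects)


subjects = [
    (10.0, [(2.0, 4.0)]),  # one chronic GvHD episode, then recovery
    (8.0, []),
    (3.0, []),
    (12.0, [(1.0, 6.0)]),
]
for t in (0.5, 2.5, 5.0, 7.0):
    print(t, occupation_probability(subjects, t))
```

In this toy data the proportion drops while subjects are in a GvHD episode and rises again once they recover, which a Kaplan-Meier curve cannot represent because it never increases.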

Citations: 0
Variable Selection via Fused Sparse-Group Lasso Penalized Multi-state Models Incorporating Molecular Data
IF 1.8 | Zone 3 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2025-10-27 | DOI: 10.1002/bimj.70087
Kaya Miah, Jelle J. Goeman, Hein Putter, Annette Kopp-Schneider, Axel Benner

In multi-state models based on high-dimensional data, effective modeling strategies are required to determine an optimal, ideally parsimonious model. In particular, linking covariate effects across transitions is needed to conduct joint variable selection. A useful technique to reduce model complexity is to address homogeneous covariate effects for distinct transitions. We integrate this approach to data-driven variable selection by extended regularization methods within multi-state model building. We propose the fused sparse-group lasso (FSGL) penalized Cox-type regression in the framework of multi-state models combining the penalization concepts of pairwise differences of covariate effects along with transition-wise grouping. For optimization, we adapt the alternating direction method of multipliers (ADMM) algorithm to transition-specific hazards regression in the multi-state setting. In a simulation study and application to acute myeloid leukemia (AML) data, we evaluate the algorithm's ability to select a sparse model incorporating relevant transition-specific effects and similar cross-transition effects. We investigate settings in which the combined penalty is beneficial compared to global lasso regularization.
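
To show how the three penalty components fit together, the sketch below evaluates a generic fused sparse-group lasso penalty for a coefficient matrix `beta[q][j]` (transition `q`, covariate `j`). The exact weighting, the choice of fused transition pairs, and the definition of groups in the paper are not given in the abstract, so the three separate tuning parameters, the use of all transition pairs, and the caller-supplied `groups` are assumptions made for illustration.

```python
import math
from itertools import combinations


def fsgl_penalty(beta, groups, lam_lasso, lam_fused, lam_group):
    """Lasso + pairwise-fusion + group-lasso penalty for beta[q][j]."""
    # Lasso part: absolute value of every transition-specific coefficient.
    lasso = sum(abs(b) for row in beta for b in row)

    # Fused part: differences of the same covariate's effect across transitions
    # (assumed here: all pairs of transitions).
    n_cov = len(beta[0])
    fused = sum(
        abs(beta[q][j] - beta[r][j])
        for j in range(n_cov)
        for q, r in combinations(range(len(beta)), 2)
    )

    # Group part: Euclidean norm per group, scaled by the square root of its size.
    group = sum(
        math.sqrt(len(g)) * math.sqrt(sum(beta[q][j] ** 2 for q, j in g))
        for g in groups
    )
    return lam_lasso * lasso + lam_fused * fused + lam_group * group


beta = [[1.0, 0.0], [3.0, 0.0]]                      # 2 transitions, 2 covariates
groups = [[(0, 0), (0, 1)], [(1, 0), (1, 1)]]        # here: one group per transition
print(fsgl_penalty(beta, groups, lam_lasso=1.0, lam_fused=0.5, lam_group=0.0))
```

The fusion term shrinks the difference between the two transition-specific effects of covariate 0, which is what encourages similar cross-transition effects, while the lasso and group terms drive individual coefficients or whole groups to zero.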

Clinical Trial Registration: The AMLSG 09-09 trial is registered with ClinicalTrials.gov (NCT00893399) and has been completed.

Citations: 0
Efficient Testing Using Surrogate Information
IF 1.8 | Zone 3 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2025-10-27 | DOI: 10.1002/bimj.70086
Rebecca Knowlton, Layla Parast

In modern clinical trials, there is immense pressure to use surrogate markers in place of an expensive or long-term primary outcome to make more timely decisions about treatment effectiveness. However, using a surrogate marker to test for a treatment effect can be difficult and controversial. Existing methods tend to either rely on fully parametric methods where strict assumptions are made about the relationship between the surrogate and the outcome, or assume the surrogate marker is valid for the entire study population. In this paper, we develop a fully nonparametric method for efficient testing using surrogate information (ETSI). Our approach is specifically designed for settings where there is heterogeneity in the utility of the surrogate marker, that is, the surrogate is valid for certain patient subgroups and not others. ETSI enables treatment effect estimation and hypothesis testing via kernel-based estimation for a setting where the surrogate is used in place of the primary outcome for individuals for whom the surrogate is valid, and the primary outcome is purposefully only measured in the remaining patients. In addition, we provide a framework for future study design with power and sample size estimates based on our proposed testing procedure. Throughout, we assume a continuous surrogate and a primary outcome that may be discrete or continuous. We demonstrate the performance of our methods via a simulation study and application to two distinct HIV clinical trials.

Citations: 0
Generalized Bayesian Inference for Causal Effects Using the Covariate Balancing Procedure
IF 1.8 | Zone 3 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2025-10-27 | DOI: 10.1002/bimj.70085
Shunichiro Orihara, Tomotaka Momozaki, Tomoyuki Nakagawa

In observational studies, the propensity score plays a central role in estimating causal effects of interest. The inverse probability weighting (IPW) estimator is commonly used for this purpose. However, if the propensity score model is misspecified, the IPW estimator may produce biased estimates of causal effects. Previous studies have proposed some robust propensity score estimation procedures. However, these methods require considering parameters that dominate the uncertainty of sampling and treatment allocation. This study proposes a novel Bayesian estimating procedure that necessitates probabilistically deciding the parameter, rather than deterministically. Since the IPW estimator and propensity score estimator can be derived as solutions to certain loss functions, the general Bayesian paradigm, which does not require considering the full likelihood, can be applied. Therefore, our proposed method only requires the same level of assumptions as ordinary causal inference contexts. The proposed Bayesian method demonstrates equal or superior results compared to some previous methods in simulation experiments and is also applied to real data, namely the Whitehall dataset.
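
For orientation, the IPW estimator mentioned above can be sketched in its basic Horvitz-Thompson form for the average treatment effect. The sketch assumes the propensity scores are already available and takes them as input; it does not implement the covariate balancing or the general Bayesian procedure proposed in the paper, and the function name is hypothetical.

```python
def ipw_ate(treatment, outcome, propensity):
    """Horvitz-Thompson IPW estimate of the average treatment effect.

    treatment: 0/1 indicators, outcome: observed outcomes,
    propensity: P(treatment = 1 | covariates), strictly between 0 and 1.
    """
    n = len(outcome)
    rows = list(zip(treatment, outcome, propensity))
    treated_mean = sum(a * y / e for a, y, e in rows) / n
    control_mean = sum((1 - a) * y / (1 - e) for a, y, e in rows) / n
    return treated_mean - control_mean


treatment = [1, 1, 0, 0]
outcome = [3.0, 5.0, 1.0, 2.0]
propensity = [0.5, 0.5, 0.5, 0.5]
print(ipw_ate(treatment, outcome, propensity))  # -> 2.5
```

If the supplied propensity scores come from a misspecified model, the weights are wrong and the estimate is biased, which is the problem that robust and covariate balancing approaches aim to mitigate.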

Citations: 0
Issue Information: Biometrical Journal 6'25
IF 1.8 Zone 3 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-10-27 DOI: 10.1002/bimj.70095
Citations: 0
Sharp Bounds for Continuous-Valued Treatment Effects with Unobserved Confounders
IF 1.8 Zone 3 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-10-14 DOI: 10.1002/bimj.70084
Jean-Baptiste Baitairian, Bernard Sebastien, Rana Jreich, Sandrine Katsahian, Agathe Guilloux

In causal inference, treatment effects are typically estimated under the ignorability, or unconfoundedness, assumption, which is often unrealistic in observational data. By relaxing this assumption and conducting a sensitivity analysis, we introduce novel bounds and derive confidence intervals for the Average Potential Outcome (APO)—a standard metric for evaluating continuous-valued treatment or exposure effects. We demonstrate that these bounds are sharp under a continuous sensitivity model, in the sense that they give the smallest possible interval under this model, and propose a doubly robust version of our estimators. In a comparative analysis with another method from the literature, using both simulated and real data sets, we show that our approach not only yields sharper bounds but also achieves good coverage of the true APO, with significantly reduced computation times.

Citations: 0
Multivariate Bayesian Dynamic Borrowing for Repeated Measures Data With Application to External Control Arms in Open-Label Extension Studies
IF 1.8 Zone 3 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-10-07 DOI: 10.1002/bimj.70079
Benjamin F. Hartley, Matthew A. Psioda, Adrian P. Mander

Borrowing analyses are increasingly important in clinical trials. We develop a method for using robust mixture priors in multivariate dynamic borrowing. The method was motivated by a desire to produce causally valid, long-term treatment effect estimates of a continuous endpoint from a single active-arm open-label extension study following a randomized clinical trial by dynamically incorporating prior beliefs from a long-term external control arm. The proposed method is a generally applicable Bayesian dynamic borrowing analysis for estimates of multivariate summary metrics based on a multivariate normal likelihood function for various parameter models, some of which we describe. There are important connections to estimation incorporating a prior belief for a hypothetical estimand strategy, that is, had the event not occurred, for intercurrent events which lead to missing data.

Citations: 0