Wanni Lei, Maosen Peng, Nasser Altorki, Xi Kathy Zhou
Phase II clinical trials play a pivotal role in drug development by screening a large number of drug candidates to identify those with promising preliminary efficacy for phase III testing. Trial designs that enable efficient decision-making with small sample sizes and early futility stopping while controlling type I and type II errors in hypothesis testing, such as Simon's two-stage design, are preferred. Randomized multi-arm trials are increasingly used in phase II settings to overcome the limitations associated with using historical controls as the reference. However, how to effectively balance efficiency and accurate decision-making remains an important research topic. A notable development in phase II randomized design methodology is the Bayesian pick-the-winner (BPW) design proposed by Chen et al. [1]. Despite multiple appealing features, this method cannot easily control the overall type I and type II errors for winner selection. Here, we introduce an improved randomized two-stage Bayesian pick-the-winner (IBPW) design that formalizes winner-selection-based hypothesis testing and optimizes sample sizes and decision cut-offs by strictly controlling the type I and type II errors under a set of flexible hypotheses for winner selection across two treatment arms. Simulation studies demonstrate that our new design offers improved operating characteristics for winner selection while retaining the desirable features of the BPW design.
{"title":"An Improved Bayesian Pick-the-Winner (IBPW) Design for Randomized Phase II Clinical Trials.","authors":"Wanni Lei, Maosen Peng, Nasser Altorki, Xi Kathy Zhou","doi":"10.1002/sim.70348","DOIUrl":"10.1002/sim.70348","url":null,"abstract":"<p><p>Phase II clinical trials play a pivotal role in drug development by screening a large number of drug candidates to identify those with promising preliminary efficacy for phase III testing. Trial designs that enable efficient decision-making with small sample sizes and early futility stopping while controlling for type I and type II errors in hypothesis testing, such as Simon's two-stage design, are preferred. Randomized multi-arm trials are increasingly used in phase II settings to overcome the limitations associated with using historical controls as the reference. However, how to effectively balance efficiency and accurate decision-making continues to be an important research topic. A notable development in phase II randomized design methodology is the Bayesian pick-the-winner (BPW) design proposed by Chen et al. [1]. Despite multiple appealing features, this method cannot easily control for overall type I and type II errors for winner selection. Here, we introduce an improved randomized two-stage Bayesian pick-the-winner (IBPW) design that formalizes the winner-selection based hypothesis testing, optimizes sample sizes and decision cut-offs by strictly controlling the type I and type II errors under a set of flexible hypotheses for winner-selection across two treatment arms. Simulation studies demonstrate that our new design offers improved operating characteristics for winner selection while retaining the desirable features of the BPW design.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70348"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12826356/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146030892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
David Svensson, Erik Hermansson, Nikolaos Nikolaou, Konstantinos Sechidis, Ilya Lipkovich
In recent years, two parallel research trends have emerged in machine learning, yet their intersections remain largely unexplored. On one hand, there has been a significant increase in literature focused on Individual Treatment Effect (ITE) modeling, particularly targeting the Conditional Average Treatment Effect (CATE) using meta-learner techniques. These approaches often aim to identify causal effects from observational data. On the other hand, the field of Explainable Machine Learning (XML) has gained traction, with various approaches developed to explain complex models and make their predictions more interpretable. A prominent technique in this area is Shapley Additive Explanations (SHAP), which has become mainstream in data science for analyzing supervised learning models. However, there has been limited exploration of SHAP's application in identifying predictive biomarkers through CATE models, a crucial aspect in pharmaceutical precision medicine. We address inherent challenges associated with the SHAP concept in multi-stage CATE strategies and introduce a surrogate estimation approach that is agnostic to the choice of CATE strategy, effectively reducing computational burdens in high-dimensional data. Using this approach, we conduct simulation benchmarking to evaluate the ability to accurately identify biomarkers using SHAP values derived from various CATE meta-learners and Causal Forest.
{"title":"Overview and Practical Recommendations on Using Shapley Values for Identifying Predictive Biomarkers via CATE Modeling.","authors":"David Svensson, Erik Hermansson, Nikolaos Nikolaou, Konstantinos Sechidis, Ilya Lipkovich","doi":"10.1002/sim.70375","DOIUrl":"10.1002/sim.70375","url":null,"abstract":"<p><p>In recent years, two parallel research trends have emerged in machine learning, yet their intersections remain largely unexplored. On one hand, there has been a significant increase in literature focused on Individual Treatment Effect (ITE) modeling, particularly targeting the Conditional Average Treatment Effect (CATE) using meta-learner techniques. These approaches often aim to identify causal effects from observational data. On the other hand, the field of Explainable Machine Learning (XML) has gained traction, with various approaches developed to explain complex models and make their predictions more interpretable. A prominent technique in this area is Shapley Additive Explanations (SHAP), which has become mainstream in data science for analyzing supervised learning models. However, there has been limited exploration of SHAP's application in identifying predictive biomarkers through CATE models, a crucial aspect in pharmaceutical precision medicine. We address inherent challenges associated with the SHAP concept in multi-stage CATE strategies and introduce a surrogate estimation approach that is agnostic to the choice of CATE strategy, effectively reducing computational burdens in high-dimensional data. Using this approach, we conduct simulation benchmarking to evaluate the ability to accurately identify biomarkers using SHAP values derived from various CATE meta-learners and Causal Forest.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70375"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Youngho Bae, Chanmin Kim, Fenglei Wang, Qi Sun, Kyu Ha Lee
This research is motivated by integrated epidemiological and blood biomarker studies investigating the relationship between long-term adherence to a Mediterranean diet and cardiometabolic health, with plasma metabolomes as potential mediators. Analyzing causal mediation in high-dimensional omics data presents challenges, including complex dependencies among mediators and the need for advanced regularization or Bayesian techniques to ensure stable and interpretable estimation and selection of indirect effects. To this end, we propose a novel Bayesian framework to identify active pathways and estimate indirect effects in high-dimensional mediation analysis. Central to our method is the introduction of a set of priors for the selection indicators in the mediator and outcome models. A Markov random field prior leverages mediator correlations, enhancing power in detecting mediated effects. Sequential subsetting priors encourage simultaneous selection of relevant mediators and their indirect effects, ensuring a more coherent and efficient variable selection framework. Comprehensive simulation studies demonstrate that the proposed method provides superior power in detecting active mediating pathways. We further illustrate the practical utility of the method by applying it to metabolome data from two sub-studies within the Health Professionals Follow-up Study and Nurses' Health Study II, highlighting its effectiveness in a real-data setting.
{"title":"Bayesian Variable Selection for High-Dimensional Mediation Analysis: Application to Metabolomics Data in Epidemiological Studies.","authors":"Youngho Bae, Chanmin Kim, Fenglei Wang, Qi Sun, Kyu Ha Lee","doi":"10.1002/sim.70365","DOIUrl":"10.1002/sim.70365","url":null,"abstract":"<p><p>This research is motivated by integrated epidemiological and blood biomarker studies, investigating the relationship between long-term adherence to a Mediterranean diet and cardiometabolic health, with plasma metabolomes as potential mediators. Analyzing causal mediation in high-dimensional omics data presents challenges, including complex dependencies among mediators and the need for advanced regularization or Bayesian techniques to ensure stable and interpretable estimation and selection of indirect effects. To this end, we propose a novel Bayesian framework to identify active pathways and estimate indirect effects in high-dimensional mediation analysis. Central to our method is the introduction of a set of priors for the selection indicators in the mediator and outcome models. A Markov random field prior leverages mediator correlations, enhancing power in detecting mediated effects. Sequential subsetting priors encourage simultaneous selection of relevant mediators and their indirect effects, ensuring a more coherent and efficient variable selection framework. Comprehensive simulation studies demonstrate that the proposed method provides superior power in detecting active mediating pathways. We further illustrate the practical utility of the method by applying it to metabolome data from two sub-studies within the Health Professionals Follow-up Study and Nurses' Health Study II, highlighting its effectiveness in a real-data setting.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70365"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146030936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generalizing causal findings, such as the average treatment effect (ATE), from a source to a target population is a critical topic in biomedical research. Differences in the distributions of treatment effect modifiers between these populations, known as covariate shift, can lead to varying ATEs. Chen et al. [1] introduced a weighting method to estimate the target ATE using only summary-level information from a target sample while accounting for the possible covariate shifts. However, the asymptotic variance of the estimate was shown to depend on individual-level data from the target sample, hindering statistical inference. In this article, we propose a resampling-based perturbation method for confidence interval construction for the estimated target ATE, utilizing additional summary-level information. We demonstrate the effectiveness of our approach through simulation and real data settings when only summary-level information is available.
{"title":"Confidence Interval Construction for Causally Generalized Estimates With Target Sample Summary Information.","authors":"Yi Chen, Guanhua Chen, Menggang Yu","doi":"10.1002/sim.70358","DOIUrl":"10.1002/sim.70358","url":null,"abstract":"<p><p>Generalizing causal findings, such as the average treatment effect (ATE), from a source to a target population is a critical topic in biomedical research. Differences in the distributions of treatment effect modifiers between these populations, known as covariate shift, can lead to varying ATEs. Chen et al. [1] introduced a weighting method to estimate the target ATE using only summary-level information from a target sample while accounting for the possible covariate shifts. However, the asymptotic variance of the estimate was shown to depend on individual-level data from the target sample, hindering statistical inference. In this article, we propose a resampling-based perturbation method for confidence interval construction for the estimated target ATE, utilizing additional summary-level information. We demonstrate the effectiveness of our approach through simulation and real data settings when only summary-level information is available.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70358"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12826351/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146030960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
For a chronic disease, besides the treatment induction effect, it is also important to demonstrate the maintenance effect of long-term treatment use. To fulfill these and other objectives in a clinical study, one of three designs is often applied: the active-treatment lead-in followed by randomized maintenance design, the randomized induction followed by re-randomized withdrawal maintenance design, and the treat-through design (FDA 2022). Separately, a two-stage sequential parallel comparison design (SPCD) is frequently used in therapeutic areas where placebo has a large effect. In this paper, we use an SPCD for a clinical trial with a binary endpoint for induction, maintenance, long-term, and other treatment effect assessments. This SPCD can be viewed as a hybrid of the above three designs and has some additional advantages. For example, compared to the re-randomized withdrawal maintenance design, the SPCD does not require re-randomization, which simplifies trial operations, and it also provides controlled data for formal long-term efficacy and safety analyses. To fully utilize all available data from the two stages for an overall treatment effect evaluation, a weighted combination test is considered that incorporates the correlations of the components. Further, a multiple imputation approach is applied to handle data that are missing not at random. Simulations are conducted to evaluate the performance of the methods, and a data example is employed to illustrate their application.
{"title":"Sequential Parallel Comparison Design for Assessing Induction, Maintenance, Long-Term, and Other Treatment Effects on a Binary Endpoint.","authors":"Hui Quan, Zhixing Xu, Xun Chen","doi":"10.1002/sim.70382","DOIUrl":"https://doi.org/10.1002/sim.70382","url":null,"abstract":"<p><p>For a chronic disease, besides the treatment induction effect, it is also important to demonstrate the maintenance effect of long-term treatment use. To fulfill these and other objectives for a clinical study, we often apply one of three designs: the active treatment lead-in followed by randomized maintenance design, the randomized induction followed by re-randomized withdrawal maintenance design and the treat-through design (FDA 2022). Separately, a two-stage sequential parallel comparison design (SPCD) is frequently used in therapeutic areas where placebo has a large effect. In this paper, we use a SPCD for a clinical trial with a binary endpoint for induction, maintenance, long-term and other treatment effect assessments. This SPCD can actually be treated as a hybrid of the above three designs and has some additional advantages. For example, compared to the re-randomized withdrawal maintenance design, the SPCD does not need a re-randomization to simplify trial operation and it also provides controlled data for formal long-term efficacy and safety analyses. To fully utilize all available data of the two stages for an overall treatment effect evaluation, a weighted combination test is considered with the incorporation of correlations of the components. Further, a multiple imputation approach is applied to handle missing not at random data. Simulations are conducted to evaluate the performances of the methods and a data example is employed to illustrate the applications of the methods.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70382"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Statistical inference in multicenter clinical trials is often compromised when relying on asymptotic normal approximations, particularly in designs characterized by a small number of centers or severe imbalance in patient enrollment. Such deviations from asymptotic assumptions frequently result in unreliable p-values and a breakdown of error control. To resolve this, we introduce a high-precision saddlepoint approximation framework for aggregate permutation tests within hierarchically structured data. The theoretical core of our approach is the derivation of a multilevel nested cumulant generating function that explicitly models the trial hierarchy, analytically integrating patient-level linear rank statistics with the stochastic aggregation process across centers. A significant innovation of this work is the extension to the bivariate setting to address co-primary endpoints, providing a robust inferential solution for mixed continuous (efficacy) and discrete (safety) outcomes where standard multivariate normality is unattainable. The resulting framework yields simulation-free, highly accurate tail probabilities even in finite-sample regimes. Extensive simulation studies confirm that our method maintains strict Type I error control in scenarios where asymptotic methods exhibit substantial inflation. Furthermore, an application to the multicenter diabetes prevention program trial demonstrates the method's practical utility: it correctly identifies a significant cardiovascular risk factor that standard approximations failed to detect, thereby preventing a critical Type II error and ensuring valid clinical conclusions.
{"title":"A Saddlepoint Framework for Accurate Inference in Multicenter Clinical Trials With Imbalanced Clusters.","authors":"Haidy A Newer","doi":"10.1002/sim.70408","DOIUrl":"https://doi.org/10.1002/sim.70408","url":null,"abstract":"<p><p>Statistical inference in multicenter clinical trials is often compromised when relying on asymptotic normal approximations, particularly in designs characterized by a small number of centers or severe imbalance in patient enrollment. Such deviations from asymptotic assumptions frequently result in unreliable p-values and a breakdown of error control. To resolve this, we introduce a high-precision saddlepoint approximation framework for aggregate permutation tests within hierarchically structured data. The theoretical core of our approach is the derivation of a multilevel nested cumulant generating function that explicitly models the trial hierarchy, analytically integrating patient-level linear rank statistics with the stochastic aggregation process across centers. A significant innovation of this work is the extension to the bivariate setting to address co-primary endpoints, providing a robust inferential solution for mixed continuous (efficacy) and discrete (safety) outcomes where standard multivariate normality is unattainable. The resulting framework yields simulation-free, highly accurate tail probabilities even in finite-sample regimes. Extensive simulation studies confirm that our method maintains strict Type I error control in scenarios where asymptotic methods exhibit substantial inflation. Furthermore, an application to the multicenter diabetes prevention program trial demonstrates the method's practical utility: it correctly identifies a significant cardiovascular risk factor that standard approximations failed to detect, thereby preventing a critical Type II error and ensuring valid clinical conclusions.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70408"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146030887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
For the pharmaceutical industry, the main utility of futility rules is to allow early stopping of a trial when it seems unlikely to achieve its primary efficacy objectives; they are motivated mainly by financial and ethical considerations. After a brief overview of available approaches to setting a futility rule, I illustrate, using a case study, different rules based on conditional power, predictive probability of success, and Bayesian predictive probability of success, and emphasize the main shortcomings that arise when using these measures, especially in sample size re-estimation designs. As an alternative, I propose the conditional assurance, which is the probability of achieving success at the final analysis given that the study was not stopped for futility. It depends on the interim sample size, the sample size at the final analysis, and the threshold for the futility rule, but it does not require knowledge of the observed treatment effect estimate at the interim analysis. This makes the conditional assurance well suited for building informative futility rules. It balances the probability of stopping for futility when there is no treatment effect, the conditional assurance, and the overall power. Decision makers can better understand the levels of risk associated with stopping for futility and make informed decisions about where to spend risk based on what is acceptable to the organization.
{"title":"Informative Futility Rules Based on Conditional Assurance.","authors":"Vladimir Dragalin","doi":"10.1002/sim.70330","DOIUrl":"https://doi.org/10.1002/sim.70330","url":null,"abstract":"<p><p>For the pharmaceutical industry, the main utility of futility rules is to allow early stopping of a trial when it seems unlikely to achieve its primary efficacy objectives, and it is mainly motivated by financial and ethical considerations. After a brief overview of available approaches in setting a futility rule, I will illustrate, using a case study, different rules based on conditional power, predictive probability of success, and Bayesian predictive probability of success, and will emphasize the main shortcomings that arise when using these measures, especially in sample size re-estimation designs. I propose, as an alternative, the conditional assurance that is the probability of achieving success at the final analysis when the study was not stopped for futility. It depends on the sample size for the interim, sample size at the final analysis, and the threshold for the futility rule. But it does not need the knowledge of the observed treatment effect estimate at the interim analysis. This makes the conditional assurance very appropriate for building informative futility rules. It balances the probability of stopping for futility (when there is no treatment effect), conditional assurance, and overall power. Decision makers can better understand the levels of risk associated with stopping for futility and make informed decisions about where to spend risk based on what is acceptable to the organization.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70330"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146030901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unmeasured confounders pose a major challenge in accurately estimating causal effects in observational studies. To address this issue when estimating hazard ratios (HRs) using Cox proportional hazards models, several methods, including instrumental variables (IVs) approaches, have been proposed. However, these methods often face limitations, such as weak IV problems and restrictive assumptions regarding unmeasured confounder distributions. In this study, we introduce a novel nonparametric Bayesian procedure that provides accurate HR estimates while addressing these limitations. A key assumption of our approach is that unmeasured confounders exhibit a cluster structure. Under this assumption, we integrate two remarkable Bayesian techniques, the Dirichlet process mixture (DPM) and general Bayes (GB), to simultaneously (1) detect latent clusters based on the likelihood of exposure and outcome variables and (2) estimate HRs using the likelihood constructed within each cluster. Notably, leveraging DPM, our procedure eliminates the need for IVs by identifying unmeasured confounders under an alternative condition. Additionally, GB techniques remove the need for explicit modeling of the baseline hazard function, distinguishing our procedure from traditional Bayesian approaches. Simulation experiments demonstrate that the proposed Bayesian procedure outperforms existing methods in some performance metrics. Moreover, it achieves statistical efficiency comparable to the efficient estimator while accurately identifying cluster structures. These features highlight its ability to overcome challenges associated with traditional IV approaches for time-to-event data.
{"title":"Nonparametric Bayesian Adjustment of Unmeasured Confounders in Cox Proportional Hazards Models.","authors":"Shunichiro Orihara, Shonosuke Sugasawa, Tomohiro Ohigashi, Keita Hirano, Tomoyuki Nakagawa, Masataka Taguri","doi":"10.1002/sim.70360","DOIUrl":"10.1002/sim.70360","url":null,"abstract":"<p><p>Unmeasured confounders pose a major challenge in accurately estimating causal effects in observational studies. To address this issue when estimating hazard ratios (HRs) using Cox proportional hazards models, several methods, including instrumental variables (IVs) approaches, have been proposed. However, these methods often face limitations, such as weak IV problems and restrictive assumptions regarding unmeasured confounder distributions. In this study, we introduce a novel nonparametric Bayesian procedure that provides accurate HR estimates while addressing these limitations. A key assumption of our approach is that unmeasured confounders exhibit a cluster structure. Under this assumption, we integrate two remarkable Bayesian techniques, the Dirichlet process mixture (DPM) and general Bayes (GB), to simultaneously (1) detect latent clusters based on the likelihood of exposure and outcome variables and (2) estimate HRs using the likelihood constructed within each cluster. Notably, leveraging DPM, our procedure eliminates the need for IVs by identifying unmeasured confounders under an alternative condition. Additionally, GB techniques remove the need for explicit modeling of the baseline hazard function, distinguishing our procedure from traditional Bayesian approaches. Simulation experiments demonstrate that the proposed Bayesian procedure outperforms existing methods in some performance metrics. Moreover, it achieves statistical efficiency comparable to the efficient estimator while accurately identifying cluster structures. These features highlight its ability to overcome challenges associated with traditional IV approaches for time-to-event data.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70360"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12826352/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Clustering longitudinal biomarkers in clinical trials uncovers associations between clinical outcomes, disease progression, and treatment effects. Finite mixtures of multivariate t linear mixed-effects (FM-MtLME) models have proven effective for modeling and clustering multiple longitudinal trajectories that exhibit grouped patterns with strong within-group similarity. Motivated by an AIDS study with plasma viral loads measured under assay-specific detection limits, this article extends the FM-MtLME model to account for censored outcomes; the proposed model is called the FM-MtLME model with censoring (FM-MtLMEC). To allow covariate-dependent mixing proportions, we further extend it with a logistic link, resulting in the EFM-MtLMEC model. Two efficient EM-based algorithms are developed for parameter estimation of the FM-MtLMEC and EFM-MtLMEC models. The utility of our methods is demonstrated through comprehensive analyses of the AIDS data and simulation studies.
{"title":"<ArticleTitle xmlns:ns0=\"http://www.w3.org/1998/Math/MathML\">Finite Mixtures of Multivariate <ns0:math> <ns0:semantics><ns0:mrow><ns0:mi>t</ns0:mi></ns0:mrow> <ns0:annotation>$$ t $$</ns0:annotation></ns0:semantics> </ns0:math> Linear Mixed-Effects Models for Censored Longitudinal Data With Concomitant Covariates.","authors":"Tsung-I Lin, Wan-Lun Wang","doi":"10.1002/sim.70392","DOIUrl":"10.1002/sim.70392","url":null,"abstract":"<p><p>Clustering longitudinal biomarkers in clinical trials uncovers associations between clinical outcomes, disease progression, and treatment effects. Finite mixtures of multivariate <math> <semantics><mrow><mi>t</mi></mrow> <annotation>$$ t $$</annotation></semantics> </math> linear mixed-effects (FM-MtLME) models have proven effective for modeling and clustering multiple longitudinal trajectories that exhibit grouped patterns with strong within-group similarity. Motivated by an AIDS study with plasma viral loads measured under assay-specific detection limits, this article extends the FM-MtLME model to account for censored outcomes. The proposed model is called the FM-MtLME with censoring (FM-MtLMEC). To allow covariate-dependent mixing proportions, we further extend it with a logistic link, resulting in the EFM-MtLMEC model. Two efficient EM-based algorithms are developed for parameter estimation of both FM-MtLMEC and EFM-MtLMEC models. The utility of our methods is demonstrated through comprehensive analyses of the AIDS data and simulation studies.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70392"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marco Ratta, Gaëlle Saint-Hilary, Valentine Barboux, Mauro Gasparini, Donia Skanji, Pavel Mozgunov
The urgency of delivering novel, effective treatments against life-threatening diseases has led various health authorities to allow for Accelerated Approvals (AAs). AA is the "fast track" program in which promising treatments are evaluated based on surrogate (short-term) endpoints likely to predict clinical benefit. This allows treatments to obtain early approval, subject to providing further evidence of efficacy, for example, on the primary (long-term) endpoint. Although this procedure is well established, a number of conditionally approved treatments do not obtain full approval (FA), mainly due to a lack of correlation between the surrogate and the primary endpoint. This implies a need to improve the criteria for controlling the risk of AAs for non-effective treatments, while maximizing the chance of AAs for effective ones. We first propose a novel adaptive group sequential design that includes an early dual-criterion "Accelerated Approval" interim analysis, where efficacy on a surrogate endpoint is tested jointly with a predictive metric based on the primary endpoint. Second, we explore how the predictive criterion may be strengthened by borrowing historical information, in particular using: (i) historical control data on the primary endpoint, and (ii) the estimated historical relationship between the surrogate and the primary endpoints. We propose various metrics to characterize the risk of correct and incorrect early AAs and demonstrate how the proposed design allows explicit control of these risks, with particular attention to the family-wise error rate (FWER). The methodology is then evaluated through a simulation study motivated by a Phase III trial in metastatic colorectal cancer (mCRC).
{"title":"Dual-Criterion Approach Incorporating Historical Information to Seek Accelerated Approval With Application in Time-to-Event Group Sequential Trials.","authors":"Marco Ratta, Gaëlle Saint-Hilary, Valentine Barboux, Mauro Gasparini, Donia Skanji, Pavel Mozgunov","doi":"10.1002/sim.70361","DOIUrl":"10.1002/sim.70361","url":null,"abstract":"<p><p>The urgency of delivering novel, effective treatments against life-threatening diseases has brought various health authorities to allow for Accelerated Approvals (AAs). AA is the \"fast track\" program where promising treatments are evaluated based on surrogate (short term) endpoints likely to predict clinical benefit. This allows treatments to get an early approval, subject to providing further evidence of efficacy, for example, on the primary (long term) endpoint. Despite this procedure being quite consolidated, a number of conditionally approved treatments do not obtain full approval (FA), mainly due to lack of correlation between surrogate and primary endpoint. This implies a need to improve the criteria for controlling the risk of AAs for noneffective treatments, while maximizing the chance of AAs for effective ones. We first propose a novel adaptive group sequential design that includes an early dual-criterion \"Accelerated Approval\" interim analysis, where efficacy on a surrogate endpoint is tested jointly with a predictive metric based on the primary endpoint. Secondarily, we explore how the predictive criterion may be strengthened by historical information borrowing, in particular using: (i) historical control data on the primary endpoint, and (ii) the estimated historical relationship between the surrogate and the primary endpoints. We propose various metrics to characterize the risk of correct and incorrect early AAs and demonstrate how the proposed design allows explicit control of these risks, with particular attention to the family-wise error rate (FWER). The methodology is then evaluated through a simulation study motivated by a Phase-III trial in metastatic colorectal cancer (mCRC).</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70361"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12828486/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146030962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}