Mitra Ebrahimpoor, Renée Menezes, Ningning Xu, Jelle J Goeman
Integrated analysis of multi-omics datasets holds great promise for uncovering complex biological processes. However, the large dimensionality of omics data poses significant interpretability and multiple testing challenges. Simultaneous enrichment analysis (SEA) was introduced to address these issues in single-omics analysis, providing an in-built multiple testing correction and enabling simultaneous feature set testing. In this article, we introduce OCEAN, an extension of SEA to multi-omics data. OCEAN is a flexible approach to analyze potentially all possible two-way feature sets from any pair of genomics datasets. We also propose two new error rates which are in line with the two-way structure of the data and facilitate interpretation of the results. The power and utility of OCEAN are demonstrated by analyzing copy number and gene expression data for breast and colon cancer.
"Multiple Testing of Mix-and-Match Feature Sets in Multi-Omics." Statistics in Medicine 45(1-2): e70367, 2026. doi:10.1002/sim.70367. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12825407/pdf/
Mireille E Schnitzer, Denis Talbot, Yan Liu, David Berger, Guanbo Wang, Jennifer O'Loughlin, Marie-Pierre Sylvestre, Ashkan Ertefaie
Causal variable selection in time-varying treatment settings is challenging due to evolving confounding effects. Existing methods mainly focus on time-fixed exposures and are not directly applicable to time-varying scenarios. We propose a novel two-step procedure for variable selection when modeling the treatment probability at each time point. We first introduce a novel approach to longitudinal confounder selection, the Longitudinal Outcome Adaptive LASSO (LOAL), which data-adaptively selects covariates, with theoretical justification based on variance reduction for the estimator of the causal effect. We then propose an adaptive fused LASSO that can collapse treatment model parameters over time points, simplifying the models to improve the efficiency of the estimator while minimizing model misspecification bias relative to naive pooled logistic regression models. Our simulation studies highlight the need for and usefulness of the proposed approach in practice. We apply our method to data from the Nicotine Dependence in Teens study to estimate the effect of the timing of alcohol initiation during adolescence on depressive symptoms in early adulthood.
{"title":"Adaptive Sparsening and Smoothing of the Treatment Model for Longitudinal Causal Inference Using Outcome-Adaptive LASSO and Marginal Fused LASSO.","authors":"Mireille E Schnitzer, Denis Talbot, Yan Liu, David Berger, Guanbo Wang, Jennifer O'Loughlin, Marie-Pierre Sylvestre, Ashkan Ertefaie","doi":"10.1002/sim.70316","DOIUrl":"10.1002/sim.70316","url":null,"abstract":"<p><p>Causal variable selection in time-varying treatment settings is challenging due to evolving confounding effects. Existing methods mainly focus on time-fixed exposures and are not directly applicable to time-varying scenarios. We propose a novel two-step procedure for variable selection when modeling the treatment probability at each time point. We first introduce a novel approach to longitudinal confounder selection using a Longitudinal Outcome Adaptive LASSO (LOAL) that will data-adaptively select covariates with theoretical justification of variance reduction of the estimator of the causal effect. We then propose an adaptive fused LASSO that can collapse treatment model parameters over time points with the goal of simplifying the models in order to improve the efficiency of the estimator while minimizing model misspecification bias compared with naive pooled logistic regression models. Our simulation studies highlight the need for and usefulness of the proposed approach in practice. We implemented our method on data from the Nicotine Dependence in Teens study to estimate the effect of the timing of alcohol initiation during adolescence on depressive symptoms in early adulthood.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70316"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12826353/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiaqi Tong, Chao Cheng, Guangyu Tong, Michael O Harhay, Fan Li
In clinical trials, the observation of participant outcomes may be hindered by death, leading to ambiguity in defining a scientifically meaningful final outcome for those who die. Principal stratification methods are valuable tools for defining and estimating the average causal effect among always-survivors, that is, the average treatment effect in the subpopulation of participants who would survive regardless of treatment assignment. Although robust methods for the truncation-by-death problem in two-arm clinical trials have been studied, their extension to multi-arm clinical trials remains elusive. In this article, we study the identification of a class of survivor average causal effect estimands with multiple treatments under monotonicity and principal ignorability, and first propose simple weighting and regression approaches for point estimation. As a further improvement, we derive the efficient influence function to motivate doubly robust estimators of the survivor average causal effects in multi-arm clinical trials. We also propose sensitivity methods for violations of the key causal assumptions. Extensive simulations are conducted to investigate the finite-sample performance of the proposed methods against existing methods, and a real data example illustrates how to operationalize the proposed estimators and the sensitivity methods in practice.
{"title":"Doubly Robust Estimation and Sensitivity Analysis With Outcomes Truncated by Death in Multi-Arm Clinical Trials.","authors":"Jiaqi Tong, Chao Cheng, Guangyu Tong, Michael O Harhay, Fan Li","doi":"10.1002/sim.70297","DOIUrl":"https://doi.org/10.1002/sim.70297","url":null,"abstract":"<p><p>In clinical trials, the observation of participant outcomes may frequently be hindered by death, leading to ambiguity in defining a scientifically meaningful final outcome for those who die. Principal stratification methods are valuable tools for addressing the average causal effect among always-survivors, that is, the average treatment effect among a subpopulation defined as those who would survive regardless of treatment assignment. Although robust methods for the truncation-by-death problem in two-arm clinical trials have been previously studied, their expansion to multi-arm clinical trials remains elusive. In this article, we study the identification of a class of survivor average causal effect estimands with multiple treatments under monotonicity and principal ignorability, and first propose simple weighting and regression approaches for point estimation. As a further improvement, we derive the efficient influence function to motivate doubly robust estimators for the survivor average causal effects in multi-arm clinical trials. We also propose sensitivity methods under violations of key causal assumptions. Extensive simulations are conducted to investigate the finite-sample performance of the proposed methods against the existing methods, and a real data example is used to illustrate how to operationalize the proposed estimators and the sensitivity methods in practice.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 28-30","pages":"e70297"},"PeriodicalIF":1.8,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145709352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xialing Wen, Liangchen Qin, Hui Wu, Ying Yan
Covariate measurement error is an important problem in survival analysis and has been well studied under the Cox proportional hazards model. However, measurement error effects have rarely been addressed under Aalen's additive hazards model, and methods to correct for them are lacking. In recent years, Aalen's additive hazards model has been increasingly used in causal mediation analysis. Although the longitudinal mediator is frequently measured with uncertainty, measurement error in the mediator has received little attention. In this article, we study the general problem of covariate measurement error under Aalen's additive hazards model and propose a measurement error correction strategy. We then extend the proposed method to causal mediation analysis in the survival setting with an error-prone longitudinal mediator, obtaining corrected estimates of the direct and indirect effects. The performance of the proposed method is assessed in numerical studies.
"Survival Analysis Under the Aalen's Additive Hazards Model With Covariate Measurement Error: Application to Causal Mediation Analysis." Statistics in Medicine 44(28-30): e70346, 2025. doi:10.1002/sim.70346.
Lina M Montoya, Elvin H Geng, Michael Valancius, Michael R Kosorok, Maya L Petersen
We propose a novel causal estimand that elucidates how response to an earlier treatment (e.g., treatment initiation) modifies the effect of a later treatment (e.g., treatment discontinuation), thus learning if there are effects among the (un)affected. Specifically, we consider a working marginal structural model summarizing how the average effect of a later treatment varies as a function of the (estimated) conditional average effect of an earlier treatment. We define the estimand to be a data-adaptive causal parameter, allowing for estimation of the conditional average treatment effect using machine learning without making strong smoothness assumptions. We show how a sequentially randomized design can be used to identify this causal estimand, and we describe a targeted maximum likelihood estimator for the resulting statistical estimand, with influence curve-based inference. We present simulation studies that evaluate the performance of this estimator under various finite-sample scenarios. Throughout, we use the "Adaptive Strategies for Preventing and Treating Lapses of Retention in HIV Care" trial (NCT02338739) as an illustrative example, showing that discontinuation of conditional cash transfers for HIV care adherence was most harmful among those who had an increase in benefit from them initially.
{"title":"Effects Among the Affected.","authors":"Lina M Montoya, Elvin H Geng, Michael Valancius, Michael R Kosorok, Maya L Petersen","doi":"10.1002/sim.70353","DOIUrl":"10.1002/sim.70353","url":null,"abstract":"<p><p>We propose a novel causal estimand that elucidates how response to an earlier treatment (e.g., treatment initiation) modifies the effect of a later treatment (e.g., treatment discontinuation), thus learning if there are effects among the (un)affected. Specifically, we consider a working marginal structural model summarizing how the average effect of a later treatment varies as a function of the (estimated) conditional average effect of an earlier treatment. We define the estimand to be a data-adaptive causal parameter, allowing for estimation of the conditional average treatment effect using machine learning without making strong smoothness assumptions. We show how a sequentially randomized design can be used to identify this causal estimand, and we describe a targeted maximum likelihood estimator for the resulting statistical estimand, with influence curve-based inference. We present simulation studies that evaluate the performance of this estimator under various finite-sample scenarios. Throughout, we use the \"Adaptive Strategies for Preventing and Treating Lapses of Retention in HIV Care\" trial (NCT02338739) as an illustrative example, showing that discontinuation of conditional cash transfers for HIV care adherence was most harmful among those who had an increase in benefit from them initially.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 28-30","pages":"e70353"},"PeriodicalIF":1.8,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12801280/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145726142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jingyi Zhang, Tuo Wang, Yongming Qu, Fangrong Yan, Suyu Liu, Ruitao Lin
Decentralized clinical trials (DCTs) extend trial activities beyond traditional sites, enhancing access, convenience, efficiency, and result generalizability. They are particularly promising for chronic conditions like diabetes and obesity, which require longer study durations to evaluate drug effects. However, decentralized data collection raises concerns about increased variability and potential biases. This paper presents a novel Bayesian integrated learning procedure to analyze dose-response relationships using longitudinal data from a phase II DCT that combines centralized and decentralized data collection. We generalize a parametric exponential decay model to handle mixed data sources and apply Bayesian spike-and-slab priors to address biases and uncertainties from decentralized measurements. Our model enables data-adaptive integration of information from both centralized and decentralized sources. Through simulations and sensitivity analyses, we show that the proposed approach achieves favorable performance across various scenarios. Notably, the method matches the efficiency of traditional trials when decentralized data collection introduces no additional variability or error. Even when such issues arise, it remains less biased and more efficient than naïve methods that rely solely on centralized data or simply pool data from both sources.
{"title":"Bayesian Integrated Learning of Longitudinal Dose-Response Relationships via Decentralized Clinical Trials.","authors":"Jingyi Zhang, Tuo Wang, Yongming Qu, Fangrong Yan, Suyu Liu, Ruitao Lin","doi":"10.1002/sim.70338","DOIUrl":"10.1002/sim.70338","url":null,"abstract":"<p><p>Decentralized clinical trials (DCTs) extend trial activities beyond traditional sites, enhancing access, convenience, efficiency, and result generalizability. They are particularly promising for chronic conditions like diabetes and obesity, which require longer study durations to evaluate drug effects. However, decentralized data collection raises concerns about increased variability and potential biases. This paper presents a novel Bayesian integrated learning procedure to analyze dose-response relationships using longitudinal data from a phase II DCT that combines centralized and decentralized data collection. We generalize a parametric exponential decay model to handle mixed data sources and apply Bayesian spike-and-slab priors to address biases and uncertainties from decentralized measurements. Our model enables data-adaptive integration of information from both centralized and decentralized sources. Through simulations and sensitivity analyses, we show that the proposed approach achieves favorable performance across various scenarios. Notably, the method matches the efficiency of traditional trials when decentralized data collection introduces no additional variability or error. Even when such issues arise, it remains less biased and more efficient than naïve methods that rely solely on centralized data or simply pool data from both sources.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 28-30","pages":"e70338"},"PeriodicalIF":1.8,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12675892/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145669375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jules Antoine Pereira Macedo, Bruno Giraudeau
Cluster randomized trials (CRTs) may be analyzed using cluster-level analyses. For binary outcomes, a proportion is estimated for each cluster and a risk difference can be estimated, with a confidence interval based on Student's t distribution. In doing so, however, individual-level characteristics are not adjusted for, even though CRTs are known to be prone to recruitment/identification bias, which may introduce individual-level confounding. With a simulation study, we compared cluster-level analyses for estimating a risk difference in a two-arm parallel CRT with individual-level confounders and cluster-level covariates. We considered the unadjusted (UN) method, two two-stage procedure (TSP) methods assuming a binomial or a Gaussian distribution, G-computation (GC), and targeted maximum likelihood estimation (TMLE). As expected, the UN method was biased. TSP methods were also biased in scenarios with a treatment effect when the number of clusters per arm was small. GC and TMLE were unbiased. For these latter methods, adjusting for only individual-level covariates led to better performance measures (type I error rate, coverage rate, and relative error of the standard error) than adjusting for both individual- and cluster-level covariates. TSP, GC, and TMLE gave very similar results except in scenarios with a small number of clusters, where TSP methods were biased and GC methods had convergence problems. In this case, TMLE should be preferred.
"Cluster-Level Analyses to Estimate a Risk Difference in a Cluster Randomized Trial With Confounding Individual-Level Covariates: A Simulation Study." Statistics in Medicine 44(28-30): e70341, 2025. doi:10.1002/sim.70341.
Yogesh Katariya, Neeraj Misra
In multi-arm clinical trials, several new treatments are often evaluated concurrently to identify the best and confirm their superiority over a control. In this paper, we propose a framework that introduces an intermediate stage aimed at assessing the collective efficacy of the treatments retained after initial screening. Estimating the average effect of the selected treatments provides an interpretable measure of their collective potential and serves as a data-driven criterion for deciding whether to continue or terminate the trial. Consider k (≥ 2) experimental treatments whose effects are described by independent Gaussian responses with unknown means and a common variance. For the purpose of selecting the effective treatments (drugs) and estimating their average worth, we employ a two-stage drop-the-losers design (DLD). To get an idea about the structure of an optimal estimator, we first assume that the common variance is known. In the first stage of the design, data are collected to select a subset of experimental treatments so that the probability of including the best treatment is at least a prespecified level P*. This selection rule ensures that inferior treatments are eliminated while maintaining a minimum confidence that the best treatment remains among those advanced. Given this requirement, the design either advances all selected treatments to the next stage or stops for futility. The treatments selected in the subset then proceed to the second stage for estimation of their collective effectiveness through point estimation of their average worth, defined as the arithmetic average of their mean effects. Since the bias of estimators is crucial in clinical studies, we derive the uniformly minimum variance conditionally unbiased estimator (UMVCUE) of the worth of the selected treatments, conditioned on the indices of the treatments selected at the first stage. The mean squared error and bias of the UMVCUE are compared with those of the naive estimator (the maximum likelihood estimator) via a simulation study. For the unknown-variance scenario, we propose a plug-in estimator based on the structure of the UMVCUE derived for the known-variance case and study its performance through simulations. A real-life data example illustrates an application of our findings.
"Two-Stage Drop-the-Losers Design for the Selection of Effective Treatments and Estimating Their Average Worth." Statistics in Medicine 44(28-30): e70344, 2025. doi:10.1002/sim.70344.
Zihuan Liu, Xin Huang
Precision medicine relies on accurate and interpretable predictive models to identify patient subgroups and biomarkers that can guide individualized treatment strategies. While extreme gradient boosting (XGBoost) often achieves state-of-the-art predictive performance, its complexity can impede understanding of how input variables influence outcomes. Building upon existing XGBoost frameworks for estimating individualized treatment rules (ITRs), we introduce a global permutation test within this framework to assess treatment effect heterogeneity. Additionally, we incorporate two model-agnostic explanation techniques, local interpretable model-agnostic explanations (LIME) and SHapley Additive exPlanations (SHAP), to enhance interpretability at both the global and individual levels. Through simulations and analyses of real-world clinical trial datasets, we illustrate that our permutation-based pipeline can detect empirical signals of treatment effect heterogeneity, while LIME and SHAP offer exploratory insights into feature contributions and ITRs.
"Explaining Individualized Treatment Rules: Integrating LIME and SHAP With Xgboost in Precision Medicine." Statistics in Medicine 44(28-30): e70322, 2025. doi:10.1002/sim.70322.
Avi Kenny, Lars van der Laan, Peter Gilbert, Marco Carone
In vaccine research, it is important to identify biomarkers that can reliably predict vaccine efficacy against a clinical endpoint. Such biomarkers are known as immune correlates of protection (CoP) and can serve as surrogate endpoints in vaccine efficacy trials to accelerate the approval process. CoPs must be rigorously validated, and one method of doing so is through the controlled risk (CR) curve, a function that represents the causal effect of the biomarker on population-level risk of experiencing the endpoint of interest by a certain time post-vaccination. The CR curve can be estimated by leveraging a Cox proportional hazards model, but researchers currently rely on the bootstrap for inference, which can be computationally demanding. In this article, we analytically derive the asymptotic variance of this estimator, providing an analytic approach for constructing both pointwise and uniform confidence bands. We evaluate the finite sample performance of these methods in a simulation study and illustrate their use on data from the Coronavirus Efficacy (COVE) placebo-controlled phase 3 trial (NCT04470427) of the mRNA-1273 COVID-19 vaccine.
{"title":"Inference on Controlled Effects for Assessing Immune Correlates of Protection Based on a Cox Model.","authors":"Avi Kenny, Lars van der Laan, Peter Gilbert, Marco Carone","doi":"10.1002/sim.70347","DOIUrl":"10.1002/sim.70347","url":null,"abstract":"<p><p>In vaccine research, it is important to identify biomarkers that can reliably predict vaccine efficacy against a clinical endpoint. Such biomarkers are known as immune correlates of protection (CoP) and can serve as surrogate endpoints in vaccine efficacy trials to accelerate the approval process. CoPs must be rigorously validated, and one method of doing so is through the controlled risk (CR) curve, a function that represents the causal effect of the biomarker on population-level risk of experiencing the endpoint of interest by a certain time post-vaccination. The CR curve can be estimated by leveraging a Cox proportional hazards model, but researchers currently rely on the bootstrap for inference, which can be computationally demanding. In this article, we analytically derive the asymptotic variance of this estimator, providing an analytic approach for constructing both pointwise and uniform confidence bands. We evaluate the finite sample performance of these methods in a simulation study and illustrate their use on data from the Coronavirus Efficacy (COVE) placebo-controlled phase 3 trial (NCT04470427) of the mRNA-1273 COVID-19 vaccine.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 28-30","pages":"e70347"},"PeriodicalIF":1.8,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145715815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}