Unmeasured confounders pose a major challenge in accurately estimating causal effects in observational studies. To address this issue when estimating hazard ratios (HRs) using Cox proportional hazards models, several methods, including instrumental variables (IVs) approaches, have been proposed. However, these methods often face limitations, such as weak IV problems and restrictive assumptions regarding unmeasured confounder distributions. In this study, we introduce a novel nonparametric Bayesian procedure that provides accurate HR estimates while addressing these limitations. A key assumption of our approach is that unmeasured confounders exhibit a cluster structure. Under this assumption, we integrate two remarkable Bayesian techniques, the Dirichlet process mixture (DPM) and general Bayes (GB), to simultaneously (1) detect latent clusters based on the likelihood of exposure and outcome variables and (2) estimate HRs using the likelihood constructed within each cluster. Notably, leveraging DPM, our procedure eliminates the need for IVs by identifying unmeasured confounders under an alternative condition. Additionally, GB techniques remove the need for explicit modeling of the baseline hazard function, distinguishing our procedure from traditional Bayesian approaches. Simulation experiments demonstrate that the proposed Bayesian procedure outperforms existing methods in some performance metrics. Moreover, it achieves statistical efficiency comparable to the efficient estimator while accurately identifying cluster structures. These features highlight its ability to overcome challenges associated with traditional IV approaches for time-to-event data.
{"title":"Nonparametric Bayesian Adjustment of Unmeasured Confounders in Cox Proportional Hazards Models.","authors":"Shunichiro Orihara, Shonosuke Sugasawa, Tomohiro Ohigashi, Keita Hirano, Tomoyuki Nakagawa, Masataka Taguri","doi":"10.1002/sim.70360","DOIUrl":"10.1002/sim.70360","url":null,"abstract":"<p><p>Unmeasured confounders pose a major challenge in accurately estimating causal effects in observational studies. To address this issue when estimating hazard ratios (HRs) using Cox proportional hazards models, several methods, including instrumental variables (IVs) approaches, have been proposed. However, these methods often face limitations, such as weak IV problems and restrictive assumptions regarding unmeasured confounder distributions. In this study, we introduce a novel nonparametric Bayesian procedure that provides accurate HR estimates while addressing these limitations. A key assumption of our approach is that unmeasured confounders exhibit a cluster structure. Under this assumption, we integrate two remarkable Bayesian techniques, the Dirichlet process mixture (DPM) and general Bayes (GB), to simultaneously (1) detect latent clusters based on the likelihood of exposure and outcome variables and (2) estimate HRs using the likelihood constructed within each cluster. Notably, leveraging DPM, our procedure eliminates the need for IVs by identifying unmeasured confounders under an alternative condition. Additionally, GB techniques remove the need for explicit modeling of the baseline hazard function, distinguishing our procedure from traditional Bayesian approaches. Simulation experiments demonstrate that the proposed Bayesian procedure outperforms existing methods in some performance metrics. Moreover, it achieves statistical efficiency comparable to the efficient estimator while accurately identifying cluster structures. These features highlight its ability to overcome challenges associated with traditional IV approaches for time-to-event data.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70360"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12826352/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Clustering longitudinal biomarkers in clinical trials uncovers associations between clinical outcomes, disease progression, and treatment effects. Finite mixtures of multivariate t linear mixed-effects (FM-MtLME) models have proven effective for modeling and clustering multiple longitudinal trajectories that exhibit grouped patterns with strong within-group similarity. Motivated by an AIDS study with plasma viral loads measured under assay-specific detection limits, this article extends the FM-MtLME model to account for censored outcomes; the proposed model is referred to as the FM-MtLME model with censoring (FM-MtLMEC). To allow covariate-dependent mixing proportions, we further extend it with a logistic link, resulting in the EFM-MtLMEC model. Two efficient EM-based algorithms are developed for parameter estimation of the FM-MtLMEC and EFM-MtLMEC models. The utility of our methods is demonstrated through comprehensive analyses of the AIDS data and simulation studies.
{"title":"<ArticleTitle xmlns:ns0=\"http://www.w3.org/1998/Math/MathML\">Finite Mixtures of Multivariate <ns0:math> <ns0:semantics><ns0:mrow><ns0:mi>t</ns0:mi></ns0:mrow> <ns0:annotation>$$ t $$</ns0:annotation></ns0:semantics> </ns0:math> Linear Mixed-Effects Models for Censored Longitudinal Data With Concomitant Covariates.","authors":"Tsung-I Lin, Wan-Lun Wang","doi":"10.1002/sim.70392","DOIUrl":"10.1002/sim.70392","url":null,"abstract":"<p><p>Clustering longitudinal biomarkers in clinical trials uncovers associations between clinical outcomes, disease progression, and treatment effects. Finite mixtures of multivariate <math> <semantics><mrow><mi>t</mi></mrow> <annotation>$$ t $$</annotation></semantics> </math> linear mixed-effects (FM-MtLME) models have proven effective for modeling and clustering multiple longitudinal trajectories that exhibit grouped patterns with strong within-group similarity. Motivated by an AIDS study with plasma viral loads measured under assay-specific detection limits, this article extends the FM-MtLME model to account for censored outcomes. The proposed model is called the FM-MtLME with censoring (FM-MtLMEC). To allow covariate-dependent mixing proportions, we further extend it with a logistic link, resulting in the EFM-MtLMEC model. Two efficient EM-based algorithms are developed for parameter estimation of both FM-MtLMEC and EFM-MtLMEC models. The utility of our methods is demonstrated through comprehensive analyses of the AIDS data and simulation studies.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70392"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The urgency of delivering novel, effective treatments against life-threatening diseases has led various health authorities to allow for Accelerated Approvals (AAs). AA is a "fast track" program in which promising treatments are evaluated on surrogate (short-term) endpoints likely to predict clinical benefit. This allows treatments to receive early approval, subject to providing further evidence of efficacy, for example, on the primary (long-term) endpoint. Although this procedure is well established, a number of conditionally approved treatments never obtain full approval (FA), mainly due to a lack of correlation between the surrogate and primary endpoints. This implies a need to improve the criteria for controlling the risk of AAs for ineffective treatments while maximizing the chance of AAs for effective ones. We first propose a novel adaptive group sequential design that includes an early dual-criterion "Accelerated Approval" interim analysis, where efficacy on a surrogate endpoint is tested jointly with a predictive metric based on the primary endpoint. Second, we explore how the predictive criterion may be strengthened by borrowing historical information, in particular using (i) historical control data on the primary endpoint and (ii) the estimated historical relationship between the surrogate and primary endpoints. We propose various metrics to characterize the risk of correct and incorrect early AAs and demonstrate how the proposed design allows explicit control of these risks, with particular attention to the family-wise error rate (FWER). The methodology is then evaluated through a simulation study motivated by a Phase III trial in metastatic colorectal cancer (mCRC).
{"title":"Dual-Criterion Approach Incorporating Historical Information to Seek Accelerated Approval With Application in Time-to-Event Group Sequential Trials.","authors":"Marco Ratta, Gaëlle Saint-Hilary, Valentine Barboux, Mauro Gasparini, Donia Skanji, Pavel Mozgunov","doi":"10.1002/sim.70361","DOIUrl":"10.1002/sim.70361","url":null,"abstract":"<p><p>The urgency of delivering novel, effective treatments against life-threatening diseases has brought various health authorities to allow for Accelerated Approvals (AAs). AA is the \"fast track\" program where promising treatments are evaluated based on surrogate (short term) endpoints likely to predict clinical benefit. This allows treatments to get an early approval, subject to providing further evidence of efficacy, for example, on the primary (long term) endpoint. Despite this procedure being quite consolidated, a number of conditionally approved treatments do not obtain full approval (FA), mainly due to lack of correlation between surrogate and primary endpoint. This implies a need to improve the criteria for controlling the risk of AAs for noneffective treatments, while maximizing the chance of AAs for effective ones. We first propose a novel adaptive group sequential design that includes an early dual-criterion \"Accelerated Approval\" interim analysis, where efficacy on a surrogate endpoint is tested jointly with a predictive metric based on the primary endpoint. Secondarily, we explore how the predictive criterion may be strengthened by historical information borrowing, in particular using: (i) historical control data on the primary endpoint, and (ii) the estimated historical relationship between the surrogate and the primary endpoints. We propose various metrics to characterize the risk of correct and incorrect early AAs and demonstrate how the proposed design allows explicit control of these risks, with particular attention to the family-wise error rate (FWER). The methodology is then evaluated through a simulation study motivated by a Phase-III trial in metastatic colorectal cancer (mCRC).</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70361"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12828486/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146030962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Effective monitoring of event rates is essential for maintaining statistical power and study integrity in clinical trials, particularly when the primary endpoint involves time-to-event outcomes. We propose Sequential Event Rate Monitoring (SERM), a new approach for continuous monitoring of event rates. SERM leverages the Sequential Probability Ratio Test (SPRT) with improved boundaries derived from the nonlinear renewal theorem of Kim and Woodroofe (2003), and it represents the first practical implementation of their theoretical work in this area. SERM offers several tangible benefits, including ease of implementation, efficient use of data, and broad applicability across trials; decision boundaries can be obtained directly from a simple formula. A detailed illustration using real-world data from a large Phase III clinical trial demonstrates its potential for rapid assessment. SERM operates on blinded data, so it can be used in tandem with a broad range of study designs while preserving study integrity. Although slow patient accrual lengthens the time needed to reach a conclusion, it does not substantially affect the type I or type II error rates associated with the decision. This new method provides a robust tool for enhancing trial monitoring, enabling timely and informed decision-making in diverse clinical settings.
{"title":"Sequential Event Rate Monitoring.","authors":"Dong-Yun Kim, Sung-Min Han","doi":"10.1002/sim.70359","DOIUrl":"10.1002/sim.70359","url":null,"abstract":"<p><p>Effective monitoring of event rates is essential for maintaining statistical power and study integrity in clinical trials, particularly when the primary endpoint involves time-to-event outcomes. We propose Sequential Event Rate Monitoring (SERM), a new and innovative approach for continuous monitoring of event rates. SERM leverages the Sequential Probability Ratio Test (SPRT) with improved boundaries derived from the nonlinear renewal theorem by Kim and Woodroofe (2003). This method represents the first practical implementation of their theoretical work in this area. SERM offers several tangible benefits, including ease of implementation, efficient use of data, and broad applicability to trials. Decision boundaries can be directly obtained from simple formula. A detailed illustration of the method using real-world data from a large Phase III clinical trial demonstrates its potential for rapid assessment. SERM operates on blinded data so it can be used in tandem with a broad range of study designs while preserving study integrity. Although slow patient accrual lengthens the time needed to reach a conclusion, it does not significantly affect type I or type II errors associated with the decision. This new method provides a robust tool for enhancing trial monitoring, enabling timely and informed decision-making in diverse clinical settings.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70359"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12828109/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146031031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Derived variables are variables constructed from one or more source variables through established mathematical operations or algorithms. For example, body mass index (BMI) is a derived variable constructed from two source variables: weight and height. When a derived variable is used as the outcome in a statistical model, complications arise if some of the source variables have missing values. In this paper, we propose a single, fully integrated Bayesian model that simultaneously imputes the missing values and samples from the posterior. We compare the proposed method with alternative approaches that rely on multiple imputation (MI); examples include an analysis estimating the risk of microcephaly (a derived variable based on sex, gestational age, and head circumference at birth) in newborns exposed to the Zika virus.
{"title":"A Fully-Integrated Bayesian Approach for the Imputation and Analysis of Derived Outcome Variables With Missingness.","authors":"Harlan Campbell, Tim P Morris, Paul Gustafson","doi":"10.1002/sim.70383","DOIUrl":"10.1002/sim.70383","url":null,"abstract":"<p><p>Derived variables are variables that are constructed from one or more source variables through established mathematical operations or algorithms. For example, body mass index (BMI) is a derived variable constructed from two source variables: weight and height. When using a derived variable as the outcome in a statistical model, complications arise when some of the source variables have missing values. In this paper, we propose how one can define a single fully integrated Bayesian model to simultaneously impute missing values and sample from the posterior. We compare our proposed method with alternative approaches that rely on multiple imputation (MI), with examples including an analysis to estimate the risk of microcephaly (a derived variable based on sex, gestational age, and head circumference at birth) in newborns exposed to the ZIKA virus.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70383"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12826355/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To reduce drug development costs and time in therapeutic areas with high unmet medical needs, sponsors are often motivated to accelerate the drug development process, specifically by skipping the Phase 2 trial and starting the Phase 3 trial directly after the Phase 1 trial. To address this need, a novel design called the 2-in-1 adaptive design has recently been proposed. This design allows a trial either to remain small or to expand adaptively into a large trial, based on decisions made at an interim analysis. Although several statistical methods have been proposed for the 2-in-1 adaptive design, they focus on clinical trials with continuous or time-to-event endpoints and assume a normal approximation for the test statistic. Methods for the 2-in-1 adaptive design with binary endpoints are notably lacking. For binary endpoints, some statistical tests do not rely on the normal approximation, so it is not clear whether the operating characteristics obtained by existing 2-in-1 adaptive design methods for continuous or time-to-event endpoints generalize to binary endpoints. In this study, we propose formulas to evaluate the type I error rate and power of the 2-in-1 adaptive design with binary endpoints. The proposed formulas evaluate the exact type I error rate and power without Monte Carlo simulation and apply to any statistical test of binary endpoints. Numerical investigations under different scenarios demonstrate that the operating characteristics of the 2-in-1 adaptive design with binary endpoints are similar to those with continuous or time-to-event endpoints. We present an application of the proposed formulas to a clinical trial in patients with pyruvate kinase deficiency.
{"title":"Analytical Evaluation of the 2-in-1 Adaptive Design for Binary Endpoints.","authors":"Gosuke Homma, Takuma Yoshida","doi":"10.1002/sim.70349","DOIUrl":"https://doi.org/10.1002/sim.70349","url":null,"abstract":"<p><p>To reduce drug development costs and time in therapeutic areas with high unmet medical needs, sponsors are often motivated to accelerate the drug development process, specifically by skipping the Phase 2 trial and starting the Phase 3 trial directly after the Phase 1 trial. To address this need, a novel design called 2-in-1 adaptive design has been proposed recently. This design allows a trial to maintain a small trial or to expand to a large trial adaptively based on decisions made based on the interim analysis. Although several statistical methods have been proposed for the 2-in-1 adaptive design, they specifically emphasize clinical trials with continuous or time-to-event endpoints assuming the normal approximation of the test statistic. Methods for the 2-in-1 adaptive design with binary endpoints are notably lacking. For binary endpoints, some statistical tests do not rely on the normal approximation. Therefore, it is not clear whether the operating characteristics obtained by existing 2-in-1 adaptive design methods in the context of continuous or time-to-event endpoints can be generalized to cases with binary endpoints. For this study, we propose formulas to evaluate the type I error rate and power for the 2-in-1 adaptive design for binary endpoints. Our proposed formulas can evaluate the exact type I error rate and power without using Monte Carlo simulations. Moreover, the proposed formulas are useful for any statistical test of binary endpoints. Numerical investigations under different scenarios demonstrated that the operating characteristics for the 2-in-1 adaptive design with binary endpoints are similar to those with continuous or time-to-event endpoints. We present the application of our proposed formulas to a clinical trial in patients with pyruvate kinase deficiency.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70349"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper reviews and compares methods to assess treatment effect heterogeneity in the context of parametric regression models. These methods include the standard likelihood ratio tests, bootstrap likelihood ratio tests, and Goeman's global test, motivated by testing whether the random effect variance is zero. We place particular emphasis on tests based on the score-residual of the treatment effect and explore different variants of tests in this class. All approaches are compared in a simulation study, and the approach based on residual scores is illustrated in a clinical trial with a time-to-event outcome comparing treatment vs. placebo. Our findings demonstrate that score-residual-based methods provide practical, flexible, and reliable tools for exploring treatment effect heterogeneity and treatment effect modifiers, and can provide useful guidance for decision-making around treatment effect heterogeneity.
{"title":"Comparing Methods to Assess Treatment Effect Heterogeneity in General Parametric Regression Models.","authors":"Yao Chen, Sophie Sun, Konstantinos Sechidis, Cong Zhang, Torsten Hothorn, Björn Bornkamp","doi":"10.1002/sim.70381","DOIUrl":"10.1002/sim.70381","url":null,"abstract":"<p><p>This paper reviews and compares methods to assess treatment effect heterogeneity in the context of parametric regression models. These methods include the standard likelihood ratio tests, bootstrap likelihood ratio tests, and Goeman's global test, motivated by testing whether the random effect variance is zero. We place particular emphasis on tests based on the score-residual of the treatment effect and explore different variants of tests in this class. All approaches are compared in a simulation study, and the approach based on residual scores is illustrated in a clinical trial with a time-to-event outcome comparing treatment vs. placebo. Our findings demonstrate that score-residual-based methods provide practical, flexible, and reliable tools for exploring treatment effect heterogeneity and treatment effect modifiers, and can provide useful guidance for decision-making around treatment effect heterogeneity.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70381"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Each patient is simultaneously given several binary tests for a disease. The tests are partitioned into disjoint groups, assumed to be conditionally independent between groups, but allowed to have arbitrary dependence within a group. The groups are intended to capture similar biological features of the tests. A Dirichlet-multinomial model is employed with a Gibbs Sampler to estimate the sensitivity and specificity of the tests. The model is exemplified by data on four tests for Chlamydia, both with complete data and with a random 10% of the data treated as missing.
{"title":"A Dirichlet-Multinomial Gibbs Algorithm for Assessing the Accuracy of Binary Tests in the Absence of a Gold Standard.","authors":"Joseph B Kadane","doi":"10.1002/sim.70372","DOIUrl":"10.1002/sim.70372","url":null,"abstract":"<p><p>Each patient is simultaneously given several binary tests for a disease. The tests are partitioned into disjoint groups, assumed to be conditionally independent between groups, but allowed to have arbitrary dependence within a group. The groups are intended to capture similar biological features of the tests. A Dirichlet-multinomial model is employed with a Gibbs Sampler to estimate the sensitivity and specificity of the tests. The model is exemplified by data on four tests for Chlamydia, both with complete data and with a random 10% of the data treated as missing.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70372"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12828250/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146030915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper provides clear and practical guidance on the specification of imputation models when multiple imputation is used in conjunction with doubly robust estimation methods for causal inference. Through theoretical arguments and targeted simulations, we demonstrate that if a confounder has missing data, the corresponding imputation model must include all variables appearing in either the propensity score model or the outcome model, in addition to both the exposure and outcome, and that these variables must appear in the same functional form as in the final analysis. Violating these conditions can lead to biased treatment effect estimates, even when both components of the doubly robust estimator are correctly specified. We present a mathematical framework for doubly robust estimation combined with multiple imputation, establish the theoretical requirements for proper imputation in this setting, and demonstrate the consequences of misspecification through simulation. Based on these findings, we offer concrete recommendations to ensure valid inference when using multiple imputation with doubly robust methods in applied causal analyses.
{"title":"The Role of Congeniality in Multiple Imputation for Doubly Robust Causal Estimation.","authors":"Lucy D'Agostino McGowan","doi":"10.1002/sim.70363","DOIUrl":"10.1002/sim.70363","url":null,"abstract":"<p><p>This paper provides clear and practical guidance on the specification of imputation models when multiple imputation is used in conjunction with doubly robust estimation methods for causal inference. Through theoretical arguments and targeted simulations, we demonstrate that if a confounder has missing data, the corresponding imputation model must include all variables appearing in either the propensity score model or the outcome model, in addition to both the exposure and outcome, and that these variables must appear in the same functional form as in the final analysis. Violating these conditions can lead to biased treatment effect estimates, even when both components of the doubly robust estimator are correctly specified. We present a mathematical framework for doubly robust estimation combined with multiple imputation, establish the theoretical requirements for proper imputation in this setting, and demonstrate the consequences of misspecification through simulation. Based on these findings, we offer concrete recommendations to ensure valid inference when using multiple imputation with doubly robust methods in applied causal analyses.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70363"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12828482/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146031044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep learning has excelled in the field of statistical learning. In survival analysis, several studies have combined deep learning methods with partially linear structures to propose deep partially linear models. We extend this approach to competing risks and propose the deep partially linear subdistribution hazard model (DPLSHM). To evaluate the predictive performance of the model, we further develop a time-dependent AUC method specifically tailored to competing risks data and provide an estimator of the AUC. Theoretical results for the proposed model establish the asymptotic normality of the parametric component at a $\sqrt{n}$ rate and provide the convergence rate of the nonparametric component, which attains the minimax convergence rate up to logarithmic factors. We also develop consistency and convergence-rate theory for the AUC-related estimates and prove that the regression component of the DPLSHM asymptotically maximizes the theoretical AUC. Finally, numerical simulations and real-world datasets demonstrate the strong estimation and prediction performance of the DPLSHM.
{"title":"A DNN-Based Weighted Partial Likelihood for Partially Linear Subdistribution Hazard Model.","authors":"Nengjie Zhu, Zhangsheng Yu","doi":"10.1002/sim.70397","DOIUrl":"10.1002/sim.70397","url":null,"abstract":"<p><p>Deep learning has excelled in the field of statistical learning. In the field of survival analysis, some studies have combined deep learning methods with partially linear structures to propose deep partially linear structures. We extend it to the field of competing risks and propose the deep partially linear subdistribution hazard model (DPLSHM). To evaluate the predictive performance of the model, we further develop a time-dependent AUC method specifically tailored for competing risks data and provide an estimator for AUC. Theoretical results for the proposed model demonstrate the asymptotic normality of the parameter component at a rate of <math> <semantics> <mrow> <msqrt><mrow><mi>n</mi></mrow> </msqrt> </mrow> <annotation>$$ sqrt{n} $$</annotation></semantics> </math> and provide the convergence rate of the nonparametric component, which achieves the minimal limit convergence rate (multiplicative logarithmic factors). The theory of consistency and rate of convergence of AUC-related estimates is also developed, while we prove that the regression component of DPLSHM maximizes theoretical AUC asymptotically. Subsequently, the paper validates the excellent performance of DPLSHM in estimation and prediction through numerical simulations and real-world datasets.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 1-2","pages":"e70397"},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146019632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}