Effects Among the Affected
Lina M Montoya, Elvin H Geng, Michael Valancius, Michael R Kosorok, Maya L Petersen
Statistics in Medicine, 44(28-30): e70353 (2025). doi:10.1002/sim.70353

We propose a novel causal estimand that elucidates how response to an earlier treatment (e.g., treatment initiation) modifies the effect of a later treatment (e.g., treatment discontinuation), thus learning if there are effects among the (un)affected. Specifically, we consider a working marginal structural model summarizing how the average effect of a later treatment varies as a function of the (estimated) conditional average effect of an earlier treatment. We define the estimand to be a data-adaptive causal parameter, allowing for estimation of the conditional average treatment effect using machine learning without making strong smoothness assumptions. We show how a sequentially randomized design can be used to identify this causal estimand, and we describe a targeted maximum likelihood estimator for the resulting statistical estimand, with influence curve-based inference. We present simulation studies that evaluate the performance of this estimator under various finite-sample scenarios. Throughout, we use the "Adaptive Strategies for Preventing and Treating Lapses of Retention in HIV Care" trial (NCT02338739) as an illustrative example, showing that discontinuation of conditional cash transfers for HIV care adherence was most harmful among those who had an increase in benefit from them initially.
Survival Analysis Under the Aalen's Additive Hazards Model With Covariate Measurement Error: Application to Causal Mediation Analysis
Xialing Wen, Liangchen Qin, Hui Wu, Ying Yan
Statistics in Medicine, 44(28-30): e70346 (2025). doi:10.1002/sim.70346

Covariate measurement error is an important problem in survival analysis and has been well studied under the Cox proportional hazards model. However, measurement error effects have rarely been addressed under Aalen's additive hazards model, and there is a lack of methods to correct for them. In recent years, Aalen's additive hazards model has been increasingly used in causal mediation analysis. Although the longitudinal mediator is frequently measured with uncertainty, the issue of measurement error in the mediator has received little attention. In this article, we study the general problem of covariate measurement error under Aalen's additive hazards model and propose a measurement error correction strategy. We then extend the proposed method to causal mediation analysis in the survival setting with an error-prone longitudinal mediator. Corrected estimation of the direct and indirect effects is obtained. The performance of the proposed method is assessed in numerical studies.
Explaining Individualized Treatment Rules: Integrating LIME and SHAP With XGBoost in Precision Medicine
Zihuan Liu, Xin Huang
Statistics in Medicine, 44(28-30): e70322 (2025). doi:10.1002/sim.70322

Precision medicine relies on accurate and interpretable predictive models to identify patient subgroups and biomarkers that can guide individualized treatment strategies. While extreme gradient boosting (XGBoost) often achieves state-of-the-art predictive performance, its complexity can impede understanding of how input variables influence outcomes. Building upon existing XGBoost frameworks for estimating individualized treatment rules (ITRs), we introduce a global permutation test within this framework to assess treatment effect heterogeneity. Additionally, we incorporate two model-agnostic explanation techniques, local interpretable model-agnostic explanations (LIME) and SHapley Additive exPlanations (SHAP), to enhance interpretability at both the global and individual levels. Through simulations and analyses of real-world clinical trial datasets, we illustrate that our permutation-based pipeline can detect empirical signals of treatment effect heterogeneity, while LIME and SHAP offer exploratory insights into feature contributions and the estimated ITRs.
Inference on Controlled Effects for Assessing Immune Correlates of Protection Based on a Cox Model
Avi Kenny, Lars van der Laan, Peter Gilbert, Marco Carone
Statistics in Medicine, 44(28-30): e70347 (2025). doi:10.1002/sim.70347

In vaccine research, it is important to identify biomarkers that can reliably predict vaccine efficacy against a clinical endpoint. Such biomarkers are known as immune correlates of protection (CoP) and can serve as surrogate endpoints in vaccine efficacy trials to accelerate the approval process. CoPs must be rigorously validated, and one method of doing so is through the controlled risk (CR) curve, a function that represents the causal effect of the biomarker on population-level risk of experiencing the endpoint of interest by a certain time post-vaccination. The CR curve can be estimated by leveraging a Cox proportional hazards model, but researchers currently rely on the bootstrap for inference, which can be computationally demanding. In this article, we analytically derive the asymptotic variance of this estimator, providing an analytic approach for constructing both pointwise and uniform confidence bands. We evaluate the finite sample performance of these methods in a simulation study and illustrate their use on data from the Coronavirus Efficacy (COVE) placebo-controlled phase 3 trial (NCT04470427) of the mRNA-1273 COVID-19 vaccine.
A Review of Methods for Research Synthesis
Pär Villner, Matteo Bottai
Statistics in Medicine, 44(28-30): e70314 (2025). doi:10.1002/sim.70314

Meta-analysis consists of a wide range of methods for summarizing existing research, often by aggregating summary statistics. The dominant methods are the fixed effect and the random effects models, which assume that all studies included in a meta-analysis are similar. In many scenarios, the available studies differ in important ways, for example, in terms of research design and sample population. To handle this heterogeneity, more advanced methods are required. In this article, we review some of these methods that have been proposed in the past decades: hierarchical models, bias adjustment and quality weighting methods, Bayesian methods, and decision-centered meta-analysis. We aim to describe the theoretical rationale behind the methods and to give examples of applications. Each method has advantages and limitations, and we consider ways of combining methods.
Addressing Outcome Reporting Bias in Meta-Analysis: A Selection Model Perspective
Alessandra Gaia Saracini, Leonhard Held
Statistics in Medicine, 44(28-30): e70238 (2025). doi:10.1002/sim.70238

Outcome reporting bias (ORB) poses a significant threat to the validity of meta-analytic findings. It occurs when researchers selectively report outcomes based on the significance or direction of results, potentially leading to distorted treatment effect estimates. Despite its critical implications, ORB remains an under-recognized issue, with few comprehensive adjustment methods available. The goal of this research is to investigate ORB-adjustment techniques through a selection model lens, thereby extending some of the existing methodological approaches in the literature. To gain better insight into the effects of ORB in meta-analysis of clinical trials, specifically in the presence of heterogeneity, and to assess the effectiveness of ORB-adjustment techniques, we apply the methodology to real clinical data affected by ORB and conduct a simulation study focused on treatment effect estimation, with a secondary interest in heterogeneity quantification.
Graph Based, Adaptive, Multiarm, Multiple Endpoint, Two-Stage Designs
Cyrus Mehta, Ajoy Mukhopadhyay, Martin Posch
Statistics in Medicine, 44(28-30): e70237 (2025). doi:10.1002/sim.70237

The graph-based approach to multiple testing is an intuitive method that enables a study team to represent clearly, through a directed graph, its priorities for hierarchical testing of multiple hypotheses and for propagating the available type I error from rejected or dropped hypotheses to hypotheses yet to be tested. Although originally developed for single-stage nonadaptive designs, we show how it may be extended to two-stage designs that permit early identification of efficacious treatments, adaptive sample size re-estimation, dropping of hypotheses, and changes in the hierarchical testing strategy at the end of stage one. Two approaches are available for preserving the familywise error rate in the presence of these adaptive changes: the p-value combination method and the conditional error rate method. We present the statistical methodology underlying each approach and compare the operating characteristics of the two methods in a large simulation experiment.
Variant-Specific Mendelian Risk Prediction Model
Julie-Alexia Dias, Eunchan Bae, Theodore Huang, Jinbo Chen, Stephen B Gruber, Gregory Idos, Giovanni Parmigiani, Timothy R Rebbeck, Danielle Braun
Statistics in Medicine, 44(28-30): e70342 (2025). doi:10.1002/sim.70342

Many pathogenic sequence variants (PSVs) have been associated with increased risk of cancers. Mendelian risk prediction models use Mendelian laws of inheritance, as well as specified PSV frequency and penetrance (the age-specific probability of developing cancer given genotype), to predict the probability of having a PSV based on family history. Most existing models assume that the penetrance is the same for all PSVs in a given gene. However, for some genes (e.g., BRCA1/2), cancer risk has been shown to vary by PSV. We propose an extension of Mendelian risk prediction models that relaxes the assumption of homogeneous gene-level risk by incorporating PSV-specific penetrances, and we illustrate this extension on an existing Mendelian risk prediction model, Fam3PRO. The proposed Fam3PRO-variant model incorporates variant-specific BRCA1/2 PSVs through region classifications. Based on prior literature, we defined three cancer-specific risk regions: the breast cancer clustering region (BCCR), the ovarian cancer clustering region (OCCR), and the "other" region. We conducted simulations to evaluate the performance of the proposed Fam3PRO-variant model against the existing Fam3PRO model. Simulation results showed that the Fam3PRO-variant model was well calibrated for predicting region-specific BRCA1/2 carrier status, with high discrimination and accuracy. Importantly, our simulations also highlighted the impact of underreporting in family history data on model performance: while underreporting slightly reduced absolute calibration, the Fam3PRO-variant model remained robust in discrimination and provided more accurate region-specific PSV risk predictions than gene-level models. We further evaluated Fam3PRO-variant on two cohorts: 1897 families from the Cancer Genetics Network (CGN) and 25,671 families from the Clinical Cancer Genomics Community Research Network (CCGCRN). Results showed that our proposed model provides region-specific PSV carrier probabilities with high accuracy, while the calibration, discrimination, and accuracy of gene-specific PSV carrier probabilities were comparable to those of the existing gene-specific model. Moreover, we assessed the clinical utility of Fam3PRO-variant by evaluating positive predictive value (PPV), negative predictive value (NPV), sensitivity, and specificity at clinically relevant thresholds (2.5%, 5%, and 10%), as recommended by NCCN guidelines. Fam3PRO-variant performed comparably to Fam3PRO at the gene level across all metrics, with notably high specificity and NPV at the region-specific level. These results suggest that, even in the presence of underreporting, Mendelian risk prediction models can be effectively extended to incorporate variant-specific penetrances, providing more precise region-specific PSV carrier probabilities and improving cancer prevention and risk prediction.
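A single-person sketch of the Mendelian updating that underlies such models, with invented frequency and penetrance numbers; Fam3PRO propagates this over whole pedigrees via Mendelian transmission:

# Bayes' rule combining a PSV carrier-frequency prior with region-specific
# penetrance to give P(carrier of a region-r PSV | affected). All numbers are
# hypothetical placeholders, not published penetrance estimates.
prior = 0.005                       # assumed P(carrier) from PSV frequency
pen_carrier = {"BCCR": 0.65, "OCCR": 0.55, "other": 0.60}  # P(cancer by 70 | carrier)
pen_noncarrier = 0.11               # P(cancer by 70 | non-carrier)

def carrier_prob(region, affected=True):
    pc = pen_carrier[region] if affected else 1 - pen_carrier[region]
    pn = pen_noncarrier if affected else 1 - pen_noncarrier
    return prior * pc / (prior * pc + (1 - prior) * pn)

for r in pen_carrier:
    print(r, round(carrier_prob(r), 4))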
An Augmented Likelihood Approach Incorporating Error-Prone Auxiliary Data Into a Survival Analysis
Noorie Hyun, Lillian Boe, Pamela A Shaw
Statistics in Medicine, 44(28-30): e70321 (2025). doi:10.1002/sim.70321

In this big data era, we can readily access extensive clinical data from large observational studies or electronic health records (EHR). Data accuracy can vary according to the measurement method. For example, clinical variables extracted by automated computer algorithms or obtained from participant self-reported medical history can be error-prone. Precise data, such as those obtained from a chart review or a gold standard diagnostic test, may only be available on a subset of individuals due to cost or participant burden. We propose a method to augment a regression analysis of a gold standard time-to-event outcome with available error-prone disease diagnoses for the setting where the gold standard is observed on a subset. The proposed model addresses left-truncation and interval-censoring in time-to-event outcomes while leveraging information from the self-reported disease diagnosis in a joint likelihood for the gold standard and error-prone outcomes. The proposed model is applied to the Hispanic Community Health Study/Study of Latinos data to quantify risk factors associated with diabetes onset.
Group-Sequential Designs With an Externally-Driven Change of Primary Endpoint
Amin Yarahmadi, Lori E Dodd, Peter Horby, Thomas Jaki, Nigel Stallard
Statistics in Medicine, 44(28-30): e70337 (2025). doi:10.1002/sim.70337

Clinical trials conducted during the COVID-19 pandemic demonstrated the value of adaptive design methods in emerging disease settings, where there can be considerable uncertainty around disease natural history, anticipated endpoint effect sizes, and population size. In such settings, there may also be uncertainty regarding the most appropriate primary endpoint, which might lead to an externally driven decision to change the primary endpoint during the course of an adaptive trial. If information on the new primary endpoint is already being collected, initially as a secondary endpoint, the trial can continue with the new primary endpoint. In this case, it is unclear how statistical inference on the final primary endpoint should be adjusted for interim analyses that monitored the initial primary endpoint so as to control the overall type I error rate: adjusting for monitoring as if it had been based on the new endpoint could be conservative, whereas failing to make any adjustment could inflate the type I error rate if the new and original endpoints are correlated. This paper shows how group-sequential methods can be modified to control the type I error rate for the analysis of the new primary endpoint irrespective of the true treatment effect on the initial primary endpoint. The method is illustrated using a simulated data example based on a clinical trial of remdesivir in COVID-19. Construction of critical values for the test of the new primary endpoint requires a value for the correlation between this endpoint and the initial primary endpoint. We present simulation studies demonstrating that the type I error rate is controlled when this value is estimated from the trial data on the two endpoints.