On Anticipation Effect in Stepped Wedge Cluster Randomized Trials
Hao Wang, Xinyuan Chen, Katherine R Courtright, Scott D Halpern, Michael O Harhay, Monica Taljaard, Fan Li
In stepped wedge cluster randomized trials (SW-CRTs), the intervention is rolled out to clusters over multiple periods. A standard approach for analyzing SW-CRTs uses a linear mixed model in which the treatment effect is present only after treatment adoption, under the assumption of no anticipation. This assumption, however, may not always hold in practice, because stakeholders, providers, or individuals who are aware of the treatment adoption timing (especially when blinding is challenging or infeasible) can inadvertently change their behaviors in anticipation of the forthcoming intervention. We provide an analytical framework to address the anticipation effect in SW-CRTs and study its impact. We derive expectations of the estimators based on a collection of linear mixed models and demonstrate that, when the anticipation effect is ignored, these estimators give biased estimates of the treatment effect. We also provide updated sample size formulas that explicitly account for anticipation effects, exposure-time heterogeneity, or both, and illustrate their impact on study power. Through simulation studies and empirical analyses, we compare treatment effect estimators with and without adjustment for anticipation and offer practical considerations.
{"title":"On Anticipation Effect in Stepped Wedge Cluster Randomized Trials.","authors":"Hao Wang, Xinyuan Chen, Katherine R Courtright, Scott D Halpern, Michael O Harhay, Monica Taljaard, Fan Li","doi":"10.1002/sim.70380","DOIUrl":"https://doi.org/10.1002/sim.70380","url":null,"abstract":"<p><p>In stepped wedge cluster randomized trials (SW-CRTs), the intervention is rolled out to clusters over multiple periods. A standard approach for analyzing SW-CRTs utilizes the linear mixed model, where the treatment effect is only present after the treatment adoption, under the assumption of no anticipation. This assumption, however, may not always hold in practice because stakeholders, providers, or individuals who are aware of the treatment adoption timing (especially when blinding is challenging or infeasible) can inadvertently change their behaviors in anticipation of the forthcoming intervention. We provide an analytical framework to address the anticipation effect in SW-CRTs and study its impact. We derive expectations of the estimators based on a collection of linear mixed models and demonstrate that when the anticipation effect is ignored, these estimators give biased estimates of the treatment effect. We also provide updated sample size formulas that explicitly account for anticipation effects, exposure-time heterogeneity, or both in SW-CRTs and illustrate their impact on study power. Through simulation studies and empirical analyses, we compare the treatment effect estimators with and without adjusting for anticipation, and provide some practical considerations.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70380"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146150718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Patient Retreat in Dose Escalation for Phase I Clinical Trials With Rare Diseases
Jialu Fang, Guosheng Yin
Phase I clinical trials aim to identify the maximum tolerated dose (MTD), a task that becomes challenging in rare diseases because patient recruitment is limited. Traditional dose-finding designs, which assign one dose per patient, require sample sizes that may be infeasible for rare disease trials. To address these limitations, we propose the patient retreat in dose escalation (PRIDE) scheme, which integrates intra-patient dose escalation and accounts for intra-patient correlations by incorporating random effects into a Bayesian hierarchical framework. We further introduce PRIDE-FA (flexible allocation), an extension of PRIDE with a flexible allocation strategy. By allowing retreated patients to be assigned to any dose level based on trial needs, PRIDE-FA improves resource efficiency, leading to greater reductions in required sample size and trial duration. This paper incorporates random effects into established dose-finding designs, including the calibration-free odds (CFO) design, the Bayesian optimal interval (BOIN) design, and the continual reassessment method (CRM), to account for intra-patient correlations when each patient may receive multiple doses. Simulation studies demonstrate that PRIDE and PRIDE-FA substantially improve the accuracy of MTD selection, reduce the required sample size, and shorten trial duration compared with existing dose-finding methods. Together, PRIDE and PRIDE-FA provide a robust and efficient framework for phase I clinical trials in rare diseases.
{"title":"Patient Retreat in Dose Escalation for Phase I Clinical Trials With Rare Diseases.","authors":"Jialu Fang, Guosheng Yin","doi":"10.1002/sim.70409","DOIUrl":"10.1002/sim.70409","url":null,"abstract":"<p><p>Phase I clinical trials aim to identify the maximum tolerated dose (MTD), a task that becomes challenging in rare disease due to limited patient recruitment. Traditional dose-finding designs, which assign one dose per patient, require a sufficient sample size that may be infeasible for rare disease trials. To address these limitations, we propose the patient retreat in dose escalation (PRIDE) scheme, which integrates intra-patient dose escalation and considers intra-patient correlations by incorporating random effects into a Bayesian hierarchical framework. We further introduce PRIDE-FA (flexible allocation), an extension of PRIDE with a flexible allocation strategy. By allowing retreated patients to be assigned to any dose level based on trial needs, PRIDE-FA improves resource efficiency, leading to greater reductions in required sample size and trial duration. This paper incorporates random effects into established dose-finding designs, including the calibration-free odds (CFO) design, the Bayesian optimal interval (BOIN) design, and the continual reassessment method (CRM) to account for intra-patient correlations when each patient may receive multiple doses. Simulation studies demonstrate that PRIDE and PRIDE-FA significantly improve the accuracy of MTD selection, reduce required sample size, and shorten trial duration compared to existing dose-finding methods. Together, PRIDE and PRIDE-FA provide a robust and efficient framework for phase I clinical trials with rare diseases.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70409"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12873649/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146120195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Concave Pairwise Fusion Approach to Heterogeneous Q-Learning for Dynamic Treatment Regimes
Jubo Sun, Wensheng Zhu, Guozhe Sun
A dynamic treatment regime is a sequence of decision rules that map the available history information to a treatment option at each decision point. The optimal dynamic treatment regime makes these decisions to maximize the expected outcome of interest. Most existing methods assume population homogeneity; in many complex applications, ignoring latent heterogeneous structures may compromise estimation, highlighting the need to explore heterogeneous structures when estimating optimal treatment regimes. We propose heterogeneous Q-learning, which estimates optimal dynamic treatment regimes using a concave pairwise fusion penalized approach. The method employs an alternating direction method of multipliers (ADMM) algorithm to solve the concave pairwise fusion penalized least squares problem at each stage. Simulation studies demonstrate that the proposed method outperforms standard Q-learning, and it is further illustrated through an analysis of real data from the China Rural Hypertension Control Project (CRHCP).
{"title":"A Concave Pairwise Fusion Approach to Heterogeneous Q-Learning for Dynamic Treatment Regimes.","authors":"Jubo Sun, Wensheng Zhu, Guozhe Sun","doi":"10.1002/sim.70415","DOIUrl":"https://doi.org/10.1002/sim.70415","url":null,"abstract":"<p><p>A dynamic treatment regime is a sequence of decision rules that map available history information to a treatment option at each decision point. The optimal dynamic treatment regime seeks to make these decisions to maximize the expected outcome of interest. Most existing methods assume population homogeneity. In many complex applications, ignoring latent heterogeneous structures may compromise estimation, highlighting the necessity of exploring heterogeneous structures during the estimation of optimal treatment regimes. We propose heterogeneous Q-learning that facilitates the estimation of optimal dynamic treatment regimes using a concave pairwise fusion penalized approach. The proposed method employs an alternating direction method of multipliers algorithm to solve the concave pairwise fusion penalized least squares problem in each stage. Simulation studies demonstrate that our proposed method outperforms the standard Q-learning method, and it is further illustrated through a real data analysis from the China Rural Hypertension Control Project (CRHCP) study group.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70415"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146120045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nothing to See Here? A Non-Inferiority Approach to Parallel Trends
Alyssa Bilinski, Laura A Hatfield
Difference-in-differences is a popular method for observational health policy evaluation. It relies on a causal assumption that in the absence of intervention, treatment groups' outcomes would have evolved in parallel to those of comparison groups. Researchers frequently look for parallel trends in the pre-intervention period to bolster confidence in this assumption. The popular "parallel trends test" evaluates a null hypothesis of parallel trends and, failing to find evidence against the null, concludes that the assumption holds. This tightly controls the probability of falsely concluding that trends are not parallel, but may have low power to detect non-parallel trends. When used as a screening step, it can also introduce bias in treatment effect estimates. We propose a non-inferiority/equivalence approach that tightly controls the probability of missing large violations of parallel trends, measured on the scale of the treatment effect. Our framework nests several common use cases, including linear trend tests and event studies. We show that our approach may induce no or minimal bias when used as a screening step under commonly assumed error structures and, absent violations, can offer a higher-power alternative to testing treatment effects in more flexible models. We illustrate our ideas by reconsidering a study of the impact of the Affordable Care Act's dependent coverage provision.
{"title":"Nothing to See Here? A Non-Inferiority Approach to Parallel Trends.","authors":"Alyssa Bilinski, Laura A Hatfield","doi":"10.1002/sim.70296","DOIUrl":"https://doi.org/10.1002/sim.70296","url":null,"abstract":"<p><p>Difference-in-differences is a popular method for observational health policy evaluation. It relies on a causal assumption that in the absence of intervention, treatment groups' outcomes would have evolved in parallel to those of comparison groups. Researchers frequently look for parallel trends in the pre-intervention period to bolster confidence in this assumption. The popular \"parallel trends test\" evaluates a null hypothesis of parallel trends and, failing to find evidence against the null, concludes that the assumption holds. This tightly controls the probability of falsely concluding that trends are not parallel, but may have low power to detect non-parallel trends. When used as a screening step, it can also introduce bias in treatment effect estimates. We propose a non-inferiority/equivalence approach that tightly controls the probability of missing large violations of parallel trends, measured on the scale of the treatment effect. Our framework nests several common use cases, including linear trend tests and event studies. We show that our approach may induce no or minimal bias when used as a screening step under commonly assumed error structures and, absent violations, can offer a higher-power alternative to testing treatment effects in more flexible models. We illustrate our ideas by reconsidering a study of the impact of the Affordable Care Act's dependent coverage provision.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70296"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146126394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Preference-Informed Cluster Randomized Design for Pragmatic Clinical Trials
Yuwei Cheng, Adriana Tremoulet, Sonia Jain
Cluster randomized trials (CRTs), in which entire clusters of subjects are randomized to treatment arms, are widely used in pragmatic trials to evaluate interventions under real-world conditions. However, CRTs are particularly vulnerable to treatment non-adherence, especially when cluster-level preferences lead subjects in clusters to deviate from their assigned treatment. Such deviations can reduce power, introduce bias, and compromise generalizability if not properly addressed. This research is directly motivated by a planned multi-center trial in Kawasaki Disease patients at high risk for coronary artery abnormalities, in which institutional treatment preferences influence both willingness to participate and adherence. To address this issue, we propose a Bayesian hierarchical model under a Preference-Informed Cluster Randomized Design (PICRD). This model explicitly incorporates cluster-level treatment switching into the analysis rather than excluding non-willing or non-adherent clusters. We conduct a simulation study to evaluate the performance of the PICRD model across a range of treatment effect sizes and switching proportions. Results demonstrate that the PICRD model consistently outperforms per-protocol analyses by maintaining higher power for the main treatment effect, producing narrower 95% credible intervals, and yielding more stable bias and root mean square error in the presence of substantial non-adherence. By explicitly modeling preference within a Bayesian hierarchical framework, the PICRD approach provides a flexible and robust solution for CRTs conducted in pragmatic settings, where full adherence to the randomized assignment is often unrealistic.
{"title":"Preference-Informed Cluster Randomized Design for Pragmatic Clinical Trials.","authors":"Yuwei Cheng, Adriana Tremoulet, Sonia Jain","doi":"10.1002/sim.70426","DOIUrl":"10.1002/sim.70426","url":null,"abstract":"<p><p>Cluster randomized trials (CRTs), in which entire clusters of subjects are randomized to treatment arms, are widely used in pragmatic trials to evaluate interventions under real-world conditions. However, CRTs are particularly vulnerable to treatment non-adherence, especially when cluster-level preferences lead subjects in clusters to deviate from their assigned treatment. Such deviations can reduce power, introduce bias, and compromise generalizability if not properly addressed. This research is directly motivated by a planned multi-center trial in Kawasaki Disease patients with high risk for coronary artery abnormalities, in which institutional treatment preferences influence both willingness to participate and adhere. To address this issue, we propose a Bayesian hierarchical model under a Preference-Informed Cluster Randomized Design (PICRD). This model explicitly incorporates cluster-level treatment switching into the analysis rather than excluding non-willing or non-adherent clusters. We conduct a simulation study to evaluate the performance of the PICRD model across a range of treatment effect sizes and switching proportions. Results demonstrate that the PICRD model consistently outperforms per-protocol analyses by maintaining higher power for the main treatment effect, producing narrower 95% credible intervals, and yielding more stable bias and root mean square error in the presence of substantial non-adherence. By explicitly modeling preference within a Bayesian hierarchical framework, the PICRD approach provides a flexible and robust solution for CRTs conducted in pragmatic settings when willingness to accept randomization assignment or adherence to randomization is often unrealistic.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70426"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12875034/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146126462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Empirical Assessment of the Cost of Dichotomization of the Outcome of Clinical Trials
Erik W van Zwet, Frank E Harrell, Stephen J Senn
We have studied 21 435 unique randomized controlled trials (RCTs) from the Cochrane Database of Systematic Reviews (CDSR). Of these trials, 7224 (34%) have a continuous (numerical) outcome and 14 211 (66%) have a binary outcome. We find that trials with a binary outcome have larger sample sizes on average, but also larger standard errors and fewer statistically significant results. We conclude that researchers tend to increase the sample size to compensate for the low information content of binary outcomes, but not sufficiently. In many cases, the binary outcome is the result of dichotomization of a continuous outcome, which is sometimes referred to as "responder analysis". In those cases, the loss of information is avoidable. Burdening more participants than necessary is wasteful, costly, and unethical. We provide a method to convert a sample size calculation for the comparison of two proportions into one for the comparison of the means of the underlying continuous outcomes. This demonstrates how much the sample size may be reduced if the outcome were not dichotomized. We also provide a method to calculate the loss of information after a dichotomization. We apply this method to all the trials from the CDSR with a binary outcome, and estimate that on average, only about 60% of the information is retained after dichotomization. We provide R code and a shiny app at https://vanzwet.shinyapps.io/info_loss/ to do these calculations. We hope that quantifying the loss of information will discourage researchers from dichotomizing continuous outcomes. Instead, we recommend they "model continuously but interpret dichotomously". For example, they might present "percentage achieving clinically meaningful improvement" derived from a continuous analysis rather than by dichotomizing raw data.
{"title":"An Empirical Assessment of the Cost of Dichotomization of the Outcome of Clinical Trials.","authors":"Erik W van Zwet, Frank E Harrell, Stephen J Senn","doi":"10.1002/sim.70402","DOIUrl":"10.1002/sim.70402","url":null,"abstract":"<p><p>We have studied 21 435 unique randomized controlled trials (RCTs) from the Cochrane Database of Systematic Reviews (CDSR). Of these trials, 7224 (34%) have a continuous (numerical) outcome and 14 211 (66%) have a binary outcome. We find that trials with a binary outcome have larger sample sizes on average, but also larger standard errors and fewer statistically significant results. We conclude that researchers tend to increase the sample size to compensate for the low information content of binary outcomes, but not sufficiently. In many cases, the binary outcome is the result of dichotomization of a continuous outcome, which is sometimes referred to as \"responder analysis\". In those cases, the loss of information is avoidable. Burdening more participants than necessary is wasteful, costly, and unethical. We provide a method to convert a sample size calculation for the comparison of two proportions into one for the comparison of the means of the underlying continuous outcomes. This demonstrates how much the sample size may be reduced if the outcome were not dichotomized. We also provide a method to calculate the loss of information after a dichotomization. We apply this method to all the trials from the CDSR with a binary outcome, and estimate that on average, only about 60% of the information is retained after dichotomization. We provide R code and a shiny app at: https://vanzwet.shinyapps.io/info_loss/ to do these calculations. We hope that quantifying the loss of information will discourage researchers from dichotomizing continuous outcomes. Instead, we recommend they \"model continuously but interpret dichotomously\". For example, they might present \"percentage achieving clinically meaningful improvement\" derived from a continuous analysis rather than by dichotomizing raw data.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70402"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12875020/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146126398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Missing Value Imputation With Adversarial Random Forests - MissARF
Pegah Golchian, Jan Kapar, David S Watson, Marvin N Wright
Handling missing values is a common challenge in biostatistical analyses, typically addressed by imputation methods. We propose a novel, fast, and easy-to-use imputation method, missing value imputation with adversarial random forests (MissARF), based on generative machine learning, that provides both single and multiple imputation. MissARF employs an adversarial random forest (ARF) for density estimation and data synthesis. To impute a missing value of an observation, we condition on the non-missing values and sample from the conditional distribution estimated by the ARF. Our experiments demonstrate that MissARF performs comparably to state-of-the-art single and multiple imputation methods in terms of imputation quality, with fast runtime and no additional cost for multiple imputation.
{"title":"Missing Value Imputation With Adversarial Random Forests-MissARF.","authors":"Pegah Golchian, Jan Kapar, David S Watson, Marvin N Wright","doi":"10.1002/sim.70379","DOIUrl":"10.1002/sim.70379","url":null,"abstract":"<p><p>Handling missing values is a common challenge in biostatistical analyses, typically addressed by imputation methods. We propose a novel, fast, and easy-to-use imputation method called missing value imputation with adversarial random forests (MissARF), based on generative machine learning, that provides both single and multiple imputation. MissARF employs adversarial random forest (ARF) for density estimation and data synthesis. To impute a missing value of an observation, we condition on the non-missing values and sample from the estimated conditional distribution generated by ARF. Our experiments demonstrate that MissARF performs comparably to state-of-the-art single and multiple imputation methods in terms of imputation quality and fast runtime with no additional costs for multiple imputation.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70379"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12871009/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146120212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Causal Inference With Survey Data: A Robust Framework for Propensity Score Weighting in Probability and Non-Probability Samples
Wei Liang, Changbao Wu
Confounding bias and selection bias are two major challenges in causal inference with observational data. While numerous methods have been developed to mitigate confounding bias, they often assume that the data are representative of the study population and ignore the potential selection bias introduced during data collection. In this paper, we propose a unified weighting framework, survey-weighted propensity score weighting, to simultaneously address both confounding and selection biases when the observational dataset is a probability survey sample from a finite population, which is itself viewed as a random sample from the target superpopulation. The proposed method yields a doubly robust inferential procedure for a class of population weighted average treatment effects. We further extend our results to non-probability observational data when the sampling mechanism is unknown but auxiliary information on the confounding variables is available from an external probability sample. We focus on practically important scenarios where the confounders are only partially observed in the external data. Our analysis reveals that the key variables in the external data are those related to both treatment effect heterogeneity and the selection mechanism. We also discuss how to combine auxiliary information from multiple reference probability samples. Monte Carlo simulations and an application to a real-world non-probability observational dataset demonstrate the superiority of our proposed methods over standard propensity score weighting approaches.
{"title":"Causal Inference With Survey Data: A Robust Framework for Propensity Score Weighting in Probability and Non-Probability Samples.","authors":"Wei Liang, Changbao Wu","doi":"10.1002/sim.70420","DOIUrl":"10.1002/sim.70420","url":null,"abstract":"<p><p>Confounding bias and selection bias are two major challenges in causal inference with observational data. While numerous methods have been developed to mitigate confounding bias, they often assume that the data are representative of the study population and ignore the potential selection bias introduced during data collection. In this paper, we propose a unified weighting framework-survey-weighted propensity score weighting-to simultaneously address both confounding and selection biases when the observational dataset is a probability survey sample from a finite population, which is itself viewed as a random sample from the target superpopulation. The proposed method yields a doubly robust inferential procedure for a class of population weighted average treatment effects. We further extend our results to non-probability observational data when the sampling mechanism is unknown but auxiliary information of the confounding variables is available from an external probability sample. We focus on practically important scenarios where the confounders are only partially observed in the external data. Our analysis reveals that the key variables in the external data are those related to both treatment effect heterogeneity and the selection mechanism. We also discuss how to combine auxiliary information from multiple reference probability samples. Monte Carlo simulations and an application to a real-world non-probability observational dataset demonstrate the superiority of our proposed methods over standard propensity score weighting approaches.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70420"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12873465/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146120100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Path-Specific Effect Approach to Mediation Analysis With Time-Varying Mediators and Time-to-Event Outcomes Accounting for Competing Risks
Arce Domingo-Relloso, Yuchen Zhang, Ziqing Wang, Astrid M Suchy-Dicey, Dedra S Buchwald, Ana Navas-Acien, Joel Schwartz, Kiros Berhane, Brent A Coull, Linda Valeri
Not accounting for competing events in survival analysis can lead to biased estimates, as individuals who die from other causes do not have the opportunity to develop the event of interest. Formal definitions and considerations for causal effects in the presence of competing risks have been published, but not for the mediation analysis setting when the exposure is not separable and both the outcome and the mediator are nonterminal events. We propose, for the first time, an approach based on the path-specific effects framework to account for competing risks in longitudinal mediation analysis with time-to-event outcomes. We do so by considering the pathway through the competing event as another mediator, which is nested within our longitudinal mediator of interest. We provide a theoretical formulation and related definitions of the effects of interest based on the mediational g-formula, as well as a detailed description of the algorithm. We also present a simulation study and an application of our algorithm to data from the Strong Heart Study, a prospective cohort of American Indian adults. In this application, we evaluated the mediating role of the blood pressure trajectory (measured at three visits) on the association of arsenic and cadmium with time to cardiovascular disease, accounting for the competing risk of death. Identifying the effects through different paths enables us to evaluate, more transparently, the impact of metals on the outcome of interest as well as through competing risks.
{"title":"A Path-Specific Effect Approach to Mediation Analysis With Time-Varying Mediators and Time-to-Event Outcomes Accounting for Competing Risks.","authors":"Arce Domingo-Relloso, Yuchen Zhang, Ziqing Wang, Astrid M Suchy-Dicey, Dedra S Buchwald, Ana Navas-Acien, Joel Schwartz, Kiros Berhane, Brent A Coull, Linda Valeri","doi":"10.1002/sim.70425","DOIUrl":"10.1002/sim.70425","url":null,"abstract":"<p><p>Not accounting for competing events in survival analysis can lead to biased estimates, as individuals who die from other causes do not have the opportunity to develop the event of interest. Formal definitions and considerations for causal effects in the presence of competing risks have been published, but not for the mediation analysis setting when the exposure is not separable and both the outcome and the mediator are nonterminal events. We propose, for the first time, an approach based on the path-specific effects framework to account for competing risks in longitudinal mediation analysis with time-to-event outcomes. We do so by considering the pathway through the competing event as another mediator, which is nested within our longitudinal mediator of interest. We provide a theoretical formulation and related definitions of the effects of interest based on the mediational g-formula, as well as a detailed description of the algorithm. We also present a simulation study and an application of our algorithm to data from the Strong Heart Study, a prospective cohort of American Indian adults. In this application, we evaluated the mediating role of the blood pressure trajectory (measured in three visits) on the association of arsenic and cadmium with time to cardiovascular disease, accounting for competing risks by death. Identifying the effects through different paths enables us to evaluate the impact of metals on the outcome of interest, as well as through competing risks, more transparently.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70425"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12873459/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146120114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integrating Omics and Pathological Imaging Data for Cancer Prognosis via a Deep Neural Network-Based Cox Model
Jingmao Li, Shuangge Ma
Modeling prognosis has unique significance in cancer research. For this purpose, omics data have been routinely used. In a series of recent studies, pathological imaging data derived from biopsy have also been shown to be informative. Motivated by the complementary information contained in omics and pathological imaging data, we examine integrating them under a Cox modeling framework. The two types of data have distinct properties: for omics variables, which are more actionable and demand stronger interpretability, we model their effects parametrically; for pathological imaging features, which are not actionable and lack lucid interpretations, we model their effects nonparametrically for better flexibility and prediction performance. Specifically, we adopt deep neural networks (DNNs) for nonparametric estimation, given their advantages over regression models in accommodating nonlinearity and providing better prediction. As both omics and pathological imaging data are high-dimensional and expected to contain noise, we apply penalization to select relevant variables and regularize estimation. Unlike some existing studies, we pay particular attention to the overlapping information contained in the two types of data. Numerical investigations are carefully carried out. In the analysis of TCGA data, sensible selection and superior prediction performance are observed, demonstrating the practical utility of the proposed analysis.
{"title":"Integrating Omics and Pathological Imaging Data for Cancer Prognosis via a Deep Neural Network-Based Cox Model.","authors":"Jingmao Li, Shuangge Ma","doi":"10.1002/sim.70435","DOIUrl":"https://doi.org/10.1002/sim.70435","url":null,"abstract":"<p><p>Modeling prognosis has unique significance in cancer research. For this purpose, omics data have been routinely used. In a series of recent studies, pathological imaging data derived from biopsy have also been shown as informative. Motivated by the complementary information contained in omics and pathological imaging data, we examine integrating them under a Cox modeling framework. The two types of data have distinct properties: for omics variables, which are more actionable and demand stronger interpretability, we model their effects in a parametric way; whereas for pathological imaging features, which are not actionable and do not have lucid interpretations, we model their effects in a nonparametric way for better flexibility and prediction performance. Specifically, we adopt deep neural networks (DNNs) for nonparametric estimation, considering their advantages over regression models in accommodating nonlinearity and providing better prediction. As both omics and pathological imaging data are high-dimensional and are expected to contain noises, we propose applying penalization for selecting relevant variables and regulating estimation. Different from some existing studies, we pay unique attention to overlapping information contained in the two types of data. Numerical investigations are carefully carried out. In the analysis of TCGA data, sensible selection and superior prediction performance are observed, which demonstrates the practical utility of the proposed analysis.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"45 3-5","pages":"e70435"},"PeriodicalIF":1.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146126432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}