Bayesian Pliable Lasso With Horseshoe Prior for Interaction Effects in GLMs With Missing Responses.
The Tien Mai. Statistics in Medicine 45(3-5): e70406, 2026. doi:10.1002/sim.70406

Sparse regression problems, where the goal is to identify a small set of relevant predictors, often require modeling not only main effects but also meaningful interactions through other variables. While the pliable lasso has emerged as a powerful frequentist tool for modeling such interactions under strong heredity constraints, it lacks a natural framework for uncertainty quantification and the incorporation of prior knowledge. In this paper, we propose a Bayesian pliable lasso that extends this approach by placing sparsity-inducing priors, such as the horseshoe, on both main and interaction effects. The hierarchical prior structure enforces heredity constraints while adaptively shrinking irrelevant coefficients and allowing important effects to persist. We extend this framework to generalized linear models and develop a tailored approach to handle missing responses. To facilitate posterior inference, we develop an efficient Gibbs sampling algorithm based on a reparameterization of the horseshoe prior. Our Bayesian framework yields sparse, interpretable interaction structures and principled measures of uncertainty. Through simulations and real-data studies, we demonstrate its advantages over existing methods in recovering complex interaction patterns under both complete and incomplete data. Our method is implemented in the package hspliable, available on GitHub: https://github.com/tienmt/hspliable.
An Improved Misclassification Simulation Extrapolation (MC-SIMEX) Algorithm.
Varadan Sevilimedu, Lili Yu. Statistics in Medicine 45(3-5): e70418, 2026. doi:10.1002/sim.70418

Misclassification Simulation-Extrapolation (MC-SIMEX) is an established method for correcting misclassification in binary covariates. It involves a simulation component, which generates pseudo-datasets with added degrees of misclassification in the binary covariate, and an extrapolation component, which models the covariate's regression coefficients obtained at each level of misclassification using a quadratic function. This quadratic function is then used to extrapolate the covariate's regression coefficient back to the point of "no error" in the classification of the binary covariate in question. However, the true extrapolation function is usually not known beforehand, so in practice only an approximation is used. In this article, we propose a method that uses the exact (rather than approximated) extrapolation function, through a derived relationship between the naïve regression coefficient estimates and the true coefficients in generalized linear models. Simulation studies compare the numerical properties of the resulting estimator with those of the original MC-SIMEX estimator. A real-data analysis using colon cancer data from the MSKCC cancer registry is also provided.
Improved Centile Estimation by Transformation And/Or Adaptive Smoothing of the Explanatory Variable.
R A Rigby, D M Stasinopoulos, T J Cole. Statistics in Medicine 45(3-5): e70414, 2026. doi:10.1002/sim.70414

A popular approach to growth reference centile estimation is the LMS (Lambda-Mu-Sigma) method, which assumes a parametric distribution for the response variable Y and fits the location, scale, and shape parameters of the distribution of Y as smooth functions of an explanatory variable X. This article provides two methods, transformation and adaptive smoothing, for improving the centile estimation when there is high curvature (i.e., rapid change in slope) with respect to X in one or more of the Y distribution parameters. In general, high curvature is reduced (i.e., attenuated or dampened) by smoothing. In the first method, X is transformed to a variable T to reduce this high curvature, and the Y distribution parameters are fitted as smooth functions of T. Three different transformations of X are described. In the second method, the Y distribution parameters are adaptively smoothed against X by allowing the smoothing parameter itself to vary continuously with X. Simulations are used to compare the performance of the two methods. Three examples show how the process can lead to substantially smoother and better fitting centiles.
Robust Distribution-Free Tests for the Linear Model.
Torey Hilbert, Steven N MacEachern, Yuan Zhang. Statistics in Medicine 45(3-5): e70404, 2026. doi:10.1002/sim.70404

Recently, there has been growing concern about heavy-tailed and skewed noise in biological data. We introduce RobustPALMRT, a flexible permutation framework for testing the association of a covariate of interest adjusted for control covariates. RobustPALMRT controls the type I error rate in finite samples, even in the presence of heavy-tailed or skewed noise. The new framework expands the scope of state-of-the-art tests in three directions. First, our method applies to robust and quantile regressions, even with the necessary hyperparameter tuning. Second, by separating model-fitting and model-evaluation, we discover that performance improves when using a robust loss function in the model-evaluation step, regardless of how the model is fit. Third, we allow fitting multiple models to detect specialized features of interest in a distribution. To demonstrate this, we introduce DispersionPALMRT, which tests for differences in dispersion between treatment and control groups. We establish theoretical guarantees, identify settings where our method has greater power than existing methods, and analyze existing immunological data on Long-COVID patients. Using RobustPALMRT, we unveil novel differences between Long-COVID patients and others even in the presence of highly skewed noise.
Is UWLS Really Better for Medical Research?
Sanghyun Hong, W Robert Reed. Statistics in Medicine 45(3-5): e70411, 2026. doi:10.1002/sim.70411

This study evaluates the performance of the Unrestricted Weighted Least Squares (UWLS) estimator in meta-analyses of medical research. Using a large-scale simulation approach, it addresses the limitations of model selection criteria in small-sample contexts. Prior research using the Cochrane Database of Systematic Reviews (CDSR) reported that UWLS outperformed Random Effects (RE) and, in some cases, Fixed Effect (FE) estimators when assessed using AIC and BIC. However, we show that idiosyncratic characteristics of the CDSR datasets, notably their small sample sizes and weak-signal settings (where key parameters are often small in magnitude), undermine the reliability of AIC and BIC for model selection. Accordingly, we simulate 108,000 datasets mirroring the original CDSR data. This allows us to know the true model parameters and evaluate the estimators more accurately. While all estimators performed similarly with respect to bias and efficiency, RE consistently produced more accurate standard errors than UWLS, making confidence intervals and hypothesis testing more reliable. The comparison with FE was less clear. We therefore recommend continued use of the RE estimator as a reliable general-purpose approach for medical research, with the choice between UWLS and FE made in light of the likely extent of effect heterogeneity in the data.
Patient-Centric Pragmatic Clinical Trials: Opening the DOOR.
Scott R Evans, Qihang Wu, Toshimitsu Hamasaki. Statistics in Medicine 45(3-5): e70328, 2026. doi:10.1002/sim.70328

Randomized clinical trials are the gold standard for evaluating the benefits and harms of interventions, yet they often fail to provide the evidence needed to inform medical decision-making. Primary reasons are a failure to recognize the questions most important for informing clinical practice, the fact that traditional approaches do not directly address those questions, and the consequent failure to use them as the motivation for the design, monitoring, analysis, and reporting of clinical trials. The standard approach of analyzing one outcome at a time fails to incorporate the associations between, and the cumulative nature of, multiple outcomes in individual patients; suffers from competing-risk complexities when interpreting individual outcomes; and fails to recognize important gradations of patient-centric response. Moreover, because efficacy and safety analyses are often conducted on different populations, benefit:risk estimands and generalizability are unclear. Cardiovascular event prevention trials typically utilize: (1) major adverse cardiovascular events (MACE), for example, stroke, myocardial infarction, and death, as the primary endpoint, which fails to recognize multiple events or the differential importance of events; and (2) relative risk models, which rely on robustness-challenging modeling assumptions and are contraindicated in benefit:risk and multiple-outcome evaluation. The Desirability Of Outcome Ranking (DOOR) is a paradigm for the design, data monitoring, analysis, interpretation, and reporting of clinical trials based on comprehensive patient-centric benefit:risk evaluation, developed to address these issues and advance clinical trial science. The rationale and the methodology for the design and analyses under the DOOR paradigm are described, and the methods are illustrated using an example. Freely available online tools for the design and analysis of studies implementing the DOOR are provided.
A Functional Joint Model for Survival and Multivariate Sparse Functional Data in Multi-Cohort Alzheimer's Disease Study.
Wenyi Wang, Luo Xiao, Ruonan Li, Sheng Luo. Statistics in Medicine 45(3-5): e70442, 2026. doi:10.1002/sim.70442

We develop an integrative joint model for multivariate sparse functional and survival data to analyze Alzheimer's disease (AD) across multiple studies. To address missing-by-design outcomes in multi-cohort studies, our approach extends the multivariate functional mixed model (MFMM), which integrates longitudinal outcomes to extract shared disease progression trajectories and links these outcomes to time-to-event data through a parsimonious survival model. This framework balances flexibility and interpretability by modeling shared progression trajectories while accommodating cohort-specific mean functions and survival parameters. For efficient estimation, we incorporate penalized splines into an EM algorithm. Application to three AD cohorts demonstrates the model's ability to capture disease trajectories and account for inter-cohort variability. Simulation studies confirm its robustness and accuracy, highlighting its value in advancing the understanding of AD progression and supporting clinical decision-making in multi-cohort settings.
Group Lasso Based Selection for High-Dimensional Mediation Analysis.
Allan Jérolon, Flora Alarcon, Florence Pittion, Magali Richard, Olivier François, Etienne Birmelé, Vittorio Perduca. Statistics in Medicine 45(3-5): e70351, 2026. doi:10.1002/sim.70351

Mediation analysis aims to identify and estimate the effect of an exposure on an outcome that is mediated through one or more intermediate variables. In the presence of multiple intermediate variables, two pertinent methodological questions arise: estimating mediated effects when mediators are correlated, and performing high-dimensional mediation analyses when the number of mediators exceeds the sample size. This paper presents a two-step procedure for high-dimensional mediation analyses. The first step selects a reduced number of candidate mediators using an ad hoc lasso penalty. The second step applies a procedure we previously developed to estimate the mediated effects, accounting for the correlation structure among the retained candidate mediators. We compare the performance of the proposed two-step procedure with state-of-the-art methods using simulated data. Additionally, we demonstrate its practical application by estimating the causal role of DNA methylation (DNAm) in the pathway between smoking and rheumatoid arthritis (RA) using real data.
Assessing the Benefits and Burdens of Preventive Interventions.
Yi Xiong, Kwun C G Chan, Malka Gorfine, Li Hsu. Statistics in Medicine 45(3-5): e70410, 2026. doi:10.1002/sim.70410

Cancer prevention is recognized as a key strategy for reducing disease incidence, mortality, and the overall burden on individuals and society. However, determining when to begin preventive interventions presents a significant challenge: starting too early may lead to more interventions and increased lifetime burden due to repeated administrations, while delaying may miss opportunities to prevent cancer. Evidence-based recommendations require a benefit-burden analysis that weighs life-years gained against the burden of interventions. With the growing availability of large-scale observational data, there is now an opportunity to empirically evaluate these trade-offs. In this paper, we propose a causal framework for assessing the benefit and burden of cancer prevention, using an illness-death model with semi-competing risks. Extensive simulations demonstrate that the proposed estimators are unbiased, with robust inference across realistic scenarios. We apply this approach to a benefit-burden analysis of preventive screening for colorectal cancer, utilizing data from the large-scale Women's Health Initiative. Our findings suggest that initiating screening at age 50 years achieves the highest life-year gains with an acceptable incremental burden-to-benefit ratio compared to no screening, contributing valuable real-world evidence to the field of preventive cancer interventions.
Multivariate and Online Transfer Learning With Uncertainty Quantification.
Jimmy Hickey, Jonathan P Williams, Brian J Reich, Emily C Hector. Statistics in Medicine 45(3-5): e70398, 2026. doi:10.1002/sim.70398

Untreated periodontitis causes inflammation within the supporting tissue of the teeth and can ultimately lead to tooth loss. Modeling periodontal outcomes is beneficial as they are difficult and time-consuming to measure, but disparities in representation between demographic groups must be considered. There may not be enough participants to build group-specific models, and it can be ineffective, and even dangerous, to apply a model to participants in an underrepresented group if demographic differences were not considered during training. We propose an extension to the RECaST Bayesian transfer learning framework. Our method jointly models multivariate outcomes, exhibiting significant improvement over the previous univariate RECaST method. Further, we introduce an online approach to model sequential data sets. Negative transfer is mitigated to ensure that the information shared from the other demographic groups does not negatively impact the modeling of the underrepresented participants. The Bayesian framework naturally provides uncertainty quantification on predictions. Especially important in medical applications, our method does not share data between domains. We demonstrate the effectiveness of our method in both predictive performance and uncertainty quantification on simulated data and on a database of dental records from the HealthPartners Institute.