Distance weighted directional regression for Fréchet sufficient dimension reduction.
Chao Ying, Zhou Yu, Xin Zhang. Biometrics 81(2), 2025. doi:10.1093/biomtc/ujaf051

Analysis of non-Euclidean data accumulated from human longevity studies, brain functional network studies, and many other areas has become an important issue in modern statistics. Fréchet sufficient dimension reduction aims to identify dependencies between non-Euclidean object-valued responses and multivariate predictors while simultaneously reducing the dimensionality of the predictors. We introduce the distance weighted directional regression method for both linear and nonlinear Fréchet sufficient dimension reduction. At its core is a new formulation of the classical directional regression method in sufficient dimension reduction. The new formulation is based on distance weighting, thus providing a unified approach for sufficient dimension reduction with Euclidean and non-Euclidean responses, and it extends naturally to nonlinear Fréchet sufficient dimension reduction. We derive the asymptotic normality of the linear Fréchet directional regression estimator and the convergence rate of the nonlinear estimator. Simulation studies demonstrate the empirical performance of the proposed methods and support our theoretical findings. Applications to human mortality modeling and diabetes prevalence analysis show that our proposal can improve interpretation and out-of-sample prediction.
PDC-MAKES: a conditional screening method for controlling false discoveries in high-dimensional multi-response setting.
Wei Xiong, Han Pan, Tong Shen. Biometrics 81(2), 2025. doi:10.1093/biomtc/ujaf042

The coexistence of high dimensionality and strong correlation in both responses and predictors poses unprecedented challenges in identifying important predictors. In this paper, we propose a model-free conditional feature screening method with false discovery rate (FDR) control for the ultrahigh-dimensional multi-response setting. The proposed method is built upon partial distance correlation, which measures the dependence between two random vectors while controlling for the effect of a third, multivariate random vector. This screening approach is robust against heavy-tailed data and can select predictors even when the predictors are highly correlated. Additionally, it can identify predictors that are marginally unrelated but conditionally related to the response. Leveraging the advantageous properties of partial distance correlation, our method allows high-dimensional variables to be conditioned upon, distinguishing it from current research in this field. To achieve FDR control, we apply derandomized knockoff e-values to set the screening threshold more stably. The proposed procedure is shown to enjoy the sure screening property while maintaining FDR control and achieving higher power under mild conditions. The superior performance of these methods is demonstrated through simulation examples and a real data application.
Discrete-time competing-risks regression with or without penalization.
Tomer Meir, Malka Gorfine. Biometrics 81(2), 2025. doi:10.1093/biomtc/ujaf040

Many studies analyze time-to-event data that involve competing risks and right censoring. Most methods and software packages are geared toward data arising from a continuous failure time distribution. However, failure-time data may be discrete, either because time is inherently discrete or because of imprecise measurement. This paper introduces a new estimation procedure for discrete-time survival analysis with competing events. The proposed approach offers a key advantage over existing procedures: it allows for straightforward integration and application of widely used regularized regression and feature-screening methods. We illustrate the benefits of the proposed approach through a comprehensive simulation study. Additionally, we showcase its utility by estimating a survival model for the length of stay of patients hospitalized in the intensive care unit, considering 3 competing events: discharge to home, transfer to another medical facility, and in-hospital death. A Python package, PyDTS, is available for applying the proposed method with additional features.
Probabilistic exponential family inverse regression and its applications.
Daolin Pang, Ruoqing Zhu, Hongyu Zhao, Tao Wang. Biometrics 81(2), 2025. doi:10.1093/biomtc/ujaf065

Rapid advances in high-throughput sequencing technologies have led to the fast accumulation of high-dimensional data, which is harnessed to understand the effects of various factors on human disease and health. While dimension reduction plays an essential role in high-dimensional regression and classification, existing methods often require the predictors to be continuous, making them unsuitable for discrete data, such as presence-absence records of species in community ecology and sequencing reads in single-cell studies. To identify and estimate sufficient reductions in regressions with discrete predictors, we introduce probabilistic exponential family inverse regression (PrEFIR), assuming that, given the response and a set of latent factors, the predictors follow one-parameter exponential families. We show that the low-dimensional reductions result not only from the response variable but also from the latent factors. We further extend the latent factor modeling framework to the double exponential family by including an additional parameter to account for dispersion. This versatile framework encompasses regressions with all categorical, or a mixture of categorical and continuous, predictors. We propose the method of maximum hierarchical likelihood for estimation and develop a highly parallelizable algorithm for its computation. The effectiveness of PrEFIR is demonstrated through simulation studies and real data examples.
Statistical inference on the relative risk following covariate-adaptive randomization.
Fengyu Zhao, Yang Liu, Feifang Hu. Biometrics 81(2), 2025. doi:10.1093/biomtc/ujaf036

Covariate-adaptive randomization (CAR) is widely adopted in clinical trials to ensure balanced treatment allocations across key baseline covariates. Although much research has focused on analyzing average treatment effects, inference for the relative risk under CAR has been less thoroughly explored. In this study, we examine a covariate-adjusted estimate of the relative risk and investigate the properties of its associated hypothesis tests under CAR. We first derive the theoretical properties of the covariate-adjusted relative risk for a broad class of CAR procedures. Our findings indicate that conventional tests for the relative risk tend to be conservative, with deflated type I error rates. To mitigate this issue, we introduce model-based and model-robust methods that improve the estimation of standard errors, and we demonstrate the validity and use of the resulting adjusted tests. Extensive numerical studies corroborate our theoretical findings and the favorable properties of the proposed adjustment methods.
Optimal dynamic treatment regime estimation in the presence of nonadherence.
Dylan Spicker, Michael P Wallace, Grace Y Yi. Biometrics 81(2), 2025. doi:10.1093/biomtc/ujaf041

Dynamic treatment regimes (DTRs) are sequences of functions that formalize the process of precision medicine. DTRs take patient information as input and output treatment recommendations. A major focus of the DTR literature has been the estimation of optimal DTRs: the sequences of decision rules that, if applied across the complete population, would yield the best expected outcome. While there is a rich literature on optimal DTR estimation, to date there has been minimal consideration of the impacts of nonadherence on these estimation procedures. Nonadherence refers to any process through which an individual's prescribed treatment does not match their true treatment. We explore the impacts of nonadherence and demonstrate that, generally, when nonadherence is ignored, suboptimal regimes will be estimated. In light of these findings, we propose a method for estimating optimal DTRs in the presence of nonadherence. The resulting estimators are consistent and asymptotically normal, with a double robustness property. Using simulations, we demonstrate the reliability of these results and illustrate comparable performance between the proposed estimation procedure, which adjusts for the impacts of nonadherence, and estimators computed on data without nonadherence.
Conformal predictive intervals in survival analysis: a resampling approach.
Jing Qin, Jin Piao, Jing Ning, Yu Shen. Biometrics 81(2), 2025. doi:10.1093/biomtc/ujaf063

The distribution-free method of conformal prediction has gained considerable attention in computer science, machine learning, and statistics. Candès et al. extended this method to right-censored survival data, addressing the complexity of right censoring by casting it as a covariate shift problem and extracting a subcohort of subjects whose censoring times exceed a fixed threshold. Their approach estimates only the lower prediction bound under type I censoring, where all subjects have available censoring times regardless of their failure status. In medical applications, we often encounter more general right-censored data, observing only the minimum of the failure time and censoring time; subjects with observed failure times have unavailable censoring times. To address this, we propose a bootstrap method to construct 1-sided as well as 2-sided conformal predictive intervals for general right-censored survival data under different working regression models. Through simulations, our method demonstrates excellent average coverage for the lower bound and good coverage for the 2-sided predictive interval, whether or not the working model is correctly specified, particularly under moderate censoring. We further extend the proposed method in several directions motivated by medical applications and apply it to predict breast cancer patients' future survival times based on tumor characteristics and treatment.
Estimating optimally tailored active surveillance strategy under interval censoring.
Muxuan Liang, Yingqi Zhao, Daniel W Lin, Matthew Cooperberg, Yingye Zheng. Biometrics 81(2), 2025. doi:10.1093/biomtc/ujaf067

Active surveillance (AS), which uses repeated biopsies to monitor disease progression, has become a popular alternative to immediate surgical intervention in cancer care. However, a biopsy procedure is invasive and can lead to severe side effects such as infection and bleeding. To reduce the burden of repeated surveillance biopsies, biomarker-assisted decision rules are sought to replace the fixed-for-all regimen with a biopsy intensity tailored to individual patients. Constructing or evaluating such decision rules is challenging: the key AS outcome is often ascertained subject to interval censoring, and patients discontinue participation in the AS study once they receive a positive surveillance biopsy, so dropout is affected by the outcomes of these biopsies. This work proposes a nonparametric kernel-based method to estimate a tailored AS strategy's true positive rates (TPRs) and true negative rates (TNRs), accounting for interval censoring and immediate dropouts. We develop a weighted classification framework based on these estimates to estimate the optimally tailored AS strategy, and we further incorporate the cost-benefit ratio for cost-effectiveness in medical decision-making. Theoretically, we provide a uniform generalization error bound for the derived AS strategy, accommodating all possible trade-offs between TPRs and TNRs. Simulations and an application to a prostate cancer surveillance study show the superiority of the proposed method.
Towards efficient and interpretable assumption-lean generalized linear modeling of continuous exposure effects.
Stijn Vansteelandt. Biometrics 81(2), 2025. doi:10.1093/biomtc/ujaf071

Advances in causal inference have largely ignored continuous exposures, apart from model-based approaches, which face criticism due to potential model misspecification. Model-free approaches based on modified treatment policies, such as uniformly shifting each subject's observed exposure, have emerged as promising alternatives. However, because such interventions are impractical, it is necessary to evaluate a range of possible shifts to generate actionable insights. To address this, we introduce models that parameterize the effects of shift interventions across varying magnitudes, coupled with assumption-lean estimation strategies. To ensure validity and interpretability under model misspecification, we tailor these to minimize (squared) bias in estimating the effects of realistic shifts. We employ debiased machine learning procedures for this but observe them to exhibit erratic behavior under certain data-generating mechanisms, prompting two key innovations. First, we propose a broadly applicable debiasing procedure that yields estimators with significantly improved finite-sample properties and is of independent methodological interest. Second, we develop debiased machine learning estimators for estimands with a more favorable efficiency bound, but more nuanced interpretation when models are misspecified. Unlike existing projection estimators, our methods avoid inverse exposure density weighting and do not demand tailored shift interventions to address positivity violations. Extensive simulations and a re-analysis of the Bangladesh WASH Benefits study demonstrate the effectiveness, stability, and utility of our approach. This work advances assumption-lean methods that balance validity, interpretability, and efficiency.
Bayesian covariate-dependent graph learning with a dual group spike-and-slab prior.
Zijian Zeng, Meng Li, Marina Vannucci. Biometrics 81(2), 2025. doi:10.1093/biomtc/ujaf053

Covariate-dependent graph learning has gained increasing interest in the graphical modeling literature for the analysis of heterogeneous data. This task, however, poses challenges to modeling, computational efficiency, and interpretability. The parameter of interest can be naturally represented as a 3-dimensional array whose elements can be grouped along 2 directions, corresponding to the node level and the covariate level, respectively. In this article, we propose a novel dual group spike-and-slab prior that enables multi-level selection at the covariate and node levels, as well as individual (local) sparsity. We introduce a nested strategy with specific choices to address the distinct challenges posed by the two grouping directions. For posterior inference, we develop a full Gibbs sampler for all parameters, which mitigates the difficulties of parameter tuning often encountered in high-dimensional graphical models and facilitates routine implementation. Through simulation studies, we demonstrate that the proposed model outperforms existing methods in its accuracy of graph recovery. We show the practical utility of our model via an application to microbiome data, where we seek to better understand the interactions among microbes as well as how these are affected by relevant covariates.