Ivair R Silva, Joselito Montalban, Fernando L P de Oliveira
Ideally, the sequential monitoring of adverse events following post-licensed drugs and vaccines is correctly adjusted for confounding variables, such as gender and age, that may have an effect on the quality of the events. This is the idea behind the usual fully randomized, the placebo-control, and the self-control designs. Two prominent methods for conducting sequential analysis of the safety of post-market drugs and vaccines are the maximized sequential probability ratio test (MaxSPRT), and its conditional version, the CMaxSPRT. However, even when the assumption of sample homogeneity is realistic prior to the drug/vaccine administration, the effects caused by the drugs and vaccines on the risk of an adverse event, if any, can still vary according to observable covariates. For binomial and Poisson data, a straightforward sequential test method is introduced in order to accommodate a regression structure in the MaxSPRT. The proposed sequential regression test is also applicable for the CMaxSPRT, that is, the regression works for comparing historical and surveillance Poisson data with unknown heterogeneous baseline rates, taking into account seasonality and any other observable confounding covariates. To illustrate the usefulness of such a regression method, we describe the potential applications of the method to monitor vaccine-adverse events in Manitoba, Canada. The numeric results and examples were executed with the R Sequential package.
Maximized sequential probability ratio test regression. Ivair R Silva, Joselito Montalban, Fernando L P de Oliveira. Biometrics 81(4), published 2025-10-08. DOI: 10.1093/biomtc/ujaf170. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12745959/pdf/
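As a point of reference for the statistic that the abstract extends, the following is a minimal Python sketch of the standard, covariate-free Poisson MaxSPRT log-likelihood ratio for a one-sided alternative (relative risk greater than 1). It is not the regression version proposed in the paper. The function names and the critical value of 2.5 used in the example are hypothetical placeholders; in practice the critical value would be computed for the chosen significance level and surveillance length.

```python
import math


def poisson_maxsprt_llr(observed: int, expected: float) -> float:
    """Poisson MaxSPRT log-likelihood ratio, one-sided (relative risk > 1).

    observed: cumulative number of adverse events seen so far.
    expected: cumulative number expected under the null hypothesis.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    # The maximized likelihood ratio equals 1 (log = 0) when the data
    # show no excess over the null expectation.
    if observed <= expected:
        return 0.0
    return (expected - observed) + observed * math.log(observed / expected)


def first_signal(cum_observed, cum_expected, critical_value):
    """Return the index of the first look whose LLR reaches the critical
    value, or None if no look signals."""
    for look, (c, mu) in enumerate(zip(cum_observed, cum_expected)):
        if poisson_maxsprt_llr(c, mu) >= critical_value:
            return look
    return None


# Hypothetical cumulative counts at three looks; 2.5 is a placeholder,
# not a calibrated critical value.
signal_look = first_signal([1, 3, 8], [1.0, 2.0, 3.0], critical_value=2.5)
```

The regression setting described in the abstract would replace the single relative-risk parameter with a covariate-dependent structure; this sketch only shows the unadjusted baseline.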
Adaptive randomization is a clinical trial design feature used to modify treatment allocation probabilities during accrual. In time-to-event trials, the impact of adaptive randomization is less well understood for estimating treatment efficacy in the presence of time-varying effects [e.g., relative risk of progression to acquired immunodeficiency syndrome (AIDS) or death changes over time]. Here, we focus on time-to-event trials where the scientific estimand is a marginal hazard ratio in the absence of intermittent censoring over the support of observed times. We analytically show that adaptive randomization alters censoring patterns and illustrate via Monte Carlo simulations that the Cox proportional hazards estimator can yield biased estimates. As a remedy, we propose a censoring-robust estimator based on reweighting the partial likelihood score by treatment-specific censoring distributions that account for adaptive randomization. We derive the asymptotic properties of the proposed estimator and evaluate its finite sample operating characteristics via simulation. Finally, we apply our proposed method using data from the Community Programs for Clinical Research on AIDS Trial 002.
Censoring-robust estimation in fixed sample time-to-event clinical trials with adaptive randomization. Navneet R Hakhu, Daniel L Gillen. Biometrics 81(4), published 2025-10-08. DOI: 10.1093/biomtc/ujaf161. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12715681/pdf/
We provide some comments about the recent paper by Yang et al. related to model estimation and hypothesis testing in segmented regression.
Letter to the Editors: Comments on "Statistical inference on change points in generalized semiparametric segmented models" by Yang et al. (2025). Vito M R Muggeo. Biometrics 81(4), published 2025-10-08. DOI: 10.1093/biomtc/ujaf147.
Identifying protein-protein interaction networks can reveal therapeutic targets in cancer; however, for heterogeneous cancers such as colorectal cancer (CRC), a pooled analysis of the entire dataset may miss subtype-specific mechanisms, whereas separate analyses of each subgroup's data may reduce the power to identify shared relations. To address this limitation, we propose a hierarchical Bayesian model for the inference of dependency networks that encourages the common selection of edges across subgroups while allowing subtype-specific connections. To allow for nonlinear dependence relations, we rely on Bayesian Additive Regression Trees (BART) to characterize the key mechanisms for each subgroup. Because BART is a flexible model that allows nonlinear effects and interactions, it is more suitable for genomic data than classical models that assume linearity. To connect the subgroups, we place a Markov random field prior on the probability of utilizing a feature in a splitting rule; this allows us to borrow strength across subgroups in identifying shared dependence relations. We illustrate the model using both simulated data and a real data application on the estimation of protein-protein interaction networks across CRC subtypes.
Joint Bayesian additive regression trees for multiple nonlinear dependency networks. Licai Huang, Christine B Peterson, Min Jin Ha. Biometrics 81(4), published 2025-10-08. DOI: 10.1093/biomtc/ujaf158.
Oliver J Hines, Karla Diaz-Ordaz, Stijn Vansteelandt
Motivated by applications in precision medicine and treatment effect heterogeneity, recent research has focused on estimating conditional average treatment effects (CATEs) using machine learning (ML). CATE estimates may represent complicated functions that provide little insight into the key drivers of heterogeneity. Therefore, we introduce nonparametric treatment effect variable importance measures (TE-VIMs), based on the mean-squared error (MSE) in predicting the individual treatment effect. More precisely, TE-VIMs represent the increase in MSE when variables are removed from the CATE conditioning set. We derive efficient TE-VIM estimators which can be used with any CATE estimation strategy and are amenable to ML estimation. We propose several strategies to calculate these VIMs (eg, leave-one-out or keep-one-in), using popular meta-learners for the CATE. We study the finite sample performance of the estimators through a simulation study and illustrate their application using clinical trial data.
Variable importance measures for heterogeneous treatment effects. Oliver J Hines, Karla Diaz-Ordaz, Stijn Vansteelandt. Biometrics 81(4), published 2025-10-08. DOI: 10.1093/biomtc/ujaf140. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7618827/pdf/
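The verbal definition of a TE-VIM as "the increase in MSE when variables are removed from the CATE conditioning set" can be written out as follows. This is a sketch in assumed notation, not necessarily the paper's: $Y^{1}$ and $Y^{0}$ denote potential outcomes, $X$ the full covariate vector, $X_{-s}$ the covariates with the subset $s$ removed, and $\Theta_{s}$ the resulting importance measure.

```latex
\tau(x) = E\big[Y^{1} - Y^{0} \mid X = x\big], \qquad
\tau_{s}(x) = E\big[Y^{1} - Y^{0} \mid X_{-s} = x_{-s}\big],

\Theta_{s}
  = E\big[\{Y^{1} - Y^{0} - \tau_{s}(X)\}^{2}\big]
  - E\big[\{Y^{1} - Y^{0} - \tau(X)\}^{2}\big]
  = E\big[\{\tau(X) - \tau_{s}(X)\}^{2}\big].
```

The final equality holds because the cross term $E[\{Y^{1} - Y^{0} - \tau(X)\}\{\tau(X) - \tau_{s}(X)\}]$ vanishes once one conditions on $X$. Under this reading, $\Theta_{s} = 0$ exactly when removing the variables in $s$ leaves the CATE unchanged.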
Longitudinal data are often subject to irregular and informative visit times. Weighting generalized estimating equations by the inverse of the visit rate yields asymptotically unbiased estimates of regression coefficients provided that outcomes and visit times are conditionally independent, given the covariates in the visit model. Adding other covariates has no impact on the asymptotic bias of estimated regression coefficients, provided that conditional independence is maintained, but the impact on their variances is unknown. We show that variances are unchanged on adding variables associated with neither outcome nor visit process, and decrease on adding variables associated with outcome but not visit process. Adding variables associated with visits but not outcome may either increase or decrease variances of estimated outcome model regression coefficients, depending on the correlation structure of the covariates and the outcome. Application to a study of major depressive disorder found that the variances of estimated regression coefficients were of a similar magnitude when predictors of outcome but not visits were added to the visit rate model but consistently larger, in some cases by a factor of 2, on adding predictors of visits but not outcome. We recommend that visit process models include variables associated with outcome, but that those unassociated with the outcome be treated with caution.
Inverse-intensity weighted generalized estimating equations for longitudinal data subject to irregular observation: which variables should be included in the visit rate model? Eleanor M Pullenayegum, Di Shan. Biometrics 81(4), published 2025-10-08. DOI: 10.1093/biomtc/ujaf128.
Min Zeng, Qiyu Wang, Zijian Sui, Hong Zhang, Jinfeng Xu
Causal inference using observational data often suffers from numerous confounding effects, with greatly distorted average causal effect (ACE) estimates if the confounders are ignored. Information on some confounders, such as genetic biomarkers and medical imaging, is prohibitively expensive to obtain in practice. Two-phase studies are resource-efficient solutions to this problem. In such studies, outcome, treatment, and inexpensive confounders are measured for a large number of subjects in the first phase; costly confounder measurements are then collected for a limited number of subjects in the second phase. An efficient statistical design is essential in controlling the cost arising in the second phase. In this paper, we propose an adaptive stratified sampling design (AdaStrat), which minimizes the variance of the ACE estimator for a given second-phase sample size. AdaStrat begins by gathering costly confounder measurements for randomly selected pilot data, which are used to develop a stratification strategy and determine the sampling probabilities of the strata. The resulting stratification and sampling strategy is applied to all first-phase subjects to determine the second-phase subjects for whom costly confounder measurements are collected. We rigorously show that AdaStrat produces a more efficient ACE estimator than existing sampling designs with prefixed strata. Finite sample properties of AdaStrat were evaluated through simulation studies, demonstrating its superiority over the fixed stratified sampling design (FixStrat), with relative efficiencies ranging from 20% to 30% in our simulation situations. The desired finite sample properties of AdaStrat were further confirmed through an application to UK Biobank data.
Adaptive stratified sampling design in two-phase studies for average causal effect estimation. Min Zeng, Qiyu Wang, Zijian Sui, Hong Zhang, Jinfeng Xu. Biometrics 81(4), published 2025-10-08. DOI: 10.1093/biomtc/ujaf143.
Kai Kang, Zhihao Wu, Xinjie Qian, Xinyuan Song, Hongtu Zhu
Federated learning enables the training of a global model while keeping data localized; however, current methods face challenges with high-dimensional semiparametric models that involve complex nuisance parameters. This paper proposes a federated double machine learning framework designed to address high-dimensional nuisance parameters of semiparametric models in multicenter studies. Our approach leverages double machine learning (Chernozhukov et al., 2018a) to estimate center-specific parameters, extends the surrogate efficient score method within a Neyman-orthogonal framework, and applies density ratio tilting to create a federated estimator that combines local individual-level data with summary statistics from other centers. This methodology mitigates regularization bias and overfitting in high-dimensional nuisance parameter estimation. We establish the estimator's limiting distribution under minimal assumptions, validate its performance through extensive simulations, and demonstrate its effectiveness in analyzing multiphase data from the Alzheimer's Disease Neuroimaging Initiative study.
Federated double machine learning for high-dimensional semiparametric models. Kai Kang, Zhihao Wu, Xinjie Qian, Xinyuan Song, Hongtu Zhu. Biometrics 81(4), published 2025-10-08. DOI: 10.1093/biomtc/ujaf150.
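To make the double machine learning building block concrete, here is a minimal single-center Python sketch of the cross-fitted "partialling-out" estimator for a partially linear model. It does not implement the federated estimator, the surrogate efficient score, or density ratio tilting described in the abstract. The model, the simulated data, the function names, and the use of a simple least-squares line as the nuisance learner (standing in for a flexible ML method) are all illustrative assumptions.

```python
import random


def fit_line(x, y):
    """Least-squares fit of y on a scalar x; returns a prediction function.
    Stands in for an arbitrary ML nuisance learner."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return lambda v: mean_y + slope * (v - mean_x)


def dml_partial_linear(y, d, x, n_folds=2, seed=0):
    """Cross-fitted estimate of theta in y = theta * d + g(x) + noise."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    num = den = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Nuisance fits use only the training folds (cross-fitting).
        predict_y = fit_line([x[i] for i in train], [y[i] for i in train])
        predict_d = fit_line([x[i] for i in train], [d[i] for i in train])
        for i in held_out:
            y_res = y[i] - predict_y(x[i])  # outcome residual
            d_res = d[i] - predict_d(x[i])  # treatment residual
            num += d_res * y_res
            den += d_res * d_res
    # Residual-on-residual regression (Neyman-orthogonal score).
    return num / den


# Simulated data with a true treatment coefficient of 1.5 (assumed values).
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(2000)]
d = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [1.5 * di + 2.0 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]
theta_hat = dml_partial_linear(y, d, x)
```

Fitting the nuisance functions on one part of the data and forming residuals on the other is what limits overfitting bias; the orthogonal score makes the estimate insensitive to small errors in those nuisance fits.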
Matching is a commonly used causal inference study design in observational studies. Through matching on measured confounders between different treatment groups, valid randomization inferences can be conducted under the no unmeasured confounding assumption, and sensitivity analysis can be further performed to assess robustness of results to potential unmeasured confounding. However, for many common matched designs, there is still a lack of valid downstream randomization inference and sensitivity analysis methods. Specifically, in matched observational studies with treatment doses (eg, continuous or ordinal treatments), with the exception of some special cases such as pair matching, there is no existing randomization inference or sensitivity analysis method for studying analogs of the sample average treatment effect (ie, Neyman-type weak nulls), and no existing valid sensitivity analysis approach for testing the sharp null of no treatment effect for any subject (ie, Fisher's sharp null) when the outcome is nonbinary. To fill these important gaps, we propose new methods for randomization inference and sensitivity analysis that can work for general matched designs with treatment doses, applicable to general types of outcome variables (eg, binary, ordinal, or continuous), and cover both Fisher's sharp null and Neyman-type weak nulls. We illustrate our methods via comprehensive simulation studies and a real data application. All the proposed methods have been incorporated into the R package doseSens.
Bridging the gap between design and analysis: randomization inference and sensitivity analysis for matched observational studies with treatment doses. Jeffrey Zhang, Siyu Heng. Biometrics 81(4), published 2025-10-08. DOI: 10.1093/biomtc/ujaf156. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12665973/pdf/
Heterogeneous treatment effect models allow us to compare treatments at subgroup levels and are becoming increasingly popular in applications such as personalized medicine, advertising, and education. Regardless of the type of response (continuous, binary, count, survival), most causal estimands focus on the differences between the treatment and control conditional means. In this paper, we propose an alternative estimand, DINA (the DIfference in NAtural parameters), to quantify heterogeneous treatment effects, motivated by exponential families and the Cox model. Regardless of the type of response, DINA is both convenient and more practical for modeling the influence of covariates on the treatment effect. Additionally, we introduce a meta-algorithm for DINA, enabling practitioners to utilize powerful off-the-shelf machine learning tools for the estimation of nuisance functions. This meta-algorithm is also statistically robust to errors in the nuisance function estimation. We demonstrate the efficacy of our method in combination with various machine learning base-learners on both simulated and real datasets.
Estimating heterogeneous treatment effects for general responses. Zijun Gao, Trevor Hastie. Biometrics 81(4), published 2025-10-08. DOI: 10.1093/biomtc/ujaf162. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12728347/pdf/
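A small Python sketch may help show what a difference in natural parameters looks like for two exponential-family responses. For a Bernoulli response the natural parameter is the log-odds, and for a Poisson response it is the log of the rate. The function names and the numerical values are illustrative assumptions; this shows the estimand only, not the paper's meta-algorithm.

```python
import math


def logit(p: float) -> float:
    """Natural parameter of a Bernoulli distribution with mean p."""
    return math.log(p / (1.0 - p))


def dina_bernoulli(p_treated: float, p_control: float) -> float:
    """Difference in natural parameters for a binary response
    (a log odds ratio)."""
    return logit(p_treated) - logit(p_control)


def dina_poisson(rate_treated: float, rate_control: float) -> float:
    """Difference in natural parameters for a count response
    (a log rate ratio)."""
    return math.log(rate_treated) - math.log(rate_control)


# Two hypothetical subgroups whose treatment multiplies the odds by 2.25.
low_risk = {"control": 0.10, "treated": 0.20}
high_risk = {"control": 0.50, "treated": 2.25 / 3.25}

dina_low = dina_bernoulli(low_risk["treated"], low_risk["control"])
dina_high = dina_bernoulli(high_risk["treated"], high_risk["control"])
mean_diff_low = low_risk["treated"] - low_risk["control"]
mean_diff_high = high_risk["treated"] - high_risk["control"]
```

In this constructed example the two subgroups have the same DINA but different differences in means (0.10 versus roughly 0.19), which illustrates how a conditional-mean contrast can vary with baseline risk even when the effect on the natural-parameter scale is constant.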