The Cox proportional hazards model (Cox model) is a popular model for survival data analysis. When the sample size is small relative to the dimension of the model, the standard maximum partial likelihood inference is often problematic. In this work, we propose the Cox catalytic prior distribution for Bayesian inference on Cox models, which extends a general class of prior distributions originally designed to stabilize complex parametric models. The Cox catalytic prior is formulated as a weighted likelihood of the regression coefficients derived from synthetic data and a surrogate baseline hazard constant. This surrogate hazard can be either provided by the user or estimated from the data, and the synthetic data are generated from the predictive distribution of a fitted simpler model. For point estimation, we derive an approximation of the marginal posterior mode, which can be computed conveniently as a regularized log partial likelihood estimator. We prove that our prior distribution is proper and the resulting estimator is consistent under mild conditions. In simulation studies, our proposed method outperforms standard maximum partial likelihood inference and is on par with existing shrinkage methods. We further illustrate the application of our method to a real dataset.
{"title":"Bayesian inference for Cox regression models using catalytic prior distributions.","authors":"Weihao Li, Dongming Huang","doi":"10.1093/biomtc/ujag004","DOIUrl":"https://doi.org/10.1093/biomtc/ujag004","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"82 1"},"PeriodicalIF":1.7,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"146103716","RegionNum":4,"RegionCategory":"Mathematics"}
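To make the idea of a regularized log partial likelihood concrete, here is a minimal sketch; it is not the authors' implementation. It assumes no tied event times, a constant surrogate baseline hazard `h0`, synthetic survival times that are all treated as observed events, and a prior weight `tau` spread evenly over the `M` synthetic observations. The function names and the exponential form of the synthetic-data likelihood are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def cox_log_partial_likelihood(beta, X, time, event):
    """Cox log partial likelihood, assuming no tied event times."""
    eta = X @ beta
    order = np.argsort(-time)              # latest time first
    eta_sorted = eta[order]
    event_sorted = event[order]
    # log of the sum of exp(eta) over each subject's risk set (time >= t_i)
    log_risk = np.logaddexp.accumulate(eta_sorted)
    return float(np.sum(event_sorted * (eta_sorted - log_risk)))

def synthetic_exponential_loglik(beta, X_syn, time_syn, h0):
    """Log-likelihood of synthetic data under hazard h0 * exp(x' beta).

    Assumption: every synthetic time is an observed event.
    """
    eta = X_syn @ beta
    return float(np.sum(np.log(h0) + eta - h0 * np.exp(eta) * time_syn))

def regularized_objective(beta, X, time, event, X_syn, time_syn, h0, tau):
    """Observed-data partial likelihood plus down-weighted synthetic likelihood."""
    M = X_syn.shape[0]
    return (cox_log_partial_likelihood(beta, X, time, event)
            + (tau / M) * synthetic_exponential_loglik(beta, X_syn, time_syn, h0))
```

Under these assumptions, a point estimate could be obtained by passing the negative of `regularized_objective` to a general-purpose optimizer; setting `tau = 0` recovers the ordinary partial likelihood.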
In the present study, we examine long-term population-level effects on episodic memory of an intervention over 15 years that reduces systolic blood pressure in individuals with hypertension. A limitation of previous research on the potential risk reduction of such interventions is that it does not properly account for the reduction of mortality rates. Hence, one can only speculate whether the effect is due to changes in memory or changes in mortality. Therefore, we extend previous research by providing both an etiological and a prognostic effect estimate. To do this, we propose a Bayesian semi-parametric estimation approach for an incremental threshold intervention, using the extended G-formula. Additionally, we introduce a novel sparsity-inducing Dirichlet prior for longitudinal data that exploits the longitudinal structure of the data. We demonstrate the usefulness of our approach in simulations and compare its performance to other Bayesian decision tree ensemble approaches. In our analysis of the data from the Betula cohort, we found no significant prognostic or etiological effects across all ages. This suggests that systolic blood pressure interventions likely do not strongly affect memory, either at the overall population level or among individuals who would remain alive under both the natural course and the intervention (the always-survivor stratum).
{"title":"Long-term memory effects of an incremental blood pressure intervention in a mortal cohort.","authors":"Maria Josefsson, Nina Karalija, Michael J Daniels","doi":"10.1093/biomtc/ujaf176","DOIUrl":"10.1093/biomtc/ujaf176","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"82 1"},"PeriodicalIF":1.7,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12865380/pdf/","platform":"Semanticscholar","paperid":"146103661","RegionNum":4,"RegionCategory":"Mathematics","PMCID":"OA"}
We thank Dr. Kook for his thoughtful and constructive discussion of our paper. We appreciate his careful examination of the assumptions underlying our proposed method, as well as the numerical comparison with the Transformation Model Generalised Covariance Measure test proposed by Kook et al. Below, we respond to the main points raised in the Comment.
{"title":"Rejoinder to reader reaction \"Comment on 'Double robust conditional independence test for novel biomarkers given established risk factors with survival data' by Lucas Kook\".","authors":"Baoying Yang, Jing Qin, Jing Ning, Yukun Liu","doi":"10.1093/biomtc/ujag038","DOIUrl":"10.1093/biomtc/ujag038","PeriodicalId":8930,"journal":{"name":"Biometrics"},"PeriodicalIF":1.7,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"147301706","RegionNum":4,"RegionCategory":"Mathematics"}
Damianos Michaelides, Maria Adamou, David C Woods, Antony M Overstall
A Bayesian optimal experimental design framework is developed for experiments where settings of one or more variables, referred to as profile variables, can be functions. For this type of experiment, a design consists of combinations of functions for each run of the experiment. Within a scalar-on-function linear model, profile variables are represented through basis expansions. This allows finite-dimensional representation of the profile variables and optimal designs to be found. The approach enables control over the complexity of the profile variables and model. The method is illustrated on a real application involving dynamic feeding strategies in an Ambr250 modular bioreactor system.
{"title":"Optimal design of dynamic experiments for scalar-on-function linear models with application to a biopharmaceutical study.","authors":"Damianos Michaelides, Maria Adamou, David C Woods, Antony M Overstall","doi":"10.1093/biomtc/ujaf169","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf169","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"82 1"},"PeriodicalIF":1.7,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"145932042","RegionNum":4,"RegionCategory":"Mathematics"}
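The reduction from functions to finitely many coefficients can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: $x_i(t)$ is the profile variable in run $i$, $\beta(t)$ the functional parameter, and $c_j$, $b_k$ the chosen basis functions.

```latex
\begin{align*}
y_i &= \beta_0 + \int_0^T x_i(t)\,\beta(t)\,dt + \varepsilon_i, \\
x_i(t) &= \sum_{j=1}^{p} a_{ij}\,c_j(t), \qquad \beta(t) = \sum_{k=1}^{q} \theta_k\,b_k(t), \\
y_i &= \beta_0 + \sum_{j=1}^{p}\sum_{k=1}^{q} a_{ij}\,J_{jk}\,\theta_k + \varepsilon_i
     = \beta_0 + a_i^\top J\,\theta + \varepsilon_i,
\qquad J_{jk} = \int_0^T c_j(t)\,b_k(t)\,dt.
\end{align*}
```

Under this sketch, choosing a design amounts to choosing the coefficient vectors $a_i$, and the basis sizes $p$ and $q$ control the complexity of the profile variables and of the model, respectively.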
The treatment assignment mechanism in a randomized clinical trial can be optimized for statistical efficiency within a specified class of randomization mechanisms. Optimal designs of this type have been characterized in terms of the variances of potential outcomes conditional on baseline covariates. Approximating these optimal designs requires information about the conditional variance functions, which is often unavailable or unreliable at the design stage. As a practical solution to this dilemma, we propose a multi-stage adaptive design that allows the treatment assignment mechanism to be modified at interim analyses based on accruing information about the conditional variance functions. This adaptation has profound implications for the distribution of trial data, which need to be accounted for in treatment effect estimation. We consider a class of treatment effect estimators that are consistent and asymptotically normal, identify the most efficient estimator within this class, and approximate the most efficient estimator by substituting estimates of unknown quantities. Simulation results indicate that, when there is little or no prior information available, the proposed design can bring substantial efficiency gains over conventional one-stage designs based on the same prior information. The methodology is illustrated with real data from a completed trial in stroke.
{"title":"An adaptive design for optimizing treatment assignment in randomized clinical trials.","authors":"Wei Zhang, Zhiwei Zhang, Aiyi Liu","doi":"10.1093/biomtc/ujaf168","DOIUrl":"10.1093/biomtc/ujaf168","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"82 1"},"PeriodicalIF":1.7,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"145916646","RegionNum":4,"RegionCategory":"Mathematics"}
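One familiar special case of variance-driven assignment is the Neyman allocation, which minimizes the variance of a difference in means by assigning treatment with probability proportional to the treated-arm standard deviation. The sketch below illustrates only that textbook rule. The class of randomization mechanisms and the estimators studied in the paper may differ, and the function name and the equal-allocation fallback are assumptions made here.

```python
import numpy as np

def neyman_allocation(sd_treated, sd_control):
    """Treatment probability sd1 / (sd1 + sd0), per covariate stratum.

    Assumption: strata where both standard deviations are zero
    fall back to equal allocation (0.5).
    """
    sd_treated = np.asarray(sd_treated, dtype=float)
    sd_control = np.asarray(sd_control, dtype=float)
    total = sd_treated + sd_control
    safe_total = np.where(total > 0, total, 1.0)   # avoid division by zero
    return np.where(total > 0, sd_treated / safe_total, 0.5)
```

In an adaptive version of this idea, the standard deviations would be re-estimated from accrued data at each interim analysis and the probabilities updated for the next stage.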
Recent studies have shown associations between redlining policies (1935-1974) and present-day fine particulate matter (PM$_{2.5}$) and nitrogen dioxide (NO$_2$) air pollution concentrations. In this paper, we move beyond associations and investigate the causal effects of redlining using spatial causal inference. Redlining policies were enacted in the 1930s, so there is very limited documentation of pre-treatment covariates. Consequently, traditional methods fail to sufficiently account for unmeasured confounders, potentially biasing causal interpretations. By integrating historical redlining data with 2010 PM$_{2.5}$ and NO$_2$ concentrations, our study seeks to estimate the long-term causal impact. Our study addresses these challenges with a novel spatial and non-spatial latent factor framework, using the unemployment rate, house rent and percentage of Black population in the 1940 US Census as proxies to reconstruct pre-treatment latent socio-economic status. We establish identification of a causal effect under broad assumptions, and use Bayesian Markov chain Monte Carlo to quantify uncertainty. Our causal analysis provides evidence that historically redlined neighborhoods are exposed to notably higher NO$_2$ concentrations. In contrast, the disparities in PM$_{2.5}$ between these neighborhoods are less pronounced. Among the cities analyzed, Los Angeles, CA, and Atlanta, GA, demonstrate the most significant effects for both NO$_2$ and PM$_{2.5}$.
{"title":"Estimating the causal effect of redlining on present-day air pollution.","authors":"Xiaodan Zhou, Shu Yang, Brian J Reich","doi":"10.1093/biomtc/ujaf173","DOIUrl":"10.1093/biomtc/ujaf173","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"82 1"},"PeriodicalIF":1.7,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805554/pdf/","platform":"Semanticscholar","paperid":"145984363","RegionNum":4,"RegionCategory":"Mathematics","PMCID":"OA"}
Blair Robertson, Chris Price, Marco Reale, Philip Davies
A spatial sampling design determines where sample locations are placed in a study area to achieve precise estimates of population parameters. Many environmental variables have positive spatial associations, and spatially balanced designs perform well. The recently published dynamic assignment sampling (DAS) design draws spatially balanced master or over-samples in auxiliary spaces. This article proposes a new objective function for DAS to draw doubly balanced master or over-samples, where two balancing properties are satisfied: approximately balanced on auxiliary variables and spatially balanced. All we require is a measure of the distance between population units. Numerical results show that the method generates spatially balanced, balanced, or doubly balanced master or over-samples and compares favorably with established fixed sample size designs. We provide an example application using total aboveground biomass over a large study area in Eastern Amazonia, Brazil, and design-based variance estimators.
{"title":"Doubly balanced samples with dynamic sample sizes.","authors":"Blair Robertson, Chris Price, Marco Reale, Philip Davies","doi":"10.1093/biomtc/ujag011","DOIUrl":"https://doi.org/10.1093/biomtc/ujag011","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"82 1"},"PeriodicalIF":1.7,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"146155947","RegionNum":4,"RegionCategory":"Mathematics"}
Estimating the causal effect of a treatment or health policy with observational data can be challenging due to an imbalance of and a lack of overlap between treated and control covariate distributions. In the presence of limited overlap, researchers choose between (1) methods (e.g., inverse probability weighting) that imply traditional estimands but whose estimators are at risk of considerable bias and variance; and (2) methods (e.g., overlap weighting) which imply a different estimand by modifying the target population to reduce variance. We propose a framework for navigating the tradeoffs between variance and bias due to imbalance and a lack of overlap and the targeting of the estimand of scientific interest. We introduce a bias decomposition that encapsulates bias due to (1) the statistical bias of the estimator; and (2) estimand mismatch, i.e., deviation from the population of interest. We propose two design-based metrics and an estimand selection procedure that help illustrate the tradeoffs between these sources of bias and variance of the resulting estimators. Our procedure allows analysts to incorporate their domain-specific preference for preservation of the original research population versus reduction of statistical bias. We demonstrate how to select an estimand based on these preferences with an application to right heart catheterization data.
{"title":"A framework for causal estimand selection under positivity violations.","authors":"Martha Barnard, Jared D Huling, Julian Wolfson","doi":"10.1093/biomtc/ujag014","DOIUrl":"https://doi.org/10.1093/biomtc/ujag014","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"82 1"},"PeriodicalIF":1.7,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"146155967","RegionNum":4,"RegionCategory":"Mathematics"}
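The contrast between the two families of methods named in the abstract shows up directly in their weights. The sketch below uses the standard definitions of inverse probability weights and overlap weights. The function names are hypothetical, the propensity scores are assumed to lie strictly between 0 and 1, and the paper's own metrics and selection procedure are not reproduced here.

```python
import numpy as np

def ipw_weights(treated, propensity):
    """Inverse probability weights: 1/e for treated units, 1/(1 - e) for controls."""
    return np.where(treated == 1, 1.0 / propensity, 1.0 / (1.0 - propensity))

def overlap_weights(treated, propensity):
    """Overlap weights: 1 - e for treated units, e for controls."""
    return np.where(treated == 1, 1.0 - propensity, propensity)
```

With a propensity score of 0.99, a control unit receives an inverse probability weight of about 100 but an overlap weight of 0.99. This is why inverse probability weighting can be unstable under limited overlap, while overlap weighting stays bounded at the cost of shifting the target population toward units with propensity scores near 0.5.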
We propose a general framework for estimating fixed and random effects in ordinary differential equation (ODE) models that are linear in the parameters, accommodating single-level, nested hierarchical, and crossed random-effect structures. The method, Direct Integral Mixed-Effects (DIME), exploits the separability of parameters and states to reformulate the problem within a linear mixed-effects model framework. This enables the use of standard inference tools, including confidence intervals and model selection. We provide theoretical guarantees of consistency and asymptotic normality. By bridging nonlinear dynamics and linear mixed-effects model methodology, DIME extends the scope of mixed-effects ODE modeling to complex hierarchical data structures, enhancing accessibility of statistical inference for a broad class of dynamical systems. Monte Carlo simulations compare DIME to established nonlinear mixed-effects approaches implemented in the nlme and nlmixr2 R packages. Across varied sample sizes, noise levels, and random-effect structures, DIME exhibits competitive bias, RMSE, and superior coverage probabilities in many scenarios, particularly for variance component estimation with limited data, and remains applicable when competing methods cannot be used. Applications to real-world datasets demonstrate the method's flexibility. For population growth data from 43 countries, DIME recovers exponential growth rates consistent with demographic studies. In modeling joint dynamics of atmospheric pressure and wind speed for 55 U.S. cities, it identifies an oscillatory relationship and supports hierarchical nesting of city effects within periods, outperforming alternative random-effect configurations.
{"title":"Estimation of mixed-effects ordinary differential equation models linear in the parameters.","authors":"Oleksandr Laskorunskyi, Snigdhansu Chatterjee, Itai Dattner","doi":"10.1093/biomtc/ujag016","DOIUrl":"10.1093/biomtc/ujag016","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"82 1"},"PeriodicalIF":1.7,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"147321179","RegionNum":4,"RegionCategory":"Mathematics"}
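The separability that such a reformulation relies on can be seen in the simplest case. For the exponential growth ODE $x'(t) = \theta x(t)$, integrating both sides gives $x(t) - x(t_0) = \theta \int_{t_0}^{t} x(s)\,ds$, which is linear in $\theta$. The sketch below covers a fixed effect only and assumes noise-free observations with a known initial value; the function names are hypothetical, and the random-effects machinery described in the abstract is not reproduced.

```python
import numpy as np

def cumulative_trapezoid(values, t):
    """Running trapezoidal integral of `values` over the grid `t`, starting at 0."""
    steps = np.diff(t) * (values[1:] + values[:-1]) / 2.0
    return np.concatenate(([0.0], np.cumsum(steps)))

def direct_integral_estimate(t, x):
    """Least-squares slope through the origin in x(t) - x(t0) = theta * integral of x."""
    z = cumulative_trapezoid(x, t)    # integrated state, the regressor
    y = x - x[0]                      # increment of the state, the response
    return float(z @ y / (z @ z))

# Usage: recover a growth rate of 0.5 from a densely sampled trajectory.
t = np.linspace(0.0, 2.0, 201)
x = np.exp(0.5 * t)
theta_hat = direct_integral_estimate(t, x)
```

Because the regression is linear in the parameter, no ODE solver or nonlinear optimizer is needed in this toy setting, and this is what makes a linear mixed-effects formulation possible in the general case.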
In matched observational studies with continuous treatments, individuals with different treatment doses but the same or similar covariate values are paired for causal inference. While inexact covariate matching (i.e., covariate imbalance after matching) is common in practice, previous matched studies with continuous treatments have often overlooked this issue as long as post-matching covariate balance meets certain criteria. Through re-analyzing a matched observational study on the effect of social distancing on COVID-19 case counts, we show that this routine practice can introduce severe bias for causal inference. Motivated by this finding, we propose a general framework for mitigating bias due to inexact matching in matched observational studies with continuous treatments, covering the matching, estimation, and inference stages. In the matching stage, we propose a carefully designed caliper that incorporates both covariate and treatment dose information to improve matching for downstream treatment effect estimation and inference. For estimation and inference, we introduce a bias-corrected Neyman estimator paired with a corresponding bias-corrected variance estimator. The effectiveness of our proposed framework is demonstrated through numerical studies and a re-analysis of the aforementioned observational study on the effect of social distancing on COVID-19 case counts. An open-source R package for implementing our framework has also been developed.
{"title":"Bias mitigation in matched observational studies with continuous treatments: calipered non-bipartite matching and bias-corrected estimation and inference.","authors":"Anthony Frazier, Siyu Heng, Wen Zhou","doi":"10.1093/biomtc/ujag022","DOIUrl":"https://doi.org/10.1093/biomtc/ujag022","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"82 1"},"PeriodicalIF":1.7,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"147282126","RegionNum":4,"RegionCategory":"Mathematics"}