Bayesian inference for Cox regression models using catalytic prior distributions.
Weihao Li, Dongming Huang. Biometrics 82(1), 2026. doi:10.1093/biomtc/ujag004

The Cox proportional hazards model (Cox model) is a popular model for survival data analysis. When the sample size is small relative to the dimension of the model, standard maximum partial likelihood inference is often problematic. In this work, we propose the Cox catalytic prior distribution for Bayesian inference on Cox models, which extends a general class of prior distributions originally designed to stabilize complex parametric models. The Cox catalytic prior is formulated as a weighted likelihood of the regression coefficients, derived from synthetic data and a surrogate baseline hazard constant. The surrogate hazard can be either provided by the user or estimated from the data, and the synthetic data are generated from the predictive distribution of a fitted simpler model. For point estimation, we derive an approximation of the marginal posterior mode, which can be computed conveniently by maximizing a regularized log partial likelihood. We prove that our prior distribution is proper and that the resulting estimator is consistent under mild conditions. In simulation studies, the proposed method outperforms standard maximum partial likelihood inference and is on par with existing shrinkage methods. We further illustrate the application of our method to a real dataset.
{"title":"Bayesian inference for Cox regression models using catalytic prior distributions.","authors":"Weihao Li, Dongming Huang","doi":"10.1093/biomtc/ujag004","DOIUrl":"https://doi.org/10.1093/biomtc/ujag004","url":null,"abstract":"<p><p>The Cox proportional hazards model (Cox model) is a popular model for survival data analysis. When the sample size is small relative to the dimension of the model, the standard maximum partial likelihood inference is often problematic. In this work, we propose the Cox catalytic prior distribution for Bayesian inference on Cox models, which extends a general class of prior distributions originally designed to stabilize complex parametric models. The Cox catalytic prior is formulated as a weighted likelihood of the regression coefficients derived from synthetic data and a surrogate baseline hazard constant. This surrogate hazard can be either provided by the user or estimated from the data, and the synthetic data are generated from the predictive distribution of a fitted simpler model. For point estimation, we derive an approximation of the marginal posterior mode, which can be computed conveniently as a regularized log partial likelihood estimator. We prove that our prior distribution is proper and the resulting estimator is consistent under mild conditions. In simulation studies, our proposed method outperforms standard maximum partial likelihood inference and is on par with existing shrinkage methods. We further illustrate the application of our method to a real dataset.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"82 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Long-term memory effects of an incremental blood pressure intervention in a mortal cohort.
Maria Josefsson, Nina Karalija, Michael J Daniels. Biometrics 82(1), 2026. doi:10.1093/biomtc/ujaf176

In the present study, we examine the long-term population-level effects on episodic memory of a 15-year intervention that reduces systolic blood pressure in individuals with hypertension. A limitation of previous research on the potential risk reduction of such interventions is that it does not properly account for the reduction in mortality rates; hence, one can only speculate whether the effect is due to changes in memory or changes in mortality. We therefore extend previous research by providing both an etiological and a prognostic effect estimate. To do this, we propose a Bayesian semi-parametric estimation approach for an incremental threshold intervention, using the extended G-formula. Additionally, we introduce a novel sparsity-inducing Dirichlet prior that exploits the longitudinal structure of the data. We demonstrate the usefulness of our approach in simulations and compare its performance to other Bayesian decision tree ensemble approaches. In our analysis of data from the Betula cohort, we found no significant prognostic or etiological effects at any age. This suggests that systolic blood pressure interventions likely do not strongly affect memory, either at the overall population level or among individuals who would remain alive under both the natural course and the intervention (the always-survivor stratum).
{"title":"Long-term memory effects of an incremental blood pressure intervention in a mortal cohort.","authors":"Maria Josefsson, Nina Karalija, Michael J Daniels","doi":"10.1093/biomtc/ujaf176","DOIUrl":"10.1093/biomtc/ujaf176","url":null,"abstract":"<p><p>In the present study, we examine long-term population-level effects on episodic memory of an intervention over 15 years that reduces systolic blood pressure in individuals with hypertension. A limitation with previous research on the potential risk reduction of such interventions is that they do not properly account for the reduction of mortality rates. Hence, one can only speculate whether the effect is due to changes in memory or changes in mortality. Therefore, we extend previous research by providing both an etiological and a prognostic effect estimate. To do this, we propose a Bayesian semi-parametric estimation approach for an incremental threshold intervention, using the extended G-formula. Additionally, we introduce a novel sparsity-inducing Dirichlet prior for longitudinal data, that exploits the longitudinal structure of the data. We demonstrate the usefulness of our approach in simulations, and compare its performance to other Bayesian decision tree ensemble approaches. In our analysis of the data from the Betula cohort, we found no significant prognostic or etiological effects across all ages. This suggests that systolic blood pressure interventions likely do not strongly affect memory, either at the overall population level or among individuals who would remain alive under both the natural course and the intervention (the always survivor stratum).</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"82 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12865380/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Optimal design of dynamic experiments for scalar-on-function linear models with application to a biopharmaceutical study.
Damianos Michaelides, Maria Adamou, David C Woods, Antony M Overstall. Biometrics 82(1), 2026. doi:10.1093/biomtc/ujaf169
A Bayesian optimal experimental design framework is developed for experiments in which the settings of one or more variables, referred to as profile variables, can be functions. For this type of experiment, a design consists of a combination of functions for each run of the experiment. Within a scalar-on-function linear model, profile variables are represented through basis expansions, allowing a finite-dimensional representation of the profile variables and enabling optimal designs to be found. The approach provides control over the complexity of both the profile variables and the model. The method is illustrated on a real application involving dynamic feeding strategies in an Ambr250 modular bioreactor system.
{"title":"Optimal design of dynamic experiments for scalar-on-function linear models with application to a biopharmaceutical study.","authors":"Damianos Michaelides, Maria Adamou, David C Woods, Antony M Overstall","doi":"10.1093/biomtc/ujaf169","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf169","url":null,"abstract":"<p><p>A Bayesian optimal experimental design framework is developed for experiments where settings of one or more variables, referred to as profile variables, can be functions. For this type of experiment, a design consists of combinations of functions for each run of the experiment. Within a scalar-on-function linear model, profile variables are represented through basis expansions. This allows finite-dimensional representation of the profile variables and optimal designs to be found. The approach enables control over the complexity of the profile variables and model. The method is illustrated on a real application involving dynamic feeding strategies in an Ambr250 modular bioreactor system.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"82 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145932042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

An adaptive design for optimizing treatment assignment in randomized clinical trials.
Wei Zhang, Zhiwei Zhang, Aiyi Liu. Biometrics 82(1), 2026. doi:10.1093/biomtc/ujaf168

The treatment assignment mechanism in a randomized clinical trial can be optimized for statistical efficiency within a specified class of randomization mechanisms. Optimal designs of this type have been characterized in terms of the variances of potential outcomes conditional on baseline covariates. Approximating these optimal designs requires information about the conditional variance functions, which is often unavailable or unreliable at the design stage. As a practical solution to this dilemma, we propose a multi-stage adaptive design that allows the treatment assignment mechanism to be modified at interim analyses based on accruing information about the conditional variance functions. This adaptation has profound implications for the distribution of the trial data, which must be accounted for in treatment effect estimation. We consider a class of treatment effect estimators that are consistent and asymptotically normal, identify the most efficient estimator within this class, and approximate it by substituting estimates of unknown quantities. Simulation results indicate that, when little or no prior information is available, the proposed design can bring substantial efficiency gains over conventional one-stage designs based on the same prior information. The methodology is illustrated with real data from a completed trial in stroke.
{"title":"An adaptive design for optimizing treatment assignment in randomized clinical trials.","authors":"Wei Zhang, Zhiwei Zhang, Aiyi Liu","doi":"10.1093/biomtc/ujaf168","DOIUrl":"10.1093/biomtc/ujaf168","url":null,"abstract":"<p><p>The treatment assignment mechanism in a randomized clinical trial can be optimized for statistical efficiency within a specified class of randomization mechanisms. Optimal designs of this type have been characterized in terms of the variances of potential outcomes conditional on baseline covariates. Approximating these optimal designs requires information about the conditional variance functions, which is often unavailable or unreliable at the design stage. As a practical solution to this dilemma, we propose a multi-stage adaptive design that allows the treatment assignment mechanism to be modified at interim analyses based on accruing information about the conditional variance functions. This adaptation has profound implications on the distribution of trial data, which need to be accounted for in treatment effect estimation. We consider a class of treatment effect estimators that are consistent and asymptotically normal, identify the most efficient estimator within this class, and approximate the most efficient estimator by substituting estimates of unknown quantities. Simulation results indicate that, when there is little or no prior information available, the proposed design can bring substantial efficiency gains over conventional one-stage designs based on the same prior information. The methodology is illustrated with real data from a completed trial in stroke.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"82 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145916646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Estimating the causal effect of redlining on present-day air pollution.
Xiaodan Zhou, Shu Yang, Brian J Reich. Biometrics 82(1), 2026. doi:10.1093/biomtc/ujaf173

Recent studies have shown associations between redlining policies (1935-1974) and present-day fine particulate matter (PM$_{2.5}$) and nitrogen dioxide (NO$_2$) air pollution concentrations. In this paper, we move beyond associations and investigate the causal effects of redlining using spatial causal inference. Redlining policies were enacted in the 1930s, so there is very limited documentation of pre-treatment covariates. Consequently, traditional methods fail to sufficiently account for unmeasured confounders, potentially biasing causal interpretations. By integrating historical redlining data with 2010 PM$_{2.5}$ and NO$_2$ concentrations, our study seeks to estimate the long-term causal impact. We address these challenges with a novel spatial and non-spatial latent factor framework, using the unemployment rate, house rent, and percentage of Black population in the 1940 US Census as proxies to reconstruct pre-treatment latent socio-economic status. We establish identification of a causal effect under broad assumptions, and use Bayesian Markov chain Monte Carlo to quantify uncertainty. Our causal analysis provides evidence that historically redlined neighborhoods are exposed to notably higher NO$_2$ concentrations. In contrast, the disparities in PM$_{2.5}$ between these neighborhoods are less pronounced. Among the cities analyzed, Los Angeles, CA, and Atlanta, GA, demonstrate the most significant effects for both NO$_2$ and PM$_{2.5}$.
{"title":"Estimating the causal effect of redlining on present-day air pollution.","authors":"Xiaodan Zhou, Shu Yang, Brian J Reich","doi":"10.1093/biomtc/ujaf173","DOIUrl":"10.1093/biomtc/ujaf173","url":null,"abstract":"<p><p>Recent studies have shown associations between redlining policies (1935-1974) and present-day fine particulate matter (PM$_{2.5}$) and nitrogen dioxide (NO$_2$) air pollution concentrations. In this paper, we move beyond associations and investigate the causal effects of redlining using spatial causal inference. Redlining policies were enacted in the 1930s, so there is very limited documentation of pre-treatment covariates. Consequently, traditional methods failed to sufficiently account for unmeasured confounders, potentially biasing causal interpretations. By integrating historical redlining data with 2010 PM$_{2.5}$ and NO$_2$ concentrations, our study seeks to estimate the long-term causal impact. Our study addresses challenges with a novel spatial and non-spatial latent factor framework, using the unemployment rate, house rent and percentage of Black population in 1940 US Census as proxies to reconstruct pre-treatment latent socio-economic status. We establish identification of a causal effect under broad assumptions, and use Bayesian Markov Chain Monte Carlo to quantify uncertainty. Our causal analysis provides evidence that historically redlined neighborhoods are exposed to notably higher NO$_2$ concentration. In contrast, the disparities in PM$_{2.5}$ between these neighborhoods are less pronounced. Among the cities analyzed, Los Angeles, CA, and Atlanta, GA, demonstrate the most significant effects for both NO$_2$ and PM$_{2.5}$.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"82 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805554/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145984363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Semiparametric piecewise accelerated failure time model for the analysis of immune-oncology clinical trials.
Hisato Sunami, Satoshi Hattori. Biometrics 82(1), 2026. doi:10.1093/biomtc/ujaf171

The effectiveness of immune-oncology therapies has been demonstrated in recent clinical trials. Kaplan-Meier estimates of the survival functions for the immune therapy and the control often suggest a lag-time before the immune therapy begins to act. This implies that the hazard ratio under the proportional hazards assumption is unappealing, and many alternatives, such as the restricted mean survival time, have been investigated. In addition to such an overall summary of the treatment contrast, the lag-time is itself an important feature of the treatment effect. Identical survival functions up to the lag-time imply that patients who are likely to die before the lag-time would not benefit from the treatment, so identifying such patients is very important. We propose a semiparametric piecewise accelerated failure time model and an inference procedure based on the semiparametric maximum likelihood method. It provides not only an overall treatment summary but also a unified framework for identifying patients who benefit less from the immune therapy. Numerical experiments confirm that each parameter can be estimated with minimal bias. Through a real data analysis, we illustrate the evaluation of the effect of an immune-oncology therapy and the characterization, in terms of covariates, of patients who are unlikely to benefit from the treatment.
{"title":"Semiparametric piecewise accelerated failure time model for the analysis of immune-oncology clinical trials.","authors":"Hisato Sunami, Satoshi Hattori","doi":"10.1093/biomtc/ujaf171","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf171","url":null,"abstract":"<p><p>Effectiveness of immune-oncology chemotherapies has been presented in recent clinical trials. The Kaplan-Meier estimates of the survival functions of the immune therapy and the control often suggested the presence of the lag-time until the immune therapy began to act. It implies the use of hazard ratio under the proportional hazards assumption would not be appealing, and many alternatives have been investigated such as the restricted mean survival time. In addition to such overall summary of the treatment contrast, the lag-time is also an important feature of the treatment effect. Identical survival functions up to the lag-time implies patients who are likely to die before the lag-time would not benefit the treatment and identifying such patients would be very important. We propose the semiparametric piecewise accelerated failure time model and its inference procedure based on the semiparametric maximum likelihood method. It provides not only an overall treatment summary, but also a framework to identify patients who have less benefit from the immune-therapy in a unified way. Numerical experiments confirm that each parameter can be estimated with minimal bias. Through a real data analysis, we illustrate the evaluation of the effect of immune-oncology therapy and the characterization of covariates in which patients are unlikely to receive the benefit of treatment.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"82 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145916649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Quasi-likelihood estimation for semiparametric circular regression models.
Anna Gottard, Andrea Meilán-Vila, Agnese Panzera. Biometrics 82(1), 2026. doi:10.1093/biomtc/ujag002

Motivated by the need for flexible and interpretable models to handle circular data, this paper introduces a semiparametric regression model for a circular response that can include both linear and circular covariates in its parametric and nonparametric components. Rather than imposing a particular parametric distribution on the error term, we adopt a circular quasi-likelihood function, which is useful when the underlying distribution is unknown. We discuss the asymptotic properties of the resulting estimators and a backfitting algorithm for model fitting. We evaluate the finite-sample performance of our proposal through simulations and illustrate its advantages for assessing the genetic effect on the migratory patterns of willow warblers. This offers new insights into how specific genomic elements can influence migratory behaviour.
{"title":"Quasi-likelihood estimation for semiparametric circular regression models.","authors":"Anna Gottard, Andrea Meilán-Vila, Agnese Panzera","doi":"10.1093/biomtc/ujag002","DOIUrl":"https://doi.org/10.1093/biomtc/ujag002","url":null,"abstract":"<p><p>Motivated by the need for flexible and interpretable models to handle circular data, this paper introduces a semiparametric regression model for a circular response that can include both linear and circular covariates in its parametric and nonparametric components. Rather than imposing a particular parametric distribution on the error term, we adopt a circular quasi-likelihood function, which is useful when the underlying distribution is unknown. We discuss the asymptotic properties of the resulting estimators and a backfitting algorithm for model fitting. We evaluate the finite-sample performance of our proposal through simulations and illustrate its advantages for assessing the genetic effect on the migratory patterns of willow warblers. This offers new insights into how specific genomic elements can influence migratory behaviour.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"82 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146103647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Inhomogeneous mark correlation functions for general marked point processes.
Mehdi Moradi, Matthias Eckardt. Biometrics 82(1), 2026. doi:10.1093/biomtc/ujaf177

Spatial phenomena in environmental and biological contexts often involve events that are unevenly distributed across space, carrying attributes whose associations/variations are space-dependent. In this paper, we introduce the class of inhomogeneous mark correlation functions, which capture mark associations/variations while explicitly accounting for spatial inhomogeneity. The proposed functions quantify how, on average, marks vary or associate with one another as a function of pairwise spatial distances. We develop nonparametric estimators and evaluate their performance through simulation studies, covering a range of scenarios with mark association or variation, spanning from nonstationary point patterns without spatial interaction to patterns with clustering tendencies and sparse regions. Our simulations reveal the shortcomings of traditional methods under spatial inhomogeneity, underscoring the necessity of our approach. The results show that our estimators accurately identify both the positivity/negativity and the effective spatial range of detected mark associations/variations. Furthermore, we show that differences in how intensity is estimated generally have only a negligible influence on the empirical bias/variance of our proposed inhomogeneous mark correlation functions. The proposed inhomogeneous mark correlation functions are then applied to two distinct forest ecosystems: longleaf pine trees in southern Georgia, USA, marked by their diameter at breast height, and Scots pine trees in Pfynwald, Switzerland, marked by their height. Our findings reveal that the inhomogeneous mark correlation functions provide more detailed insights into tree growth patterns compared to traditional methods.
{"title":"Inhomogeneous mark correlation functions for general marked point processes.","authors":"Mehdi Moradi, Matthias Eckardt","doi":"10.1093/biomtc/ujaf177","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf177","url":null,"abstract":"<p><p>Spatial phenomena in environmental and biological contexts often involve events that are unevenly distributed across space, carrying attributes whose associations/variations are space-dependent. In this paper, we introduce the class of inhomogeneous mark correlation functions, which capture mark associations/variations while explicitly accounting for spatial inhomogeneity. The proposed functions quantify how, on average, marks vary or associate with one another as a function of pairwise spatial distances. We develop nonparametric estimators and evaluate their performance through simulation studies, covering a range of scenarios with mark association or variation, spanning from nonstationary point patterns without spatial interaction to patterns with clustering tendencies and sparse regions. Our simulations reveal the shortcomings of traditional methods under spatial inhomogeneity, underscoring the necessity of our approach. The results show that our estimators accurately identify both the positivity/negativity and the effective spatial range for detected mark associations/variations. Furthermore, we show that differences in how intensity is estimated generally have only a negligible influence on the empirical bias/variance of our proposed inhomogeneous mark correlation functions. The proposed inhomogeneous mark correlation functions are then applied to two distinct forest ecosystems: Longleaf pine trees in southern Georgia, USA, marked by their diameter at breast height, and Scots pine trees in Pfynwald, Switzerland, marked by their height. Our findings reveal that the inhomogeneous mark correlation functions provide more detailed insights into tree growth patterns compared to traditional methods.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"82 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146008434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Estimating optimal dynamic treatment regimes with Gaussian process emulation.
Daniel Rodriguez Duque, David A Stephens, Erica E M Moodie. Biometrics 82(1), 2026. doi:10.1093/biomtc/ujaf174
Identifying dynamic treatment regimes (DTRs) is a key objective in precision medicine. Value search approaches, including (Bayesian) dynamic marginal structural models, offer an attractive approach to estimation by mapping candidate regimes to their expected outcomes. As parametric models for the expected outcomes may be mis-specified and lead to incorrect conclusions, a grid search over candidate DTRs has been proposed, but this may be computationally prohibitive and subject to high uncertainty in the estimated value function. These inferential challenges can be addressed using Gaussian process ($\mathcal{GP}$) optimization methods with estimators for the causal effect of adherence to a specified DTR. We demonstrate how to identify optimal DTRs using this approach in a variety of settings, including when the value function is multi-modal, and show that a $\mathcal{GP}$ modeling approach that recognizes noise in the estimated response surface leads to improved results compared to a grid search. Further, we show that a grid search may not yield a robust solution and often uses information less efficiently than a $\mathcal{GP}$ approach. The proposed approach is used to understand the tailoring of HIV therapy to optimize CD4 cell counts.
{"title":"Estimating optimal dynamic treatment regimes with Gaussian process emulation.","authors":"Daniel Rodriguez Duque, David A Stephens, Erica E M Moodie","doi":"10.1093/biomtc/ujaf174","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf174","url":null,"abstract":"<p><p>Identifying dynamic treatment regimes (DTRs) is a key objective in precision medicine. Value search approaches, including (Bayesian) dynamic marginal structural models offer an attractive approach to estimation by mapping candidate regimes to their expected outcome. As parametric models for the expected outcomes may be mis-specified and lead to incorrect conclusions, a grid search over candidate DTRs has been proposed, but this may be computationally prohibitive and also subject to high uncertainty in the estimated value function. These inferential challenges can be addressed using Gaussian process ($mathcal {GP}$) optimization methods with estimators for the causal effect of adherence to a specified DTR. We demonstrate how to identify optimal DTRs using this approach in a variety of settings, including when the value function is multi-modal and show that the $mathcal {GP}$ modeling approach that recognizes noise in the estimated response surface leads to improved results as compared to a grid search approach. Further, we show that a grid search may not yield a robust solution and that it often utilizes information less efficiently than a $mathcal {GP}$ approach. The proposed approach is used to understand tailoring of HIV therapy to optimize CD4 cell counts.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"82 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145984366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Non-boundary covariance matrix estimation in generalized linear mixed effects models using data augmentation priors.
Tina Košuta, Erik Langerholc, Rok Blagus. Biometrics 82(1), 2026. doi:10.1093/biomtc/ujag013

Boundary estimates of random-effects covariance matrices commonly arise when using maximum likelihood (ML) estimation in generalized linear mixed effects models, leading to numerical challenges and affecting statistical inference. To mitigate this, we introduce penalties to the likelihood function derived from conditionally conjugate priors for the covariance or precision matrices of the random effects. Our choice of penalties (priors) allows representation through pseudo-observations, enabling implementation of the proposed penalized estimator within existing ML software by augmenting the original data. We derive a procedure for constructing these pseudo-observations, a non-trivial task because their likelihood contribution must match the functional form of the penalty and depend only on the covariance or precision matrix of the random effects. Our method includes penalty parameters that can be set using existing prior knowledge or, when no reliable prior information is available, via a novel fully data-driven procedure that eliminates the need for prior specification. Through simulation studies under realistic scenarios, we illustrate that the proposed approach can provide improved estimates of random-effects covariance matrices compared with competing methods in the settings considered. The approach is further illustrated on real-world data.
{"title":"Non-boundary covariance matrix estimation in generalized linear mixed effects models using data augmentation priors.","authors":"Tina Košuta, Erik Langerholc, Rok Blagus","doi":"10.1093/biomtc/ujag013","DOIUrl":"https://doi.org/10.1093/biomtc/ujag013","url":null,"abstract":"<p><p>Boundary estimates of random effects covariance matrices commonly arise when using maximum likelihood (ML) estimation in generalized linear mixed effects models, leading to numerical challenges and affecting statistical inference. To mitigate this, we introduce penalties to the likelihood function derived from conditionally conjugate priors for the covariance or precision matrices of the random effects. Our choice of penalties (priors) allows representation through pseudo-observations, enabling implementation of the proposed penalized estimator within the existing ML software by augmenting the original data. We derive a procedure for constructing these pseudo-observations, a non-trivial task because their likelihood contribution must match the functional form of the penalty and depend only on the covariance or precision matrix of the random effects. Our method includes penalty parameters that can be set using existing prior knowledge or, when no reliable prior information is available, via a novel fully data-driven procedure that eliminates the need for prior specification. Through simulation studies under realistic scenarios, we illustrate that the proposed approach can provide improved estimates of random-effects covariance matrices compared with competing methods in the settings considered. The approach is further illustrated on real-world data.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"82 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146140881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}