This work revisits optimal response-adaptive designs from a type-I error rate perspective, highlighting when and by how much these allocations exacerbate type-I error rate inflation, an issue previously undocumented. We explore a range of approaches from the literature that can be applied to reduce type-I error rate inflation, but find that none of them provides a robust solution to the problem. To address this, we derive 2 optimal allocation proportions, incorporating the more robust score test (instead of the Wald test) with finite-sample estimators (instead of the unknown true values) in the formulation of the optimization problem. One proportion optimizes statistical power, and the other minimizes the total number of failures in a trial while maintaining a fixed variance level. Through simulations based on an early-phase and a confirmatory trial, we provide crucial practical insight into how these new optimal proportion designs can offer substantial patient-outcome advantages while controlling the type-I error rate. While we focus on binary outcomes, the framework offers valuable insights that naturally extend to other outcome types, multi-armed trials, and alternative measures of interest.
{"title":"Revisiting optimal allocations for binary responses: insights from considering type-I error rate control.","authors":"Lukas Pin, Sofía S Villar, William F Rosenberger","doi":"10.1093/biomtc/ujaf114","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf114","url":null,"abstract":"<p><p>This work revisits optimal response-adaptive designs from a type-I error rate perspective, highlighting when and how much these allocations exacerbate type-I error rate inflation-an issue previously undocumented. We explore a range of approaches from the literature that can be applied to reduce type-I error rate inflation. However, we found that all of these approaches fail to give a robust solution to the problem. To address this, we derive 2 optimal allocation proportions, incorporating the more robust score test (instead of the Wald test) with finite sample estimators (instead of the unknown true values) in the formulation of the optimization problem. One proportion optimizes statistical power, and the other minimizes the total number of failures in a trial while maintaining a fixed variance level. Through simulations based on an early phase and a confirmatory trial, we provide crucial practical insight into how these new optimal proportion designs can offer substantial patient outcomes advantages while controlling type-I error rate. While we focused on binary outcomes, the framework offers valuable insights that naturally extend to other outcome types, multi-armed trials, and alternative measures of interest.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144941151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Inspired by logistic regression, we introduce a regression model for data tuples consisting of a binary response and a set of covariates residing in a metric space without vector structure. Based on the proposed model, we also develop a binary classifier for metric-space valued data. We propose a maximum likelihood estimator for the metric-space valued regression coefficient in the model, and provide upper bounds on the estimation error under various metric entropy conditions that quantify the complexity of the underlying metric space. Matching lower bounds are derived for important metric spaces commonly seen in statistics, establishing optimality of the proposed estimator in such spaces. A finer upper bound and a matching lower bound, and thus optimality of the proposed classifier, are established for Riemannian manifolds. To the best of our knowledge, the proposed regression model and the above minimax bounds are the first of their kind for analyzing a binary response with covariates residing in general metric spaces. We also investigate the numerical performance of the proposed estimator and classifier via simulation studies, and illustrate their practical merits via an application to task-related fMRI data.
{"title":"Binary regression and classification with covariates in metric spaces.","authors":"Yinan Lin, Zhenhua Lin","doi":"10.1093/biomtc/ujaf123","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf123","url":null,"abstract":"<p><p>Inspired by logistic regression, we introduce a regression model for data tuples consisting of a binary response and a set of covariates residing in a metric space without vector structures. Based on the proposed model, we also develop a binary classifier for metric-space valued data. We propose a maximum likelihood estimator for the metric-space valued regression coefficient in the model, and provide upper bounds on the estimation error under various metric entropy conditions that quantify complexity of the underlying metric space. Matching lower bounds are derived for the important metric spaces commonly seen in statistics, establishing optimality of the proposed estimator in such spaces. A finer upper bound and a matching lower bound, and thus optimality of the proposed classifier, are established for Riemannian manifolds. To the best of our knowledge, the proposed regression model and the above minimax bounds are the first of their kind for analyzing a binary response with covariates residing in general metric spaces. We also investigate the numerical performance of the proposed estimator and classifier via simulation studies, and illustrate their practical merits via an application to task-related fMRI data.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145091014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Easy-to-interpret effect estimands are highly desirable in survival analysis. In the competing risks framework, one good candidate is the restricted mean time lost (RMTL). It is defined as the area under the cumulative incidence function up to a prespecified time point and, thus, it summarizes the cumulative incidence function into a meaningful estimand. While existing RMTL-based tests are limited to 2-sample comparisons and mostly to 2 event types, we aim to develop general contrast tests for factorial designs and an arbitrary number of event types based on a Wald-type test statistic. Furthermore, we avoid the often-made, rather restrictive continuity assumption on the event time distribution. This allows for ties in the data, which often occur in practical applications, for example, when event times are measured in whole days. In addition, we develop more reliable tests for RMTL comparisons that are based on a permutation approach to improve the small sample performance. In a second step, multiple tests for RMTL comparisons are developed to test several null hypotheses simultaneously. Here, we incorporate the asymptotically exact dependence structure between the local test statistics to gain more power. The small sample performance of the proposed testing procedures is analyzed in simulations and finally illustrated by analyzing a real-data example about leukemia patients who underwent bone marrow transplantation.
{"title":"Multiple tests for restricted mean time lost with competing risks data.","authors":"Merle Munko, Dennis Dobler, Marc Ditzhaus","doi":"10.1093/biomtc/ujaf086","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf086","url":null,"abstract":"<p><p>Easy-to-interpret effect estimands are highly desirable in survival analysis. In the competing risks framework, one good candidate is the restricted mean time lost (RMTL). It is defined as the area under the cumulative incidence function up to a prespecified time point and, thus, it summarizes the cumulative incidence function into a meaningful estimand. While existing RMTL-based tests are limited to 2-sample comparisons and mostly to 2 event types, we aim to develop general contrast tests for factorial designs and an arbitrary number of event types based on a Wald-type test statistic. Furthermore, we avoid the often-made, rather restrictive continuity assumption on the event time distribution. This allows for ties in the data, which often occur in practical applications, for example, when event times are measured in whole days. In addition, we develop more reliable tests for RMTL comparisons that are based on a permutation approach to improve the small sample performance. In a second step, multiple tests for RMTL comparisons are developed to test several null hypotheses simultaneously. Here, we incorporate the asymptotically exact dependence structure between the local test statistics to gain more power. The small sample performance of the proposed testing procedures is analyzed in simulations and finally illustrated by analyzing a real-data example about leukemia patients who underwent bone marrow transplantation.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144741073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Public health data are often spatially dependent, but standard spatial regression methods can suffer from bias and invalid inference when the independent variable is associated with spatially correlated residuals. This could occur if, for example, there is an unmeasured environmental contaminant associated with the independent and outcome variables in a spatial regression analysis. Geoadditive structural equation modeling (gSEM), in which an estimated spatial trend is removed from both the explanatory and response variables before estimating the parameters of interest, has previously been proposed as a solution, but there has been little investigation of gSEM's properties with point-referenced data. We link gSEM to results on double machine learning and semiparametric regression based on two-stage procedures. We propose using these semiparametric estimators for spatial regression, using Gaussian processes with Matérn covariance to estimate the spatial trends, and term this class of estimators double spatial regression (DSR). We derive regularity conditions for root-n consistency and asymptotic normality together with closed-form variance estimation, and show that, in simulations where standard spatial regression estimators are highly biased and have poor coverage, DSR mitigates bias more effectively than competitors and attains nominal coverage.
{"title":"Two-stage estimators for spatial confounding with point-referenced data.","authors":"Nate Wiecha, Jane A Hoppin, Brian J Reich","doi":"10.1093/biomtc/ujaf093","DOIUrl":"10.1093/biomtc/ujaf093","url":null,"abstract":"<p><p>Public health data are often spatially dependent, but standard spatial regression methods can suffer from bias and invalid inference when the independent variable is associated with spatially correlated residuals. This could occur if, for example, there is an unmeasured environmental contaminant associated with the independent and outcome variables in a spatial regression analysis. Geoadditive structural equation modeling (gSEM), in which an estimated spatial trend is removed from both the explanatory and response variables before estimating the parameters of interest, has previously been proposed as a solution but there has been little investigation of gSEM's properties with point-referenced data. We link gSEM to results on double machine learning and semiparametric regression based on two-stage procedures. We propose using these semiparametric estimators for spatial regression using Gaussian processes with Matèrn covariance to estimate the spatial trends and term this class of estimators double spatial regression (DSR). We derive regularity conditions for root-n asymptotic normality and consistency and closed-form variance estimation, and show that in simulations where standard spatial regression estimators are highly biased and have poor coverage, DSR can mitigate bias more effectively than competitors and obtain nominal coverage.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288666/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144706200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In clinical trials where long follow-up is required to measure the primary outcome of interest, there is substantial interest in using an accepted surrogate outcome that can be measured earlier in time or with less cost to estimate a treatment effect. For example, in clinical trials of chronic kidney disease, the effect of a treatment is often demonstrated on a longitudinal surrogate: the change in the longitudinal outcome (glomerular filtration rate, GFR) per year, or GFR slope. However, estimating the effect of a treatment on the GFR slope is complicated by the fact that GFR measurement can be terminated by the occurrence of a terminal event, such as death or kidney failure. Thus, to estimate this effect, one must consider both the longitudinal GFR trajectory and the terminal event process. In this paper, we build a semiparametric framework to jointly model the longitudinal outcome and the terminal event, where the model for the longitudinal outcome is semiparametric, the relationship between the longitudinal outcome and the terminal event is nonparametric, and the terminal event is modeled via a semiparametric Cox model. The proposed semiparametric joint model is flexible and can be easily extended to include a nonlinear trajectory of the longitudinal outcome. An estimating-equation-based method is proposed to estimate the treatment effect on the longitudinal surrogate outcome (eg, GFR slope). Theoretical properties of the proposed estimators are derived, and finite sample performance is evaluated through simulation studies. We illustrate the proposed method using data from the Reduction of Endpoints in NIDDM with the Angiotensin II Antagonist Losartan (RENAAL) trial to examine the effect of losartan on GFR slope.
{"title":"Semiparametric joint modeling to estimate the treatment effect on a longitudinal surrogate with application to chronic kidney disease trials.","authors":"Xuan Wang, Jie Zhou, Layla Parast, Tom Greene","doi":"10.1093/biomtc/ujaf104","DOIUrl":"10.1093/biomtc/ujaf104","url":null,"abstract":"<p><p>In clinical trials where long follow-up is required to measure the primary outcome of interest, there is substantial interest in using an accepted surrogate outcome that can be measured earlier in time or with less cost to estimate a treatment effect. For example, in clinical trials of chronic kidney disease, the effect of a treatment is often demonstrated on a longitudinal surrogate, the change of the longitudinal outcome (glomerular filtration rate, GFR) per year or GFR slope. However, estimating the effect of a treatment on the GFR slope is complicated by the fact that GFR measurement can be terminated by the occurrence of a terminal event, such as death or kidney failure. Thus, to estimate this effect, one must consider both the longitudinal GFR trajectory and the terminal event process. In this paper, we build a semiparametric framework to jointly model the longitudinal outcome and the terminal event, where the model for the longitudinal outcome is semiparametric, the relationship between the longitudinal outcome and the terminal event is nonparametric, and the terminal event is modeled via a semiparametric Cox model. The proposed semiparametric joint model is flexible and can be easily extended to include a nonlinear trajectory of the longitudinal outcome. An estimating equation based method is proposed to estimate the treatment effect on the longitudinal surrogate outcome (eg, GFR slope). Theoretical properties of the proposed estimators are derived, and finite sample performance is evaluated through simulation studies. We illustrate the proposed method using data from the Reduction of Endpoints in NIDDM with the Angiotensin II Antagonist Losartan (RENAAL) trial to examine the effect of Losartan on GFR slope.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12320702/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144783416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
For progressive rare diseases like Duchenne muscular dystrophy (DMD), evaluating disease burden by measuring the totality of evidence from outcome data over time per patient can be highly informative, especially regarding how a new treatment impacts disease progression and functional outcomes. This paper focuses on new statistical approaches for analyzing data generated over time in a small sample, sequential, multiple assignment, randomized trial (snSMART), with an application to DMD. In addition, the use of external control data can enhance statistical and operational efficiency in rare disease drug development by mitigating participant scarcity and ethical challenges. We employ a two-step robust meta-analytic approach to leverage external control data while adjusting for important baseline confounders and potential conflicts between external controls and trial data. Furthermore, our approach integrates important baseline covariates to account for patient heterogeneity and introduces a novel piecewise model to manage stage-wise treatment assignments. By applying this methodology to a case study in DMD research, we not only demonstrate the practical application and benefits of our approach but also highlight its potential to mitigate challenges in rare disease trials. Our findings advocate for a more nuanced and statistically robust analysis of treatment effects, thereby improving the reliability of clinical trial results.
{"title":"Evaluating longitudinal treatment effects for Duchenne muscular dystrophy using dynamically enriched Bayesian small sample, sequential, multiple assignment randomized trial (snSMART).","authors":"Sidi Wang, Satrajit Roychoudhury, Kelley M Kidwell","doi":"10.1093/biomtc/ujaf103","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf103","url":null,"abstract":"<p><p>For progressive rare diseases like Duchenne muscular dystrophy (DMD), evaluating disease burden by measuring the totality of evidence from outcome data over time per patient can be highly informative, especially regarding how a new treatment impacts disease progression and functional outcomes. This paper focuses on new statistical approaches for analyzing data generated over time in a small sample, sequential, multiple assignment, randomized trial (snSMART), with an application to DMD. In addition, the use of external control data can enhance the statistical and operational efficiency in rare disease drug development by solving participant scarcity issues and ethical challenges. We employ a two-step robust meta-analytic approach to leverage external control data while adjusting for important baseline confounders and potential conflicts between external controls and trial data. Furthermore, our approach integrates important baseline covariates to account for patient heterogeneity and introduces a novel piecewise model to manage stage-wise treatment assignments. By applying this methodology to a case study in DMD research, we not only demonstrate the practical application and benefits of our approach but also highlight its potential to mitigate challenges in rare disease trials. Our findings advocate for a more nuanced and statistically robust analysis of treatment effects, thereby improving the reliability of clinical trial results.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144833844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The joint spatial distribution of two count outcomes (eg, counts of two diseases) is usually studied using a Poisson shared component model (P-SCM), which uses geographically structured latent variables to model spatial variations that are specific to and shared by both outcomes. In this model, the correlation between the outcomes is assumed to be fully accounted for by the latent variables. However, in this article, we show that when the outcomes have an unknown number of cases in common, the bivariate counts exhibit a positive "residual" correlation, which the P-SCM wrongly attributes to the covariance of the latent variables, leading to biased inference and degraded predictive performance. Accordingly, we propose a new SCM based on the bivariate Poisson distribution (BP-SCM hereafter) to study such correlated bivariate data. The BP-SCM decomposes each count into counts of common and distinct cases, and then models each of these three counts (two distinct and one common) using Gaussian Markov Random Fields. The model is formulated in a Bayesian framework using Hamiltonian Monte Carlo inference. Simulations and a real-world application show the good inferential and predictive performance of the BP-SCM and confirm the bias of the P-SCM. The BP-SCM provides rich epidemiological information, such as the mean levels of the unknown counts of common and distinct cases, and their shared and specific spatial variations.
{"title":"Joint disease mapping for bivariate count data with residual correlation due to unknown number of common cases.","authors":"Edouard Chatignoux, Zoé Uhry, Laurent Remontet, Isabelle Albert","doi":"10.1093/biomtc/ujaf119","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf119","url":null,"abstract":"<p><p>The joint spatial distribution of two count outcomes (eg, counts of two diseases) is usually studied using a Poisson shared component model (P-SCM), which uses geographically structured latent variables to model spatial variations that are specific and shared by both outcomes. In this model, the correlation between the outcomes is assumed to be fully accounted for by the latent variables. However, in this article, we show that when the outcomes have an unknown number of cases in common, the bivariate counts exhibit a positive \"residual\" correlation, which the P-SCM wrongly attributes to the covariance of the latent variables, leading to biased inference and degraded predictive performance. Accordingly, we propose a new SCM based on the Bivariate-Poisson distribution (BP-SCM hereafter) to study such correlated bivariate data. The BP-SCM decomposes each count into counts of common and distinct cases, and then models each of these three counts (two distinct and one common) using Gaussian Markov Random Fields. The model is formulated in a Bayesian framework using Hamiltonian Monte Carlo inference. Simulations and a real-world application showed the good inferential and predictive performances of the BP-SCM and confirm the bias in P-SCM. BP-SCM provides rich epidemiological information, such as the mean levels of the unknown counts of common and distinct cases, and their shared and specific spatial variations.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144941109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The case² study, also referred to as the case-case study design, is a valuable approach for conducting inference about treatment effects. Unlike traditional case-control studies, the case² design compares treatment in cases of concern (the first type of case) to other cases (the second type of case). A key quantity of interest is the attributable effect for the first type of case, that is, the number of cases of the first type that would not have occurred had the treatment been withheld from all units. Two key assumptions usually made for inference about this attributable effect in case² studies are that (1) the treatment does not cause the second type of case, and (2) the treatment does not alter an individual's case type. However, these assumptions are not realistic in many real-data applications. In this article, we present a sensitivity analysis framework to scrutinize the impact of deviations from these assumptions on inferences for the attributable effect. We also include sensitivity analyses for unmeasured confounding, recognizing the potential bias introduced by unobserved covariates. The proposed methodology is exemplified through an investigation into whether violent behavior in the last year of life increases suicide risk, using the 1993 National Mortality Followback Survey dataset.
{"title":"Sensitivity analysis for attributable effects in case2 studies.","authors":"Kan Chen, Ting Ye, Dylan S Small","doi":"10.1093/biomtc/ujaf102","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf102","url":null,"abstract":"<p><p>The case$^2$ study, also referred to as the case-case study design, is a valuable approach for conducting inference for treatment effects. Unlike traditional case-control studies, the case$^2$ design compares treatment in cases of concern (the first type of case) to other cases (the second type of case). One of the quantities of interest is the attributable effect for the first type of case-that is, the number of the first type of case that would not have occurred had the treatment been withheld from all units. In some case$^2$ studies, a key quantity of interest is the attributable effect for the first type of case. Two key assumptions that are usually made for making inferences about this attributable effect in case$^2$ studies are (1) treatment does not cause the second type of case, and (2) the treatment does not alter an individual's case type. However, these assumptions are not realistic in many real-data applications. In this article, we present a sensitivity analysis framework to scrutinize the impact of deviations from these assumptions on inferences for the attributable effect. We also include sensitivity analyses related to the assumption of unmeasured confounding, recognizing the potential bias introduced by unobserved covariates. The proposed methodology is exemplified through an investigation into whether having violent behavior in the last year of life increases suicide risk using the 1993 National Mortality Followback Survey dataset.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144941146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian Cox semiparametric regression is an important problem in many clinical settings. The elliptical information geometry of Cox models is underutilized in Bayesian inference but can effectively bridge survival analysis and hierarchical Gaussian models. Survival models should be able to incorporate multilevel modeling, such as case weights, frailties, and smoothing splines, in a straightforward manner similar to Gaussian models. To tackle these challenges, we propose the Cox-Pólya-Gamma algorithm for Bayesian multilevel Cox semiparametric regression and survival functions. Our novel computational procedure succinctly addresses the difficult problem of monotonicity-constrained modeling of the nonparametric baseline cumulative hazard along with multilevel regression. We develop two key strategies based on the elliptical geometry of Cox models that allow computation to be implemented in a few lines of code. First, we exploit an approximation between Cox models and negative binomial processes through the Poisson process to reduce Bayesian computation to iterative Gaussian sampling. Next, we appeal to sufficient dimension reduction to address the difficult computation of nonparametric baseline cumulative hazards, allowing the Markov transition within the Gibbs sampler to be collapsed based on beta sufficient statistics. We explore conditions for uniform ergodicity of the Cox-Pólya-Gamma algorithm. We provide software and demonstrate our multilevel modeling approach using open-source data and simulations.
{"title":"The Cox-Pólya-Gamma algorithm for flexible Bayesian inference of multilevel survival models.","authors":"Benny Ren, Jeffrey S Morris, Ian Barnett","doi":"10.1093/biomtc/ujaf121","DOIUrl":"10.1093/biomtc/ujaf121","url":null,"abstract":"<p><p>Bayesian Cox semiparametric regression is an important problem in many clinical settings. The elliptical information geometry of Cox models is underutilized in Bayesian inference but can effectively bridge survival analysis and hierarchical Gaussian models. Survival models should be able to incorporate multilevel modeling such as case weights, frailties, and smoothing splines, in a straightforward manner similar to Gaussian models. To tackle these challenges, we propose the Cox-Pólya-Gamma algorithm for Bayesian multilevel Cox semiparametric regression and survival functions. Our novel computational procedure succinctly addresses the difficult problem of monotonicity-constrained modeling of the nonparametric baseline cumulative hazard along with multilevel regression. We develop two key strategies based on the elliptical geometry of Cox models that allows computation to be implemented in a few lines of code. First, we exploit an approximation between Cox models and negative binomial processes through the Poisson process to reduce Bayesian computation to iterative Gaussian sampling. Next, we appeal to sufficient dimension reduction to address the difficult computation of nonparametric baseline cumulative hazards, allowing for the collapse of the Markov transition within the Gibbs sampler based on beta sufficient statistics. We explore conditions for uniform ergodicity of the Cox-Pólya-Gamma algorithm. We provide software and demonstrate our multilevel modeling approach using open-source data and simulations.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12449235/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145091074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Microbiome research has immense potential for unlocking insights into human health and disease. A common goal in human microbiome research is identifying subgroups of individuals with similar microbial composition that may be linked to specific health states or environmental exposures. However, existing clustering methods are often not equipped to accommodate the complex structure of microbiome data and typically make limiting assumptions regarding the number of clusters in the data, which can bias inference. Designed for zero-inflated multivariate compositional count data collected in microbiome research, we propose a novel Bayesian semiparametric mixture modeling framework that simultaneously learns the number of clusters in the data while performing cluster allocation. In simulation, we demonstrate the clustering performance of our method compared to distance- and model-based alternatives and the importance of accommodating zero-inflation when present in the data. We then apply the model to identify clusters in microbiome data collected in a study designed to investigate the relation between gut microbial composition and enteric diarrheal disease.
{"title":"A Bayesian semiparametric mixture model for clustering zero-inflated microbiome data.","authors":"Suppapat Korsurat, Matthew D Koslovsky","doi":"10.1093/biomtc/ujaf125","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf125","url":null,"abstract":"<p><p>Microbiome research has immense potential for unlocking insights into human health and disease. A common goal in human microbiome research is identifying subgroups of individuals with similar microbial composition that may be linked to specific health states or environmental exposures. However, existing clustering methods are often not equipped to accommodate the complex structure of microbiome data and typically make limiting assumptions regarding the number of clusters in the data which can bias inference. Designed for zero-inflated multivariate compositional count data collected in microbiome research, we propose a novel Bayesian semiparametric mixture modeling framework that simultaneously learns the number of clusters in the data while performing cluster allocation. In simulation, we demonstrate the clustering performance of our method compared to distance- and model-based alternatives and the importance of accommodating zero-inflation when present in the data. We then apply the model to identify clusters in microbiome data collected in a study designed to investigate the relation between gut microbial composition and enteric diarrheal disease.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145124127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}