Easy-to-interpret effect estimands are highly desirable in survival analysis. In the competing risks framework, one good candidate is the restricted mean time lost (RMTL). It is defined as the area under the cumulative incidence function up to a prespecified time point and, thus, it summarizes the cumulative incidence function into a meaningful estimand. While existing RMTL-based tests are limited to 2-sample comparisons and mostly to 2 event types, we aim to develop general contrast tests for factorial designs and an arbitrary number of event types based on a Wald-type test statistic. Furthermore, we avoid the often-made, rather restrictive continuity assumption on the event time distribution. This allows for ties in the data, which often occur in practical applications, for example, when event times are measured in whole days. In addition, we develop more reliable tests for RMTL comparisons that are based on a permutation approach to improve the small sample performance. In a second step, multiple tests for RMTL comparisons are developed to test several null hypotheses simultaneously. Here, we incorporate the asymptotically exact dependence structure between the local test statistics to gain more power. The small sample performance of the proposed testing procedures is analyzed in simulations and finally illustrated by analyzing a real-data example about leukemia patients who underwent bone marrow transplantation.
Merle Munko, Dennis Dobler, Marc Ditzhaus. "Multiple tests for restricted mean time lost with competing risks data." Biometrics 81(3), published 2025-07-03. doi: 10.1093/biomtc/ujaf086 (https://doi.org/10.1093/biomtc/ujaf086)
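The definition of the RMTL as the area under the cumulative incidence function up to a prespecified time point can be illustrated with a small sketch. It is not the authors' implementation: the function names `cumulative_incidence` and `rmtl`, the coding of censoring as `0`, and the toy data are assumptions made here, and the cumulative incidence function is estimated with the standard Aalen-Johansen estimator, which handles tied event times.

```python
from collections import Counter


def cumulative_incidence(times, causes, cause):
    """Aalen-Johansen estimate of the cumulative incidence function for `cause`.

    times  : observed times (ties are allowed)
    causes : 0 for censoring, 1, 2, ... for the event types (assumed coding)
    Returns the step function as a list of (time, CIF value) points.
    """
    at_risk = len(times)
    exits = Counter(times)  # all exits (events and censorings) per distinct time
    events_any = Counter(t for t, c in zip(times, causes) if c != 0)
    events_k = Counter(t for t, c in zip(times, causes) if c == cause)
    surv, cif, steps = 1.0, 0.0, []
    for t in sorted(exits):
        cif += surv * events_k[t] / at_risk    # S(t-) times cause-specific hazard increment
        surv *= 1.0 - events_any[t] / at_risk  # Kaplan-Meier update for any event
        steps.append((t, cif))
        at_risk -= exits[t]
    return steps


def rmtl(times, causes, cause, tau):
    """Area under the step-function CIF of `cause` on [0, tau]."""
    area, prev_t, prev_f = 0.0, 0.0, 0.0
    for t, f in cumulative_incidence(times, causes, cause):
        if t >= tau:
            break
        area += prev_f * (t - prev_t)
        prev_t, prev_f = t, f
    return area + prev_f * (tau - prev_t)


# Toy data with a tie at time 2 and one censored observation at time 3.
times = [1, 2, 2, 3, 4, 5]
causes = [1, 2, 1, 0, 1, 2]
print(rmtl(times, causes, cause=1, tau=5.0))
print(rmtl(times, causes, cause=2, tau=5.0))
```

In this toy data set the two RMTL values and the restricted mean survival time add up to the time horizon of 5, which is the decomposition that makes the RMTL easy to interpret.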
Public health data are often spatially dependent, but standard spatial regression methods can suffer from bias and invalid inference when the independent variable is associated with spatially correlated residuals. This could occur if, for example, there is an unmeasured environmental contaminant associated with the independent and outcome variables in a spatial regression analysis. Geoadditive structural equation modeling (gSEM), in which an estimated spatial trend is removed from both the explanatory and response variables before estimating the parameters of interest, has previously been proposed as a solution, but there has been little investigation of gSEM's properties with point-referenced data. We link gSEM to results on double machine learning and semiparametric regression based on two-stage procedures. We propose using these semiparametric estimators for spatial regression using Gaussian processes with Matérn covariance to estimate the spatial trends and term this class of estimators double spatial regression (DSR). We derive regularity conditions for root-n asymptotic normality and consistency and closed-form variance estimation, and show that in simulations where standard spatial regression estimators are highly biased and have poor coverage, DSR can mitigate bias more effectively than competitors and obtain nominal coverage.
Nate Wiecha, Jane A Hoppin, Brian J Reich. "Two-stage estimators for spatial confounding with point-referenced data." Biometrics 81(3), published 2025-07-03. doi: 10.1093/biomtc/ujaf093. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288666/pdf/
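The two-stage idea of removing an estimated spatial trend from both the explanatory and response variables can be written down for the simplest case. The notation below is an assumption for illustration (scalar exposure $A_i$, outcome $Y_i$, location $s_i$) and follows the generic partially linear model used in double machine learning; the estimators in the paper may differ in detail, for example in how the trends are fitted.

```latex
% Partially linear model with a spatial trend in both outcome and exposure
Y_i = \beta A_i + g(s_i) + \varepsilon_i, \qquad A_i = m(s_i) + v_i .

% Stage 1: estimate the spatial trends \ell(s) = E[Y \mid s] and m(s) = E[A \mid s],
% e.g. with Gaussian process regression, giving \hat\ell and \hat m.

% Stage 2: regress the outcome residuals on the exposure residuals
\hat\beta = \frac{\sum_{i=1}^{n} \bigl(A_i - \hat m(s_i)\bigr)\bigl(Y_i - \hat\ell(s_i)\bigr)}
                 {\sum_{i=1}^{n} \bigl(A_i - \hat m(s_i)\bigr)^{2}} .
```

Because only the residual variation $v_i$ in the exposure enters the second stage, spatially smooth unmeasured confounders absorbed by the two trends no longer bias $\hat\beta$.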
In clinical trials where long follow-up is required to measure the primary outcome of interest, there is substantial interest in using an accepted surrogate outcome that can be measured earlier in time or with less cost to estimate a treatment effect. For example, in clinical trials of chronic kidney disease, the effect of a treatment is often demonstrated on a longitudinal surrogate, the change of the longitudinal outcome (glomerular filtration rate, GFR) per year or GFR slope. However, estimating the effect of a treatment on the GFR slope is complicated by the fact that GFR measurement can be terminated by the occurrence of a terminal event, such as death or kidney failure. Thus, to estimate this effect, one must consider both the longitudinal GFR trajectory and the terminal event process. In this paper, we build a semiparametric framework to jointly model the longitudinal outcome and the terminal event, where the model for the longitudinal outcome is semiparametric, the relationship between the longitudinal outcome and the terminal event is nonparametric, and the terminal event is modeled via a semiparametric Cox model. The proposed semiparametric joint model is flexible and can be easily extended to include a nonlinear trajectory of the longitudinal outcome. An estimating equation based method is proposed to estimate the treatment effect on the longitudinal surrogate outcome (eg, GFR slope). Theoretical properties of the proposed estimators are derived, and finite sample performance is evaluated through simulation studies. We illustrate the proposed method using data from the Reduction of Endpoints in NIDDM with the Angiotensin II Antagonist Losartan (RENAAL) trial to examine the effect of Losartan on GFR slope.
Xuan Wang, Jie Zhou, Layla Parast, Tom Greene. "Semiparametric joint modeling to estimate the treatment effect on a longitudinal surrogate with application to chronic kidney disease trials." Biometrics 81(3), published 2025-07-03. doi: 10.1093/biomtc/ujaf104. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12320702/pdf/
Sidi Wang, Satrajit Roychoudhury, Kelley M Kidwell
For progressive rare diseases like Duchenne muscular dystrophy (DMD), evaluating disease burden by measuring the totality of evidence from outcome data over time per patient can be highly informative, especially regarding how a new treatment impacts disease progression and functional outcomes. This paper focuses on new statistical approaches for analyzing data generated over time in a small sample, sequential, multiple assignment, randomized trial (snSMART), with an application to DMD. In addition, the use of external control data can enhance the statistical and operational efficiency in rare disease drug development by solving participant scarcity issues and ethical challenges. We employ a two-step robust meta-analytic approach to leverage external control data while adjusting for important baseline confounders and potential conflicts between external controls and trial data. Furthermore, our approach integrates important baseline covariates to account for patient heterogeneity and introduces a novel piecewise model to manage stage-wise treatment assignments. By applying this methodology to a case study in DMD research, we not only demonstrate the practical application and benefits of our approach but also highlight its potential to mitigate challenges in rare disease trials. Our findings advocate for a more nuanced and statistically robust analysis of treatment effects, thereby improving the reliability of clinical trial results.
Sidi Wang, Satrajit Roychoudhury, Kelley M Kidwell. "Evaluating longitudinal treatment effects for Duchenne muscular dystrophy using dynamically enriched Bayesian small sample, sequential, multiple assignment randomized trial (snSMART)." Biometrics 81(3), published 2025-07-03. doi: 10.1093/biomtc/ujaf103 (https://doi.org/10.1093/biomtc/ujaf103)
Edouard Chatignoux, Zoé Uhry, Laurent Remontet, Isabelle Albert
The joint spatial distribution of two count outcomes (eg, counts of two diseases) is usually studied using a Poisson shared component model (P-SCM), which uses geographically structured latent variables to model spatial variations that are specific and shared by both outcomes. In this model, the correlation between the outcomes is assumed to be fully accounted for by the latent variables. However, in this article, we show that when the outcomes have an unknown number of cases in common, the bivariate counts exhibit a positive "residual" correlation, which the P-SCM wrongly attributes to the covariance of the latent variables, leading to biased inference and degraded predictive performance. Accordingly, we propose a new SCM based on the Bivariate-Poisson distribution (BP-SCM hereafter) to study such correlated bivariate data. The BP-SCM decomposes each count into counts of common and distinct cases, and then models each of these three counts (two distinct and one common) using Gaussian Markov Random Fields. The model is formulated in a Bayesian framework using Hamiltonian Monte Carlo inference. Simulations and a real-world application showed the good inferential and predictive performance of the BP-SCM and confirmed the bias in P-SCM. BP-SCM provides rich epidemiological information, such as the mean levels of the unknown counts of common and distinct cases, and their shared and specific spatial variations.
Edouard Chatignoux, Zoé Uhry, Laurent Remontet, Isabelle Albert. "Joint disease mapping for bivariate count data with residual correlation due to unknown number of common cases." Biometrics 81(3), published 2025-07-03. doi: 10.1093/biomtc/ujaf119 (https://doi.org/10.1093/biomtc/ujaf119)
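The positive "residual" correlation caused by common cases follows directly from the usual construction of the bivariate Poisson distribution. The symbols below ($Z_0$, $Z_1$, $Z_2$, $\lambda_0$, $\lambda_1$, $\lambda_2$) are notation assumed here for a single area, conditional on the latent spatial variables; the abstract does not give the paper's own notation.

```latex
% Decomposition of the two observed counts into distinct and common cases
Y_1 = Z_1 + Z_0, \qquad Y_2 = Z_2 + Z_0, \qquad
Z_j \sim \mathrm{Poisson}(\lambda_j) \ \text{independently}, \quad j = 0, 1, 2 .

% Marginals remain Poisson
Y_1 \sim \mathrm{Poisson}(\lambda_1 + \lambda_0), \qquad
Y_2 \sim \mathrm{Poisson}(\lambda_2 + \lambda_0) .

% The shared count induces a non-negative covariance
\operatorname{Cov}(Y_1, Y_2) = \operatorname{Var}(Z_0) = \lambda_0 \ge 0 .
```

A model that treats $Y_1$ and $Y_2$ as conditionally independent Poisson counts has no term corresponding to $\lambda_0$, so this covariance can only be absorbed by the latent variables.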
The case$^2$ study, also referred to as the case-case study design, is a valuable approach for conducting inference for treatment effects. Unlike traditional case-control studies, the case$^2$ design compares treatment in cases of concern (the first type of case) to other cases (the second type of case). In some case$^2$ studies, a key quantity of interest is the attributable effect for the first type of case, that is, the number of cases of the first type that would not have occurred had the treatment been withheld from all units. Two key assumptions that are usually made for inference about this attributable effect in case$^2$ studies are (1) the treatment does not cause the second type of case, and (2) the treatment does not alter an individual's case type. However, these assumptions are not realistic in many real-data applications. In this article, we present a sensitivity analysis framework to scrutinize the impact of deviations from these assumptions on inferences for the attributable effect. We also include sensitivity analyses related to the assumption of no unmeasured confounding, recognizing the potential bias introduced by unobserved covariates. The proposed methodology is exemplified through an investigation into whether having violent behavior in the last year of life increases suicide risk using the 1993 National Mortality Followback Survey dataset.
Kan Chen, Ting Ye, Dylan S Small. "Sensitivity analysis for attributable effects in case$^2$ studies." Biometrics 81(3), published 2025-07-03. doi: 10.1093/biomtc/ujaf102 (https://doi.org/10.1093/biomtc/ujaf102)
Bayesian Cox semiparametric regression is an important problem in many clinical settings. The elliptical information geometry of Cox models is underutilized in Bayesian inference but can effectively bridge survival analysis and hierarchical Gaussian models. Survival models should be able to incorporate multilevel modeling such as case weights, frailties, and smoothing splines, in a straightforward manner similar to Gaussian models. To tackle these challenges, we propose the Cox-Pólya-Gamma algorithm for Bayesian multilevel Cox semiparametric regression and survival functions. Our novel computational procedure succinctly addresses the difficult problem of monotonicity-constrained modeling of the nonparametric baseline cumulative hazard along with multilevel regression. We develop two key strategies based on the elliptical geometry of Cox models that allow computation to be implemented in a few lines of code. First, we exploit an approximation between Cox models and negative binomial processes through the Poisson process to reduce Bayesian computation to iterative Gaussian sampling. Next, we appeal to sufficient dimension reduction to address the difficult computation of nonparametric baseline cumulative hazards, allowing for the collapse of the Markov transition within the Gibbs sampler based on beta sufficient statistics. We explore conditions for uniform ergodicity of the Cox-Pólya-Gamma algorithm. We provide software and demonstrate our multilevel modeling approach using open-source data and simulations.
Benny Ren, Jeffrey S Morris, Ian Barnett. "The Cox-Pólya-Gamma algorithm for flexible Bayesian inference of multilevel survival models." Biometrics 81(3), published 2025-07-03. doi: 10.1093/biomtc/ujaf121. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12449235/pdf/
Microbiome research has immense potential for unlocking insights into human health and disease. A common goal in human microbiome research is identifying subgroups of individuals with similar microbial composition that may be linked to specific health states or environmental exposures. However, existing clustering methods are often not equipped to accommodate the complex structure of microbiome data and typically make limiting assumptions regarding the number of clusters in the data which can bias inference. Designed for zero-inflated multivariate compositional count data collected in microbiome research, we propose a novel Bayesian semiparametric mixture modeling framework that simultaneously learns the number of clusters in the data while performing cluster allocation. In simulation, we demonstrate the clustering performance of our method compared to distance- and model-based alternatives and the importance of accommodating zero-inflation when present in the data. We then apply the model to identify clusters in microbiome data collected in a study designed to investigate the relation between gut microbial composition and enteric diarrheal disease.
Suppapat Korsurat, Matthew D Koslovsky. "A Bayesian semiparametric mixture model for clustering zero-inflated microbiome data." Biometrics 81(3), published 2025-07-03. doi: 10.1093/biomtc/ujaf125 (https://doi.org/10.1093/biomtc/ujaf125)
Matias Janvin, Pål C Ryalen, Aaron L Sarvet, Mats J Stensrud
In studies of medical treatments, individuals often experience post-treatment events that predict their future outcomes. In this work, we study how to use initial observations of a recurrent event (a type of post-treatment event) to offer updated treatment recommendations in settings where no, or few, individuals are observed to switch between treatment arms. Specifically, we formulate an estimand quantifying the average effect of switching treatment on subsequent events. We derive bounds on the value of this estimand under plausible conditions and propose non-parametric estimators of the bounds. Furthermore, we define a value and regret function for a dynamic treatment-switching regime, and use these to determine 3 types of optimal regimes under partial identification: the pessimist (maximin value), optimist (maximax value), and opportunist (minimax regret) regimes. The pessimist regime is guaranteed to perform at least as well as the standard of care. We apply our methods to data from the Systolic Blood Pressure Intervention Trial.
Matias Janvin, Pål C Ryalen, Aaron L Sarvet, Mats J Stensrud. "A positivity robust strategy to study effects of switching treatment." Biometrics 81(3), published 2025-07-03. doi: 10.1093/biomtc/ujaf085 (https://doi.org/10.1093/biomtc/ujaf085)
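The three decision criteria named in the abstract (maximin value, maximax value, minimax regret) can be sketched for a single decision point once bounds on the effect of switching are available. This is a generic illustration of the criteria, not the authors' estimators: the function `choose_regimes`, the convention that the benefit of switching is measured relative to staying on the current arm (benefit 0, positive values favouring a switch), and the rule that ties resolve to "stay" are all assumptions made here.

```python
def choose_regimes(lower, upper):
    """Pick 'switch' or 'stay' under three criteria, given bounds on the benefit of switching.

    lower, upper : bounds on the (partially identified) benefit of switching,
                   relative to staying, which has benefit 0 by convention.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")

    # Maximin value: compare worst cases, so switch only if even the lower bound is positive.
    pessimist = "switch" if lower > 0 else "stay"

    # Maximax value: compare best cases, so switch if the upper bound is positive.
    optimist = "switch" if upper > 0 else "stay"

    # Minimax regret: regret is the loss relative to the best action in hindsight.
    regret_switch = max(0.0, -lower)  # worst case when switching turns out harmful
    regret_stay = max(0.0, upper)     # worst case when switching would have helped
    opportunist = "switch" if regret_switch < regret_stay else "stay"

    return {"pessimist": pessimist, "optimist": optimist, "opportunist": opportunist}


# Bounds that cover zero: the criteria can disagree.
print(choose_regimes(-0.1, 0.4))
print(choose_regimes(-0.5, 0.1))
```

Under this convention the pessimist only switches when the whole interval lies above zero, so it can never do worse than staying, which mirrors the guarantee stated for the pessimist regime.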
Numerous statistical models have been proposed for conducting meta-analysis of diagnostic accuracy studies when a gold standard is available. However, in real-world scenarios, the gold standard test may not be perfect due to several factors such as measurement error, non-availability, invasiveness, or high cost. A generalized linear mixed model (GLMM) is currently recommended to account for an imperfect reference test. We propose vine copula mixed models for meta-analysis of diagnostic test accuracy studies with an imperfect reference standard. Our general models include the GLMM as a special case, can have arbitrary univariate distributions for the random effects, and can provide tail dependencies and asymmetries. Our general methodology is demonstrated with an extensive simulation study and illustrated by insightfully re-analyzing the data of a meta-analysis of the Papanicolaou test that diagnoses cervical neoplasia. Our study suggests that there can be an improvement on GLMM and makes the argument for moving to vine copula random effects models.
{"title":"Vine copula mixed models for meta-analysis of diagnostic accuracy studies without a gold standard.","authors":"Aristidis K Nikoloulopoulos","doi":"10.1093/biomtc/ujaf037","DOIUrl":"10.1093/biomtc/ujaf037","url":null,"abstract":"<p><p>Numerous statistical models have been proposed for conducting meta-analysis of diagnostic accuracy studies when a gold standard is available. However, in real-world scenarios, the gold standard test may not be perfect due to several factors such as measurement error, non-availability, invasiveness, or high cost. A generalized linear mixed model (GLMM) is currently recommended to account for an imperfect reference test. We propose vine copula mixed models for meta-analysis of diagnostic test accuracy studies with an imperfect reference standard. Our general models include the GLMM as a special case, can have arbitrary univariate distributions for the random effects, and can provide tail dependencies and asymmetries. Our general methodology is demonstrated with an extensive simulation study and illustrated by insightfully re-analyzing the data of a meta-analysis of the Papanicolaou test that diagnoses cervical neoplasia. Our study suggests that there can be an improvement on GLMM and makes the argument for moving to vine copula random effects models.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 2","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143802277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}