Spatial transcriptomics (ST) clustering plays a crucial role in elucidating the spatial heterogeneity of tissue. An accurate ST clustering result can greatly benefit downstream biological analyses. As various ST clustering approaches have been proposed in recent years, comparing their clustering accuracy has become important in benchmarking studies. However, the widely used metric, the adjusted Rand index (ARI), ignores the spatial information in ST data entirely, which prevents ARI from fully evaluating ST clustering methods. We propose a spatially aware Rand index (spRI) and a spatially aware adjusted Rand index (spARI) that incorporate spatial distance information. Specifically, when comparing two partitions, spRI assigns each disagreement object pair a weight that depends on the distance between the two objects, whereas the Rand index assigns it zero weight. This spatially aware feature lets spRI differentiate disagreement object pairs by their distances, providing a useful evaluation metric that favors spatially coherent clusterings. The spARI is obtained by adjusting spRI for chance so that its expectation is zero under an appropriate null model. Statistical properties of spRI and spARI are discussed. Applications to a simulation study and two ST datasets demonstrate the improved utility of spARI over ARI in evaluating ST clustering methods.
Spatially aware adjusted Rand index for evaluating spatial transcriptomics clustering. Yinqiao Yan, Xiangnan Feng, Xiangyu Luo. Biometrics 81(3), 2025-07-03. DOI: 10.1093/biomtc/ujaf127
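The contrast between the Rand index and a distance-weighted variant can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' definition of spRI: the function names and the weight `example_weight` (with its `alpha` parameter) are hypothetical, chosen only so that disagreement pairs receive a distance-dependent weight strictly below 1. The chance adjustment that leads to spARI is not shown.

```python
import math
from itertools import combinations


def weighted_rand_index(ref, est, coords=None, weight_fn=None):
    """Average pairwise score between two partitions of the same objects.

    With weight_fn=None this is the ordinary Rand index: a pair on which
    the partitions agree scores 1 and a disagreement pair scores 0.
    With a weight_fn, a disagreement pair scores weight_fn(distance)
    instead, where distance is the Euclidean distance between the objects.
    """
    n = len(ref)
    total = 0.0
    for i, j in combinations(range(n), 2):
        same_in_ref = ref[i] == ref[j]
        same_in_est = est[i] == est[j]
        if same_in_ref == same_in_est:
            total += 1.0  # agreement pair
        elif weight_fn is not None:
            total += weight_fn(math.dist(coords[i], coords[j]))
    return total / (n * (n - 1) / 2)


def example_weight(d, alpha=0.8):
    # Hypothetical weight, NOT taken from the paper: decays with distance
    # and stays below 1 so a disagreement never counts as full agreement.
    return alpha * math.exp(-d * d)


if __name__ == "__main__":
    ref = [0, 0, 1, 1]
    est = [0, 0, 0, 1]
    coords = [(0, 0), (0, 1), (1, 0), (1, 1)]
    print("Rand index:", weighted_rand_index(ref, est))
    print("weighted:  ", weighted_rand_index(ref, est, coords, example_weight))
```

Passing the weight as a function keeps the pair-counting logic separate from the choice of weight, so alternative distance weights can be compared without touching the loop.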
This manuscript introduces a covariance-on-covariance regression model. It is assumed that there exists at least one pair of linear projections, one on the outcome covariance matrices and one on the predictor covariance matrices, such that a log-linear model links the variances in the projection spaces, together with additional covariates of interest. An ordinary least squares type estimator is proposed to simultaneously identify the projections and estimate the model coefficients. Under regularity conditions, the proposed estimator is asymptotically consistent. Simulation studies demonstrate the superior performance of the proposed approach over existing methods. Applied to data collected in the Human Connectome Project Aging study, the proposed approach identifies 3 pairs of brain networks in which functional connectivity within the resting-state network predicts functional connectivity within the corresponding task-state network. The 3 networks correspond to a global signal network, a task-related network, and a task-unrelated network. The findings are consistent with existing knowledge about brain function.
Covariance-on-covariance regression. Yi Zhao, Yize Zhao. Biometrics 81(3), 2025-07-03. DOI: 10.1093/biomtc/ujaf097. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12312406/pdf/
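One way to write the log-linear link described in the abstract is sketched below. The notation is assumed for illustration and is not taken from the paper: $\Sigma_i$ and $\Delta_i$ denote the outcome and predictor covariance matrices of subject $i$, $\gamma$ and $\theta$ the corresponding projection vectors, and $x_i$ the additional covariates.

```latex
\log\!\left(\gamma^{\top}\Sigma_i\,\gamma\right)
  = \beta_0
  + \beta_1 \log\!\left(\theta^{\top}\Delta_i\,\theta\right)
  + x_i^{\top}\beta_2,
  \qquad i = 1,\dots,n .
```

Under this reading, $\gamma^{\top}\Sigma_i\gamma$ and $\theta^{\top}\Delta_i\theta$ are the variances in the two projection spaces, and $\beta_1$ measures how strongly the projected predictor variance is associated with the projected outcome variance.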
Individualized treatment rules tailor treatments to patients based on clinical, demographic, and other characteristics. Estimation of individualized treatment rules requires the identification of individuals who benefit most from the particular treatments and thus the detection of variability in treatment effects. To develop an effective individualized treatment rule, data from multisite studies may be required due to the low power provided by smaller datasets for detecting the often small treatment-covariate interactions. However, sharing of individual-level data is sometimes constrained. Furthermore, sparsity may arise in 2 senses: different data sites may recruit from different populations, making it infeasible to estimate identical models or all parameters of interest at all sites, and the number of non-zero parameters in the model for the treatment rule may be small. To address these issues, we adopt a 2-stage Bayesian meta-analysis approach to estimate individualized treatment rules which optimize expected patient outcomes using multisite data without disclosing individual-level data beyond the sites. Simulation results demonstrate that our approach can provide consistent estimates of the parameters which fully characterize the optimal individualized treatment rule. We estimate the optimal Warfarin dose strategy using data from the International Warfarin Pharmacogenetics Consortium, where data sparsity and small treatment-covariate interaction effects pose additional statistical challenges.
Sparse 2-stage Bayesian meta-analysis for individualized treatments. Junwei Shen, Erica E M Moodie, Shirin Golchi. Biometrics 81(3), 2025-07-03. DOI: 10.1093/biomtc/ujaf082. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288668/pdf/
Tianyi Pan, Hwashin Hyun Shin, Glen McGee, Alex Stringer
Quantifying associations between short-term exposure to ambient air pollution and health outcomes is an important public health priority. Many studies have investigated the association considering delayed effects within the past few days. Adaptive cumulative exposure distributed lag non-linear models (ACE-DLNMs) quantify associations between health outcomes and cumulative exposure that is specified in a data-adaptive way. While the ACE-DLNM framework is highly interpretable, it is limited to continuous outcomes and does not scale well to large datasets. Motivated by a large analysis of daily pollution and respiratory hospitalization counts in Canada between 2001 and 2018, we propose a generalized ACE-DLNM incorporating penalized splines, improving upon existing ACE-DLNM methods to accommodate general response types. We then develop a computationally efficient estimation strategy based on profile likelihood and Laplace approximate marginal likelihood with Newton-type methods. We demonstrate the performance and practical advantages of the proposed method through simulations. In application to the motivating analysis, the proposed method yields more stable inferences compared to generalized additive models with fixed exposures, while retaining interpretability.
Estimating associations between cumulative exposure and health via generalized distributed lag non-linear models using penalized splines. Tianyi Pan, Hwashin Hyun Shin, Glen McGee, Alex Stringer. Biometrics 81(3), 2025-07-03. DOI: 10.1093/biomtc/ujaf116
PuXue Qiao, Chun Fung Kwok, Guoqi Qian, Davis J McCarthy
Copy number alterations (CNAs) are important drivers and markers of clonal structures within tumors. Understanding these structures at single-cell resolution is crucial to advancing cancer treatments. The objective is to cluster single cells into clones and identify CNA events in each clone. Early attempts often sacrifice the intrinsic link between cell clustering and clonal CNA detection for simplicity and rely heavily on human input for critical parameters such as the number of clones. Here, we develop a Bayesian model that uses single-cell RNA sequencing (scRNA-seq) data for automatic analysis of intra-tumoral clonal structure with respect to CNAs, without reliance on prior knowledge. The model clusters cells into sub-tumoral clones, identifies the number of clones, and simultaneously infers the clonal CNA profiles. It synergistically incorporates input from gene expression and germline single-nucleotide polymorphisms. A Gibbs sampling algorithm has been implemented and is available as the R package Chloris. We demonstrate that our new method compares favorably with existing software tools in terms of both cell clustering and CNA profile identification accuracy. Application to human metastatic melanoma and anaplastic thyroid tumor data demonstrates accurate clustering of tumor and non-tumor cells and reveals clonal CNA profiles that highlight functional gene expression differences between clones from the same tumor.
Bayesian inference for copy number intra-tumoral heterogeneity from single-cell RNA-sequencing data. PuXue Qiao, Chun Fung Kwok, Guoqi Qian, Davis J McCarthy. Biometrics 81(3), 2025-07-03. DOI: 10.1093/biomtc/ujaf115
This work revisits optimal response-adaptive designs from a type-I error rate perspective, highlighting when and by how much these allocations exacerbate type-I error rate inflation, an issue previously undocumented. We explore a range of approaches from the literature that can be applied to reduce type-I error rate inflation. However, we find that none of these approaches gives a robust solution to the problem. To address this, we derive 2 optimal allocation proportions, incorporating the more robust score test (instead of the Wald test) with finite sample estimators (instead of the unknown true values) in the formulation of the optimization problem. One proportion optimizes statistical power, and the other minimizes the total number of failures in a trial while maintaining a fixed variance level. Through simulations based on an early phase trial and a confirmatory trial, we provide crucial practical insight into how these new optimal proportion designs can offer substantial patient outcome advantages while controlling the type-I error rate. While we focus on binary outcomes, the framework offers valuable insights that naturally extend to other outcome types, multi-armed trials, and alternative measures of interest.
Revisiting optimal allocations for binary responses: insights from considering type-I error rate control. Lukas Pin, Sofía S Villar, William F Rosenberger. Biometrics 81(3), 2025-07-03. DOI: 10.1093/biomtc/ujaf114
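For orientation, the two optimization targets mentioned in the abstract have well-known Wald-test-based counterparts for two-arm binary trials: Neyman allocation (power-optimal for a fixed sample size) and the allocation that minimizes expected failures at a fixed variance. The sketch below computes these classical proportions only; it does not reproduce the score-test-based proportions derived in this work. The function names are hypothetical, and `p1`, `p2` are success probabilities treated as known.

```python
import math


def neyman_allocation(p1, p2):
    """Classical Neyman proportion assigned to arm 1.

    Proportional to the outcome standard deviation sqrt(p * (1 - p)).
    Falls back to equal allocation when both variances are zero.
    """
    s1 = math.sqrt(p1 * (1 - p1))
    s2 = math.sqrt(p2 * (1 - p2))
    if s1 + s2 == 0:
        return 0.5
    return s1 / (s1 + s2)


def min_failures_allocation(p1, p2):
    """Classical proportion assigned to arm 1 that minimizes expected
    failures for a fixed variance of the estimated difference.

    Proportional to sqrt(success probability); assumes p1 + p2 > 0.
    """
    r1 = math.sqrt(p1)
    r2 = math.sqrt(p2)
    return r1 / (r1 + r2)


if __name__ == "__main__":
    for p1, p2 in [(0.9, 0.5), (0.9, 0.1), (0.3, 0.2)]:
        print(p1, p2, neyman_allocation(p1, p2), min_failures_allocation(p1, p2))
```

With success probabilities 0.9 and 0.5, Neyman allocation assigns the smaller share to the better arm, which is one reason the two targets are usually discussed separately.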
Inspired by logistic regression, we introduce a regression model for data tuples consisting of a binary response and a set of covariates residing in a metric space without vector structures. Based on the proposed model, we also develop a binary classifier for metric-space valued data. We propose a maximum likelihood estimator for the metric-space valued regression coefficient in the model, and provide upper bounds on the estimation error under various metric entropy conditions that quantify complexity of the underlying metric space. Matching lower bounds are derived for the important metric spaces commonly seen in statistics, establishing optimality of the proposed estimator in such spaces. A finer upper bound and a matching lower bound, and thus optimality of the proposed classifier, are established for Riemannian manifolds. To the best of our knowledge, the proposed regression model and the above minimax bounds are the first of their kind for analyzing a binary response with covariates residing in general metric spaces. We also investigate the numerical performance of the proposed estimator and classifier via simulation studies, and illustrate their practical merits via an application to task-related fMRI data.
Binary regression and classification with covariates in metric spaces. Yinan Lin, Zhenhua Lin. Biometrics 81(3), 2025-07-03. DOI: 10.1093/biomtc/ujaf123
Easy-to-interpret effect estimands are highly desirable in survival analysis. In the competing risks framework, one good candidate is the restricted mean time lost (RMTL). It is defined as the area under the cumulative incidence function up to a prespecified time point and, thus, it summarizes the cumulative incidence function into a meaningful estimand. While existing RMTL-based tests are limited to 2-sample comparisons and mostly to 2 event types, we aim to develop general contrast tests for factorial designs and an arbitrary number of event types based on a Wald-type test statistic. Furthermore, we avoid the often-made, rather restrictive continuity assumption on the event time distribution. This allows for ties in the data, which often occur in practical applications, for example, when event times are measured in whole days. In addition, we develop more reliable tests for RMTL comparisons that are based on a permutation approach to improve the small sample performance. In a second step, multiple tests for RMTL comparisons are developed to test several null hypotheses simultaneously. Here, we incorporate the asymptotically exact dependence structure between the local test statistics to gain more power. The small sample performance of the proposed testing procedures is analyzed in simulations and finally illustrated by analyzing a real-data example about leukemia patients who underwent bone marrow transplantation.
Multiple tests for restricted mean time lost with competing risks data. Merle Munko, Dennis Dobler, Marc Ditzhaus. Biometrics 81(3), 2025-07-03. DOI: 10.1093/biomtc/ujaf086
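The definition of the RMTL as the area under the cumulative incidence function (CIF) up to a time point tau can be made concrete with a small sketch. It rests on a simplifying assumption that the abstract does not make: there is no censoring, so the CIF is an empirical proportion. With censored data an estimator such as Aalen-Johansen would replace `empirical_cif`. All function names are hypothetical. Tied event times are allowed.

```python
def empirical_cif(event_times, causes, cause):
    """Empirical CIF of one cause, assuming NO censoring.

    Returns the distinct event times and the CIF value at each of them,
    i.e. the fraction of subjects with an event of this cause by then.
    """
    n = len(event_times)
    jump_times = sorted(set(event_times))
    values = [
        sum(1 for t, c in zip(event_times, causes) if t <= s and c == cause) / n
        for s in jump_times
    ]
    return jump_times, values


def rmtl_from_step_cif(jump_times, cif_values, tau):
    """Area under a right-continuous step CIF on [0, tau].

    cif_values[j] is the CIF on [jump_times[j], jump_times[j + 1]);
    the CIF is 0 before the first jump.
    """
    area = 0.0
    for j, (t, f) in enumerate(zip(jump_times, cif_values)):
        if t >= tau:
            break
        t_next = jump_times[j + 1] if j + 1 < len(jump_times) else tau
        area += f * (min(t_next, tau) - t)
    return area


if __name__ == "__main__":
    times = [1, 2, 2, 4]   # event times in whole days, with a tie at day 2
    causes = [1, 2, 1, 1]  # event type of each subject
    jumps, cif = empirical_cif(times, causes, cause=1)
    print(rmtl_from_step_cif(jumps, cif, tau=3))
```

Without censoring, the same quantity equals the average of max(tau - T, 0) over subjects, counting only those whose event is of the cause of interest, which gives a quick sanity check on the step integration.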
Public health data are often spatially dependent, but standard spatial regression methods can suffer from bias and invalid inference when the independent variable is associated with spatially correlated residuals. This could occur if, for example, there is an unmeasured environmental contaminant associated with the independent and outcome variables in a spatial regression analysis. Geoadditive structural equation modeling (gSEM), in which an estimated spatial trend is removed from both the explanatory and response variables before estimating the parameters of interest, has previously been proposed as a solution, but there has been little investigation of gSEM's properties with point-referenced data. We link gSEM to results on double machine learning and semiparametric regression based on two-stage procedures. We propose using these semiparametric estimators for spatial regression, with Gaussian processes with Matérn covariance estimating the spatial trends, and term this class of estimators double spatial regression (DSR). We derive regularity conditions for root-n asymptotic normality and consistency and closed-form variance estimation, and show that in simulations where standard spatial regression estimators are highly biased and have poor coverage, DSR can mitigate bias more effectively than competitors and obtain nominal coverage.
Two-stage estimators for spatial confounding with point-referenced data. Nate Wiecha, Jane A Hoppin, Brian J Reich. Biometrics 81(3), 2025-07-03. DOI: 10.1093/biomtc/ujaf093. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288666/pdf/
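The two-stage idea, removing an estimated spatial trend from both the explanatory and response variables and then regressing residual on residual, can be sketched in a few lines. This sketch swaps in a simple Gaussian kernel smoother for the Gaussian process with Matérn covariance used in DSR, purely to stay dependency-free; the function names and the `bandwidth` parameter are hypothetical, and no variance estimation is shown.

```python
import math


def kernel_smooth(coords, values, bandwidth):
    """Kernel-weighted average of `values` at each observed location.

    A stand-in for the spatial trend estimate; NOT a Gaussian process fit.
    """
    fitted = []
    for ci in coords:
        weights = [
            math.exp(-math.dist(ci, cj) ** 2 / (2 * bandwidth ** 2))
            for cj in coords
        ]
        fitted.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
    return fitted


def two_stage_slope(coords, x, y, bandwidth=0.2):
    """Stage 1: subtract the estimated spatial trend from x and from y.
    Stage 2: least-squares slope of the y residuals on the x residuals.
    """
    x_res = [xi - fi for xi, fi in zip(x, kernel_smooth(coords, x, bandwidth))]
    y_res = [yi - fi for yi, fi in zip(y, kernel_smooth(coords, y, bandwidth))]
    return sum(a * b for a, b in zip(x_res, y_res)) / sum(a * a for a in x_res)


if __name__ == "__main__":
    coords = [(i / 9, ((7 * i) % 10) / 9) for i in range(10)]
    x = [math.sin(3 * i) + 0.1 * i for i in range(10)]
    y = [2 * xi + 5 for xi in x]  # noiseless toy relationship with slope 2
    print(two_stage_slope(coords, x, y))
```

Because the same linear smoother is applied to both variables, a noiseless linear relationship between them is recovered exactly in the second stage; how well spatial confounding is removed in practice depends on how well the first-stage smoother captures the spatial trend.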
In clinical trials where long follow-up is required to measure the primary outcome of interest, there is substantial interest in using an accepted surrogate outcome that can be measured earlier in time or with less cost to estimate a treatment effect. For example, in clinical trials of chronic kidney disease, the effect of a treatment is often demonstrated on a longitudinal surrogate, the change of the longitudinal outcome (glomerular filtration rate, GFR) per year or GFR slope. However, estimating the effect of a treatment on the GFR slope is complicated by the fact that GFR measurement can be terminated by the occurrence of a terminal event, such as death or kidney failure. Thus, to estimate this effect, one must consider both the longitudinal GFR trajectory and the terminal event process. In this paper, we build a semiparametric framework to jointly model the longitudinal outcome and the terminal event, where the model for the longitudinal outcome is semiparametric, the relationship between the longitudinal outcome and the terminal event is nonparametric, and the terminal event is modeled via a semiparametric Cox model. The proposed semiparametric joint model is flexible and can be easily extended to include a nonlinear trajectory of the longitudinal outcome. An estimating equation based method is proposed to estimate the treatment effect on the longitudinal surrogate outcome (eg, GFR slope). Theoretical properties of the proposed estimators are derived, and finite sample performance is evaluated through simulation studies. We illustrate the proposed method using data from the Reduction of Endpoints in NIDDM with the Angiotensin II Antagonist Losartan (RENAAL) trial to examine the effect of Losartan on GFR slope.
{"title":"Semiparametric joint modeling to estimate the treatment effect on a longitudinal surrogate with application to chronic kidney disease trials.","authors":"Xuan Wang, Jie Zhou, Layla Parast, Tom Greene","doi":"10.1093/biomtc/ujaf104","DOIUrl":"10.1093/biomtc/ujaf104","url":null,"abstract":"<p><p>In clinical trials where long follow-up is required to measure the primary outcome of interest, there is substantial interest in using an accepted surrogate outcome that can be measured earlier in time or with less cost to estimate a treatment effect. For example, in clinical trials of chronic kidney disease, the effect of a treatment is often demonstrated on a longitudinal surrogate, the change of the longitudinal outcome (glomerular filtration rate, GFR) per year or GFR slope. However, estimating the effect of a treatment on the GFR slope is complicated by the fact that GFR measurement can be terminated by the occurrence of a terminal event, such as death or kidney failure. Thus, to estimate this effect, one must consider both the longitudinal GFR trajectory and the terminal event process. In this paper, we build a semiparametric framework to jointly model the longitudinal outcome and the terminal event, where the model for the longitudinal outcome is semiparametric, the relationship between the longitudinal outcome and the terminal event is nonparametric, and the terminal event is modeled via a semiparametric Cox model. The proposed semiparametric joint model is flexible and can be easily extended to include a nonlinear trajectory of the longitudinal outcome. An estimating equation based method is proposed to estimate the treatment effect on the longitudinal surrogate outcome (eg, GFR slope). Theoretical properties of the proposed estimators are derived, and finite sample performance is evaluated through simulation studies. 
We illustrate the proposed method using data from the Reduction of Endpoints in NIDDM with the Angiotensin II Antagonist Losartan (RENAAL) trial to examine the effect of Losartan on GFR slope.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12320702/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144783416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}