Biometrics: Latest Articles

Spatially aware adjusted Rand index for evaluating spatial transcriptomics clustering.
IF 1.7, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2025-07-03, DOI: 10.1093/biomtc/ujaf127
Yinqiao Yan, Xiangnan Feng, Xiangyu Luo

Spatial transcriptomics (ST) clustering plays a crucial role in elucidating tissue spatial heterogeneity. An accurate ST clustering result can greatly benefit downstream biological analyses. As various ST clustering approaches have been proposed in recent years, comparing their clustering accuracy has become important in benchmarking studies. However, the widely used metric, the adjusted Rand index (ARI), ignores the spatial information in ST data entirely, which prevents ARI from fully evaluating spatial ST clustering methods. We propose a spatially aware Rand index (spRI) and a spatially aware adjusted Rand index (spARI) that incorporate spatial distance information. Specifically, when comparing two partitions, spRI assigns a disagreement object pair a weight that depends on the distance between the two objects, whereas the Rand index assigns it a weight of zero. This spatially aware feature of spRI adaptively differentiates disagreement object pairs based on their distances, providing a useful evaluation metric that favors spatial coherence of clustering. The spARI is obtained by adjusting spRI for chance so that its expectation is zero under an appropriate null model. Statistical properties of spRI and spARI are discussed. Applications to a simulation study and two ST datasets demonstrate the improved utility of spARI over ARI in evaluating ST clustering methods.
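
The contrast between the Rand index and a distance-weighted variant can be sketched in a few lines of Python. This is only an illustration of the idea described in the abstract: the abstract does not give the weight function, so `weight` below is a hypothetical user-supplied function of distance with values in [0, 1), and the chance adjustment that turns spRI into spARI is not shown.

```python
import math
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Plain Rand index: fraction of object pairs on which two partitions agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def weighted_rand_index(labels_a, labels_b, coords, weight):
    """Rand-type index in which a disagreement pair scores weight(d) instead of 0.

    `weight` is a hypothetical function of the pairwise distance d, assumed
    to take values in [0, 1); it is NOT the weight used in the paper.
    """
    total = 0.0
    n_pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        n_pairs += 1
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            total += 1.0  # agreement pair, scored as in the Rand index
        else:
            total += weight(math.dist(coords[i], coords[j]))
    return total / n_pairs
```

With `weight = lambda d: 0.0` the second function reduces to the ordinary Rand index, which makes the role of the distance-dependent weight explicit.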

Citations: 0
Covariance-on-covariance regression.
IF 1.7, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2025-07-03, DOI: 10.1093/biomtc/ujaf097
Yi Zhao, Yize Zhao

This manuscript introduces a covariance-on-covariance regression model. It is assumed that there exists at least one pair of linear projections on the outcome covariance matrices and predictor covariance matrices such that a log-linear model links the variances in the projection spaces, together with additional covariates of interest. An ordinary least squares type estimator is proposed to simultaneously identify the projections and estimate the model coefficients. Under regularity conditions, the proposed estimator is asymptotically consistent. The superior performance of the proposed approach over existing methods is demonstrated via simulation studies. Applied to data collected in the Human Connectome Project Aging study, the proposed approach identifies 3 pairs of brain networks, where functional connectivity within the resting-state network predicts functional connectivity within the corresponding task-state network. The 3 networks correspond to a global signal network, a task-related network, and a task-unrelated network. The findings are consistent with existing knowledge about brain function.
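
To make the log-linear link concrete, the following Python sketch fits a simplified version of such a model. It assumes the form log(γᵀΣᵢγ) = β₀ + β₁ log(θᵀΔᵢθ), with the projection vectors γ and θ treated as known and no additional covariates. Both simplifications are assumptions of this sketch: the abstract describes an estimator that identifies the projections and coefficients simultaneously, which is not attempted here.

```python
import math


def quad_form(v, m):
    """Return v^T M v for a vector v and a square matrix M (nested lists)."""
    n = len(v)
    return sum(v[r] * m[r][c] * v[c] for r in range(n) for c in range(n))


def fit_log_linear(outcome_covs, predictor_covs, gamma, theta):
    """Least-squares fit of log(gamma' S_i gamma) on log(theta' D_i theta).

    gamma and theta are treated as known (an assumption of this sketch).
    Returns the pair (intercept, slope).
    """
    y = [math.log(quad_form(gamma, s)) for s in outcome_covs]
    x = [math.log(quad_form(theta, d)) for d in predictor_covs]
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope
```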

Citations: 0
Sparse 2-stage Bayesian meta-analysis for individualized treatments.
IF 1.7, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2025-07-03, DOI: 10.1093/biomtc/ujaf082
Junwei Shen, Erica E M Moodie, Shirin Golchi

Individualized treatment rules tailor treatments to patients based on clinical, demographic, and other characteristics. Estimation of individualized treatment rules requires the identification of individuals who benefit most from the particular treatments and thus the detection of variability in treatment effects. To develop an effective individualized treatment rule, data from multisite studies may be required due to the low power provided by smaller datasets for detecting the often small treatment-covariate interactions. However, sharing of individual-level data is sometimes constrained. Furthermore, sparsity may arise in 2 senses: different data sites may recruit from different populations, making it infeasible to estimate identical models or all parameters of interest at all sites, and the number of non-zero parameters in the model for the treatment rule may be small. To address these issues, we adopt a 2-stage Bayesian meta-analysis approach to estimate individualized treatment rules which optimize expected patient outcomes using multisite data without disclosing individual-level data beyond the sites. Simulation results demonstrate that our approach can provide consistent estimates of the parameters which fully characterize the optimal individualized treatment rule. We estimate the optimal Warfarin dose strategy using data from the International Warfarin Pharmacogenetics Consortium, where data sparsity and small treatment-covariate interaction effects pose additional statistical challenges.

Citations: 0
Estimating associations between cumulative exposure and health via generalized distributed lag non-linear models using penalized splines.
IF 1.7, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2025-07-03, DOI: 10.1093/biomtc/ujaf116
Tianyi Pan, Hwashin Hyun Shin, Glen McGee, Alex Stringer

Quantifying associations between short-term exposure to ambient air pollution and health outcomes is an important public health priority. Many studies have investigated the association considering delayed effects within the past few days. Adaptive cumulative exposure distributed lag non-linear models (ACE-DLNMs) quantify associations between health outcomes and cumulative exposure that is specified in a data-adaptive way. While the ACE-DLNM framework is highly interpretable, it is limited to continuous outcomes and does not scale well to large datasets. Motivated by a large analysis of daily pollution and respiratory hospitalization counts in Canada between 2001 and 2018, we propose a generalized ACE-DLNM incorporating penalized splines, improving upon existing ACE-DLNM methods to accommodate general response types. We then develop a computationally efficient estimation strategy based on profile likelihood and Laplace approximate marginal likelihood with Newton-type methods. We demonstrate the performance and practical advantages of the proposed method through simulations. In application to the motivating analysis, the proposed method yields more stable inferences compared to generalized additive models with fixed exposures, while retaining interpretability.
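
As a rough illustration of what "cumulative exposure" means in a distributed lag setting, the Python sketch below computes a weighted sum of current and lagged exposure values, E_t = Σ_l w_l x_{t-l}. The lag weights are fixed inputs here, which is an assumption of this sketch; in the ACE-DLNM framework described above the weighting is specified in a data-adaptive way, and the estimation procedure is not reproduced.

```python
def cumulative_exposure(exposure, lag_weights):
    """Weighted sum of current and lagged exposure: E_t = sum_l w_l * x_{t-l}.

    lag_weights[0] applies to the current day, lag_weights[1] to the day
    before, and so on. Values are returned only for days with a full lag
    history, so the output is shorter than the input.
    """
    n_lags = len(lag_weights)
    return [
        sum(w * exposure[t - lag] for lag, w in enumerate(lag_weights))
        for t in range(n_lags - 1, len(exposure))
    ]
```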

Citations: 0
Bayesian inference for copy number intra-tumoral heterogeneity from single-cell RNA-sequencing data.
IF 1.7, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2025-07-03, DOI: 10.1093/biomtc/ujaf115
PuXue Qiao, Chun Fung Kwok, Guoqi Qian, Davis J McCarthy

Copy number alterations (CNA) are important drivers and markers of clonal structures within tumors. Understanding these structures at single-cell resolution is crucial to advancing cancer treatments. The objective is to cluster single cells into clones and identify CNA events in each clone. Early attempts often sacrifice the intrinsic link between cell clustering and clonal CNA detection for simplicity and rely heavily on human input for critical parameters such as the number of clones. Here, we develop a Bayesian model that uses single-cell RNA sequencing (scRNA-seq) data for automatic analysis of intra-tumoral clonal structure with respect to CNAs, without reliance on prior knowledge. The model clusters cells into sub-tumoral clones, identifies the number of clones, and simultaneously infers the clonal CNA profiles. It synergistically incorporates input from gene expression and germline single-nucleotide polymorphisms. A Gibbs sampling algorithm has been implemented and is available as the R package Chloris. We demonstrate that our new method compares favorably with existing software tools in terms of both cell clustering and CNA profile identification accuracy. Application to human metastatic melanoma and anaplastic thyroid tumor data demonstrates accurate clustering of tumor and non-tumor cells and reveals clonal CNA profiles that highlight functional gene expression differences between clones from the same tumor.

Citations: 0
Revisiting optimal allocations for binary responses: insights from considering type-I error rate control.
IF 1.7, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2025-07-03, DOI: 10.1093/biomtc/ujaf114
Lukas Pin, Sofía S Villar, William F Rosenberger

This work revisits optimal response-adaptive designs from a type-I error rate perspective, highlighting when and how much these allocations exacerbate type-I error rate inflation, an issue previously undocumented. We explore a range of approaches from the literature that can be applied to reduce type-I error rate inflation. However, we found that all of these approaches fail to give a robust solution to the problem. To address this, we derive 2 optimal allocation proportions, incorporating the more robust score test (instead of the Wald test) with finite sample estimators (instead of the unknown true values) in the formulation of the optimization problem. One proportion optimizes statistical power, and the other minimizes the total number of failures in a trial while maintaining a fixed variance level. Through simulations based on an early-phase trial and a confirmatory trial, we provide crucial practical insight into how these new optimal proportion designs can offer substantial patient-outcome advantages while controlling the type-I error rate. While we focused on binary outcomes, the framework offers valuable insights that naturally extend to other outcome types, multi-armed trials, and alternative measures of interest.
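
For orientation, the two classical Wald-test-based target proportions for a two-arm trial with binary responses can be written down directly. The Python sketch below shows Neyman allocation (which maximizes power for a fixed sample size) and the allocation usually called RSIHR (which minimizes expected failures for a fixed variance of the estimated difference). These are the well-known classical proportions, not the score-test-based proportions derived in the paper, and the function names are illustrative.

```python
import math


def neyman_allocation(p1, p2):
    """Proportion of patients assigned to arm 1 under Neyman allocation.

    p1 and p2 are the success probabilities of arms 1 and 2. The result is
    undefined when both arms have success probability exactly 0 or 1.
    """
    s1 = math.sqrt(p1 * (1.0 - p1))
    s2 = math.sqrt(p2 * (1.0 - p2))
    return s1 / (s1 + s2)


def rsihr_allocation(p1, p2):
    """Proportion assigned to arm 1 that minimizes expected failures
    subject to a fixed variance of the estimated difference in proportions.
    Undefined when both success probabilities are 0.
    """
    return math.sqrt(p1) / (math.sqrt(p1) + math.sqrt(p2))
```

In practice the true success probabilities are unknown and are replaced by running estimates, which is one reason finite-sample behavior matters for these designs.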

Citations: 0
Binary regression and classification with covariates in metric spaces.
IF 1.7, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2025-07-03, DOI: 10.1093/biomtc/ujaf123
Yinan Lin, Zhenhua Lin

Inspired by logistic regression, we introduce a regression model for data tuples consisting of a binary response and a set of covariates residing in a metric space without vector structures. Based on the proposed model, we also develop a binary classifier for metric-space valued data. We propose a maximum likelihood estimator for the metric-space valued regression coefficient in the model, and provide upper bounds on the estimation error under various metric entropy conditions that quantify complexity of the underlying metric space. Matching lower bounds are derived for the important metric spaces commonly seen in statistics, establishing optimality of the proposed estimator in such spaces. A finer upper bound and a matching lower bound, and thus optimality of the proposed classifier, are established for Riemannian manifolds. To the best of our knowledge, the proposed regression model and the above minimax bounds are the first of their kind for analyzing a binary response with covariates residing in general metric spaces. We also investigate the numerical performance of the proposed estimator and classifier via simulation studies, and illustrate their practical merits via an application to task-related fMRI data.

Citations: 0
Multiple tests for restricted mean time lost with competing risks data.
IF 1.7, Zone 4 (Mathematics), Q3 BIOLOGY, Pub Date: 2025-07-03, DOI: 10.1093/biomtc/ujaf086
Merle Munko, Dennis Dobler, Marc Ditzhaus

Easy-to-interpret effect estimands are highly desirable in survival analysis. In the competing risks framework, one good candidate is the restricted mean time lost (RMTL). It is defined as the area under the cumulative incidence function up to a prespecified time point and, thus, it summarizes the cumulative incidence function into a meaningful estimand. While existing RMTL-based tests are limited to 2-sample comparisons and mostly to 2 event types, we aim to develop general contrast tests for factorial designs and an arbitrary number of event types based on a Wald-type test statistic. Furthermore, we avoid the often-made, rather restrictive continuity assumption on the event time distribution. This allows for ties in the data, which often occur in practical applications, for example, when event times are measured in whole days. In addition, we develop more reliable tests for RMTL comparisons that are based on a permutation approach to improve the small sample performance. In a second step, multiple tests for RMTL comparisons are developed to test several null hypotheses simultaneously. Here, we incorporate the asymptotically exact dependence structure between the local test statistics to gain more power. The small sample performance of the proposed testing procedures is analyzed in simulations and finally illustrated by analyzing a real-data example about leukemia patients who underwent bone marrow transplantation.
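
The definition of the RMTL as the area under the cumulative incidence function up to a prespecified time point can be turned into a short plug-in computation. The Python sketch below uses an Aalen-Johansen-type estimate of the cumulative incidence function, which is a standard choice assumed here rather than taken from the abstract, and integrates the resulting step function up to `tau`. Tied event times are handled by counting all events at each distinct time. The tests and permutation procedures described in the abstract are not shown.

```python
def rmtl(times, events, cause, tau):
    """Plug-in restricted mean time lost for one event type up to tau.

    times  : observed times (event or censoring)
    events : 0 = censored, positive integers = event types
    cause  : the event type of interest
    tau    : prespecified time horizon
    """
    surv = 1.0  # all-cause survival just before the current time
    area = 0.0
    for t in sorted(set(times)):
        if t > tau:
            break
        at_risk = sum(1 for u in times if u >= t)
        d_all = sum(1 for u, e in zip(times, events) if u == t and e != 0)
        d_cause = sum(1 for u, e in zip(times, events) if u == t and e == cause)
        # The cumulative incidence jumps by surv * d_cause / at_risk at time t;
        # a jump at t contributes jump * (tau - t) to the area under the curve.
        area += surv * d_cause / at_risk * (tau - t)
        surv *= 1.0 - d_all / at_risk
    return area
```

Without censoring, the result equals the average of `tau - t` over subjects who experience the event of interest by `tau` (with zero for everyone else), which gives a quick sanity check.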

Citations: 0
Two-stage estimators for spatial confounding with point-referenced data.
IF 1.7 Zone 4 Mathematics Q3 BIOLOGY Pub Date: 2025-07-03 DOI: 10.1093/biomtc/ujaf093
Nate Wiecha, Jane A Hoppin, Brian J Reich

Public health data are often spatially dependent, but standard spatial regression methods can suffer from bias and invalid inference when the independent variable is associated with spatially correlated residuals. This could occur if, for example, there is an unmeasured environmental contaminant associated with the independent and outcome variables in a spatial regression analysis. Geoadditive structural equation modeling (gSEM), in which an estimated spatial trend is removed from both the explanatory and response variables before estimating the parameters of interest, has previously been proposed as a solution, but there has been little investigation of gSEM's properties with point-referenced data. We link gSEM to results on double machine learning and semiparametric regression based on two-stage procedures. We propose using these semiparametric estimators for spatial regression, using Gaussian processes with Matérn covariance to estimate the spatial trends, and term this class of estimators double spatial regression (DSR). We derive regularity conditions for root-n asymptotic normality and consistency and closed-form variance estimation, and show that in simulations where standard spatial regression estimators are highly biased and have poor coverage, DSR can mitigate bias more effectively than competitors and obtain nominal coverage.
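The two-stage idea described above (remove an estimated spatial trend from both the explanatory and response variables, then regress residual on residual) can be sketched as follows. This is a simplified illustration under stated assumptions, not the DSR estimator as specified in the paper: the helper names `matern32`, `gp_smooth`, and `two_stage_slope` are hypothetical, the Matérn smoothness is fixed at 3/2, the length scale and noise level are fixed rather than estimated, and no variance estimate or sample splitting is included.

```python
import numpy as np

def matern32(coords_a, coords_b, length_scale):
    """Matérn covariance with smoothness 3/2 and unit variance."""
    d = np.linalg.norm(coords_a[:, None, :] - coords_b[None, :, :], axis=-1)
    s = np.sqrt(3.0) * d / length_scale
    return (1.0 + s) * np.exp(-s)

def gp_smooth(coords, values, length_scale=0.3, noise=0.1):
    """Gaussian-process posterior mean at the observed sites.

    Hyperparameters are fixed here for simplicity (an assumption);
    in practice they would be estimated from the data.
    """
    K = matern32(coords, coords, length_scale)
    centered = values - values.mean()
    weights = np.linalg.solve(K + noise * np.eye(len(values)), centered)
    return values.mean() + K @ weights

def two_stage_slope(coords, x, y, **gp_kwargs):
    """Stage 1: remove the estimated spatial trend from x and y.
    Stage 2: regress the y residuals on the x residuals (no intercept)."""
    rx = x - gp_smooth(coords, x, **gp_kwargs)
    ry = y - gp_smooth(coords, y, **gp_kwargs)
    return float(rx @ ry / (rx @ rx))

# Simulated example with an unmeasured, spatially smooth confounder u.
rng = np.random.default_rng(0)
n = 300
coords = rng.uniform(size=(n, 2))
u = np.sin(2 * np.pi * coords[:, 0]) + np.cos(2 * np.pi * coords[:, 1])
x = u + rng.normal(size=n)
y = 1.0 * x + 3.0 * u + 0.5 * rng.normal(size=n)   # true slope is 1.0

xc, yc = x - x.mean(), y - y.mean()
print("naive OLS slope:", float(xc @ yc / (xc @ xc)))
print("two-stage slope:", two_stage_slope(coords, x, y))
```

In this simulation the naive slope absorbs the confounder's effect, while the residual-on-residual slope should land much closer to the true value; how well this works in general depends on how accurately the spatial trends are estimated.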

Citations: 0
Semiparametric joint modeling to estimate the treatment effect on a longitudinal surrogate with application to chronic kidney disease trials.
IF 1.7 Zone 4 Mathematics Q3 BIOLOGY Pub Date: 2025-07-03 DOI: 10.1093/biomtc/ujaf104
Xuan Wang, Jie Zhou, Layla Parast, Tom Greene

In clinical trials where long follow-up is required to measure the primary outcome of interest, there is substantial interest in using an accepted surrogate outcome that can be measured earlier or at lower cost to estimate a treatment effect. For example, in clinical trials of chronic kidney disease, the effect of a treatment is often demonstrated on a longitudinal surrogate: the change per year in the longitudinal outcome (glomerular filtration rate, GFR), or GFR slope. However, estimating the effect of a treatment on the GFR slope is complicated by the fact that GFR measurement can be terminated by the occurrence of a terminal event, such as death or kidney failure. Thus, to estimate this effect, one must consider both the longitudinal GFR trajectory and the terminal event process. In this paper, we build a semiparametric framework to jointly model the longitudinal outcome and the terminal event, where the model for the longitudinal outcome is semiparametric, the relationship between the longitudinal outcome and the terminal event is nonparametric, and the terminal event is modeled via a semiparametric Cox model. The proposed semiparametric joint model is flexible and can be easily extended to include a nonlinear trajectory of the longitudinal outcome. An estimating-equation-based method is proposed to estimate the treatment effect on the longitudinal surrogate outcome (e.g., GFR slope). Theoretical properties of the proposed estimators are derived, and finite sample performance is evaluated through simulation studies. We illustrate the proposed method using data from the Reduction of Endpoints in NIDDM with the Angiotensin II Antagonist Losartan (RENAAL) trial to examine the effect of Losartan on GFR slope.

Citations: 0