Biometrics: Latest Articles

Letter to the Editors: Comments on "Statistical inference on change points in generalized semiparametric segmented models" by Yang et al. (2025).
IF 1.7 | Zone 4 (Mathematics) | Q3 BIOLOGY | Biometrics 81(4) | Pub Date: 2025-10-08 | DOI: 10.1093/biomtc/ujaf147
Vito M R Muggeo

We provide some comments about the recent paper by Yang et al. related to model estimation and hypothesis testing in segmented regression.

Citations: 0
Flexible and efficient estimation of causal effects with error-prone exposures: a control variates approach for measurement error.
IF 1.7 | Zone 4 (Mathematics) | Q3 BIOLOGY | Biometrics 81(4) | Pub Date: 2025-10-08 | DOI: 10.1093/biomtc/ujaf151
Keith Barnatchez, Rachel Nethery, Bryan E Shepherd, Giovanni Parmigiani, Kevin P Josey

Exposure measurement error is a ubiquitous but often overlooked challenge in causal inference with observational data. Existing methods accounting for exposure measurement error largely rely on restrictive parametric assumptions, while emerging data-adaptive estimation approaches allow for less restrictive assumptions but at the cost of flexibility, as they are typically tailored toward rigidly defined statistical quantities. There remains a critical need for assumption-lean estimation methods that are both flexible and possess desirable theoretical properties across a variety of study designs. In this paper, we introduce a general framework for estimation of causal quantities in the presence of exposure measurement error, adapted from the method of control variates. Our method can be implemented in various two-phase sampling study designs, where one obtains gold-standard exposure measurements for a small subset of the full study sample, called the validation data. The control variates framework leverages both the error-prone and error-free exposure measurements by augmenting an initial consistent estimator from the validation data with a variance reduction term formed from the full data. We show that our method inherits double-robustness properties under standard causal assumptions. Simulation studies show that our approach performs favorably compared to leading methods under various two-phase sampling schemes. We illustrate our method with observational electronic health record data on HIV outcomes from the Vanderbilt Comprehensive Care Clinic.
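The control variates idea described in this abstract can be illustrated with a deliberately simplified toy problem. The sketch below is not the authors' estimator: it estimates a plain mean rather than a causal quantity, assumes a simple random validation subsample, and uses hypothetical names (`control_variate_mean`, `simulate`). It only shows how a validation-data estimate can be augmented with a mean-zero term built from the error-prone measurement, which is available for the full sample.

```python
import numpy as np


def control_variate_mean(x_val, w_val, w_full):
    """Toy control-variates estimate of E[X].

    x_val  : gold-standard measurements (validation subset only)
    w_val  : error-prone measurements for the same validation subjects
    w_full : error-prone measurements for the full sample
    """
    theta_val = x_val.mean()                      # initial estimator, validation data only
    control = w_val.mean() - w_full.mean()        # mean-zero term under random validation sampling
    # Estimated regression slope of X on W, the variance-minimizing coefficient here
    b = np.cov(x_val, w_val, ddof=1)[0, 1] / w_val.var(ddof=1)
    return theta_val - b * control


def simulate(n_full=2000, n_val=200, n_rep=500, seed=0):
    """Compare Monte Carlo variances of the validation-only and augmented estimators."""
    rng = np.random.default_rng(seed)
    naive, augmented = [], []
    for _ in range(n_rep):
        x = rng.normal(1.0, 1.0, n_full)          # true exposure (hypothetical data)
        w = x + rng.normal(0.0, 0.5, n_full)      # error-prone version of the exposure
        idx = rng.choice(n_full, n_val, replace=False)
        naive.append(x[idx].mean())
        augmented.append(control_variate_mean(x[idx], w[idx], w))
    return np.var(naive), np.var(augmented)


if __name__ == "__main__":
    v_naive, v_aug = simulate()
    print(f"validation-only variance: {v_naive:.5f}")
    print(f"control-variates variance: {v_aug:.5f}")
```

In this toy setting the augmented estimator should show a smaller Monte Carlo variance than the validation-only mean, because the error-prone measurement is strongly correlated with the true exposure.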

Citations: 0
Bayesian monotone single-index quantile regression model with bounded response and misaligned functional covariates.
IF 1.7 | Zone 4 (Mathematics) | Q3 BIOLOGY | Biometrics 81(4) | Pub Date: 2025-10-08 | DOI: 10.1093/biomtc/ujaf145 | PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12569519/pdf/
Shengxian Ding, Debajyoti Sinha, Greg Hajcak, Roman Kotov, Chao Huang

Existing research in mental health has established that rising depressive symptoms in adolescents are associated with parental history of depression and other behavioral risk factors. Our goal is to investigate how these scalar variables, together with multiple functional covariates capturing neural responses to rewards, relate to future adolescent depression. Departing from prior studies that typically relied on simple linear regression to model all covariates, we propose a novel Bayesian quantile regression framework. This approach constructs a single-index summary of both scalar and functional covariates, coupled with a monotone link function that flexibly captures unknown nonlinear relationships and interactions. Our method addresses several limitations of existing approaches. It offers a clinically interpretable index akin to that of linear models, ensures that the estimated quantile remains within the response bounds, and jointly incorporates the registration of functional covariates within the quantile regression analysis. Our simulation studies demonstrate that our method outperforms existing unrestricted single-index-based methods, particularly when there are both scalar and preregistered functional covariates. Furthermore, we showcase the practical utility of our framework using data from a large-scale adolescent depression study, yielding a new, statistically principled summary of neural reward processing with direct relevance to future depression risk.

Citations: 0
Inverse-intensity weighted generalized estimating equations for longitudinal data subject to irregular observation: which variables should be included in the visit rate model?
IF 1.7 | Zone 4 (Mathematics) | Q3 BIOLOGY | Biometrics 81(4) | Pub Date: 2025-10-08 | DOI: 10.1093/biomtc/ujaf128
Eleanor M Pullenayegum, Di Shan

Longitudinal data are often subject to irregular and informative visit times. Weighting generalized estimating equations by the inverse of the visit rate yields asymptotically unbiased estimates of regression coefficients provided that outcomes and visit times are conditionally independent, given the covariates in the visit model. Adding other covariates has no impact on the asymptotic bias of estimated regression coefficients, provided that conditional independence is maintained, but the impact on their variances is unknown. We show that variances are unchanged on adding variables associated with neither outcome nor visit process, and decrease on adding variables associated with outcome but not visit process. Adding variables associated with visits but not outcome may either increase or decrease variances of estimated outcome model regression coefficients, depending on the correlation structure of the covariates and the outcome. Application to a study of major depressive disorder found that the variances of estimated regression coefficients were of a similar magnitude when predictors of outcome but not visits were added to the visit rate model but consistently larger, in some cases by a factor of 2, on adding predictors of visits but not outcome. We recommend that visit process models include variables associated with outcome, but that those unassociated with the outcome be treated with caution.
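For readers unfamiliar with the weighting scheme, one common way to write an inverse-intensity weighted estimating equation (with an independence working correlation) is sketched below. The notation is an assumption introduced here, not taken from the abstract: $Y_i(t)$ is the outcome of subject $i$, $\mu_i(t;\beta)$ its marginal mean model, $N_i(t)$ the counting process of visits, $\lambda_i(t)$ the visit intensity given the covariates in the visit model, and $T$ the end of follow-up.

```latex
\sum_{i=1}^{n} \int_{0}^{T}
  \frac{1}{\lambda_i(t)}\,
  \frac{\partial \mu_i(t;\beta)}{\partial \beta}\,
  \bigl\{ Y_i(t) - \mu_i(t;\beta) \bigr\}\, dN_i(t) = 0
```

Each observed visit contributes an ordinary GEE term scaled by $1/\lambda_i(t)$, so visits that were more likely to occur are down-weighted. The question studied in the abstract is which covariates should enter the model for $\lambda_i(t)$.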

Citations: 0
Adaptive stratified sampling design in two-phase studies for average causal effect estimation.
IF 1.7 | Zone 4 (Mathematics) | Q3 BIOLOGY | Biometrics 81(4) | Pub Date: 2025-10-08 | DOI: 10.1093/biomtc/ujaf143
Min Zeng, Qiyu Wang, Zijian Sui, Hong Zhang, Jinfeng Xu

Causal inference using observational data often suffers from numerous confounding effects, with greatly distorted average causal effect (ACE) estimates if the confounders are ignored. Information on some confounders, such as genetic biomarkers and medical imaging, is prohibitively expensive to obtain in practice. Two-phase studies are resource-efficient solutions to this problem. In such studies, outcome, treatment, and inexpensive confounders are measured for a large number of subjects in the first phase; costly confounder measurements are then collected for a limited number of subjects in the second phase. An efficient statistical design is essential in controlling the cost arising in the second phase. In this paper, we propose an adaptive stratified sampling design (AdaStrat), which minimizes the variance of the ACE estimator with a given second-phase sample size. AdaStrat begins with gathering costly confounder measures for randomly selected pilot data, which are used to develop a stratification strategy and determine the sampling probabilities of strata. The resulting stratification and sampling strategy is applied to all first-phase subjects to determine the second-phase subjects for whom costly confounder measures are collected. We rigorously show that AdaStrat produces a more efficient ACE estimator than existing sampling designs with prespecified strata. Finite sample properties of AdaStrat were evaluated through simulation studies, demonstrating its superiority over the fixed stratified sampling design (FixStrat), with relative efficiencies ranging from 20% to 30% in our simulation situations. The desired finite sample properties for AdaStrat were further confirmed through an application to UK Biobank data.

Citations: 0
Variable importance measures for heterogeneous treatment effects.
IF 1.7 | Zone 4 (Mathematics) | Q3 BIOLOGY | Biometrics 81(4) | Pub Date: 2025-10-08 | DOI: 10.1093/biomtc/ujaf140
Oliver J Hines, Karla Diaz-Ordaz, Stijn Vansteelandt

Motivated by applications in precision medicine and treatment effect heterogeneity, recent research has focused on estimating conditional average treatment effects (CATEs) using machine learning (ML). CATE estimates may represent complicated functions that provide little insight into the key drivers of heterogeneity. Therefore, we introduce nonparametric treatment effect variable importance measures (TE-VIMs), based on the mean-squared error (MSE) in predicting the individual treatment effect. More precisely, TE-VIMs represent the increase in MSE when variables are removed from the CATE conditioning set. We derive efficient TE-VIM estimators which can be used with any CATE estimation strategy and are amenable to ML estimation. We propose several strategies to calculate these VIMs (e.g., leave-one-out or keep-one-in), using popular meta-learners for the CATE. We study the finite sample performance through a simulation study and illustrate their application using clinical trial data.
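The "increase in MSE" description can be made concrete with a short derivation. The notation below is an assumption introduced for illustration and may differ from the paper's: $Y(1) - Y(0)$ is the individual treatment effect, $\tau(X) = E[Y(1) - Y(0) \mid X]$ is the CATE given all covariates, and $\tau_s(X) = E[Y(1) - Y(0) \mid X_{-s}]$ is the CATE after the covariate subset $s$ is removed from the conditioning set.

```latex
\begin{aligned}
\Theta_s
  &= E\bigl[\{Y(1) - Y(0) - \tau_s(X)\}^2\bigr]
   - E\bigl[\{Y(1) - Y(0) - \tau(X)\}^2\bigr] \\
  &= E\bigl[\{\tau(X) - \tau_s(X)\}^2\bigr]
   + 2\,E\bigl[\{Y(1) - Y(0) - \tau(X)\}\{\tau(X) - \tau_s(X)\}\bigr] \\
  &= E\bigl[\{\tau(X) - \tau_s(X)\}^2\bigr].
\end{aligned}
```

The cross term vanishes because $E[Y(1) - Y(0) - \tau(X) \mid X] = 0$ and $\tau(X) - \tau_s(X)$ is a function of $X$. Under this notation the increase in MSE depends only on the two CATE functions, which is consistent with the abstract's claim that the measure can be paired with any CATE estimation strategy.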

Citations: 0
Joint Bayesian additive regression trees for multiple nonlinear dependency networks.
IF 1.7 | Zone 4 (Mathematics) | Q3 BIOLOGY | Biometrics 81(4) | Pub Date: 2025-10-08 | DOI: 10.1093/biomtc/ujaf158
Licai Huang, Christine B Peterson, Min Jin Ha

Identifying protein-protein interaction networks can reveal therapeutic targets in cancer; however, for heterogeneous cancers such as colorectal cancer (CRC), a pooled analysis of the entire dataset may miss subtype-specific mechanisms, whereas separate analyses of each subgroup's data may reduce the power to identify shared relations. To address this limitation, we propose a hierarchical Bayesian model for the inference of dependency networks that encourages the common selection of edges across subgroups while allowing subtype-specific connections. To allow for nonlinear dependence relations, we rely on Bayesian Additive Regression Trees (BART) to characterize the key mechanisms for each subgroup. Because BART is a flexible model that allows nonlinear effects and interactions, it is more suitable for genomic data than classical models that assume linearity. To connect the subgroups, we place a Markov random field prior on the probability of utilizing a feature in a splitting rule; this allows us to borrow strength across subgroups in identifying shared dependence relations. We illustrate the model using both simulated data and a real data application on the estimation of protein-protein interaction networks across CRC subtypes.

Citations: 0
Clarifying the role of the Mantel-Haenszel risk difference estimator in randomized clinical trials.
IF 1.7 | Zone 4 (Mathematics) | Q3 BIOLOGY | Biometrics 81(4) | Pub Date: 2025-10-08 | DOI: 10.1093/biomtc/ujaf142 | PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12576803/pdf/
Xiaoyu Qiu, Yuhan Qian, Jaehwan Yi, Jinqiu Wang, Yu Du, Yanyao Yi, Ting Ye

The Mantel-Haenszel (MH) risk difference estimator, commonly used in randomized clinical trials for binary outcomes, calculates a weighted average of stratum-specific risk difference estimators. Traditionally, this method requires the stringent assumption that risk differences are homogeneous across strata, also known as the common (constant) risk difference assumption. In our paper, we relax this assumption and adopt a modern perspective, viewing the MH risk difference estimator as an approach for covariate adjustment in randomized clinical trials, distinguishing its use from that in meta-analysis and observational studies. We demonstrate that, under reasonable restrictions on risk difference variability, the MH risk difference estimator consistently estimates the average treatment effect within a standard super-population framework, which is often the primary interest in randomized clinical trials, in addition to estimating a weighted average of stratum-specific risk differences. We rigorously study its properties under the large-stratum and sparse-stratum asymptotic regimes, as well as under mixed-regime settings. Furthermore, for either estimand, we propose a unified robust variance estimator that improves over the popular variance estimators by Greenland and Robins and Sato et al. and has provable consistency across these asymptotic regimes, regardless of assuming common risk differences. Extensions of our theoretical results also provide new insights into the MH test, the post-stratification estimator, and settings with multiple treatments. Our findings are thoroughly validated through simulations and a clinical trial example.
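The weighted-average form mentioned at the start of this abstract can be sketched in a few lines of Python. The weights $n_{1k} n_{0k} / (n_{1k} + n_{0k})$ are the classical MH weights. The function name and the tuple layout are illustrative assumptions, and the sketch covers only the point estimate, not the robust variance estimator proposed in the paper.

```python
from typing import Iterable, Tuple

# One stratum: (events in treatment arm, treatment arm size,
#               events in control arm, control arm size)
Stratum = Tuple[int, int, int, int]


def mh_risk_difference(strata: Iterable[Stratum]) -> float:
    """Classical Mantel-Haenszel risk difference point estimate.

    Computes sum_k w_k * (p1k - p0k) / sum_k w_k with w_k = n1k * n0k / (n1k + n0k).
    Every stratum is assumed to have at least one subject in each arm.
    """
    numerator = 0.0
    denominator = 0.0
    for events_trt, n_trt, events_ctl, n_ctl in strata:
        weight = n_trt * n_ctl / (n_trt + n_ctl)
        risk_diff = events_trt / n_trt - events_ctl / n_ctl   # stratum-specific risk difference
        numerator += weight * risk_diff
        denominator += weight
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts for two strata
    example = [(30, 100, 10, 100), (5, 50, 5, 50)]
    print(mh_risk_difference(example))
```

When the stratum-specific risk differences are all equal, the estimate reduces to that common value. Otherwise it is a weighted average that favors larger, more balanced strata.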

Citations: 0
Distal causal excursion effects: modeling long-term effects of time-varying treatments in micro-randomized trials.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-10-08 DOI: 10.1093/biomtc/ujaf134
Tianchen Qian

Micro-randomized trials (MRTs) play a crucial role in optimizing digital interventions. In an MRT, each participant is sequentially randomized among treatment options hundreds of times. While the interventions tested in MRTs target short-term behavioral responses (proximal outcomes), their ultimate goal is to drive long-term behavior change (distal outcomes). However, existing causal inference methods, such as the causal excursion effect, are limited to proximal outcomes, making it challenging to quantify the long-term impact of interventions. To address this gap, we introduce the distal causal excursion effect (DCEE), a novel estimand that quantifies the long-term effect of time-varying treatments. The DCEE contrasts distal outcomes under two excursion policies while marginalizing over most treatment assignments, enabling a parsimonious and interpretable causal model even with a large number of decision points. We propose two estimators for the DCEE, one with cross-fitting and one without, both robust to misspecification of the outcome model. We establish their asymptotic properties and validate their performance through simulations. We apply our method to the HeartSteps MRT to assess the impact of activity prompts on long-term habit formation. Our findings suggest that prompts delivered earlier in the study have a stronger long-term effect than those delivered later, underscoring the importance of intervention timing in behavior change. This work provides the critically needed toolkit for scientists working on digital interventions to assess long-term causal effects using MRT data.

Citations: 0
Entrywise splitting cross-validation in generalized factor models: from sample splitting to entrywise splitting.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-10-08 DOI: 10.1093/biomtc/ujaf153
Zhijing Wang

The generalized factor models have been widely employed for dimension reduction across various types of multivariate data, including binary choices, counts, and continuous observations. While determining the number of factors in such models has received significant scholarly attention, it remains an open challenge in the field. In this paper, we propose a cross-validation (CV) method based on entrywise splitting (ES), rather than sample splitting, to address this problem. Similar to traditional cross-validation, this approach primarily prevents the underestimation of the number of factors. We then introduce a penalized entrywise splitting cross-validation criterion, which integrates the original CV with information theoretic criteria by adding a penalty term. Its consistency is established under mild conditions in a high-dimensional setting, where both the sample size and the number of features grow to infinity. Furthermore, we extend our methodology to random missing data with different probability scenarios. We evaluate the performance of the proposed method through comprehensive simulations and apply it to a mouse brain single-cell RNA sequencing dataset.
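The idea of holding out individual matrix entries, rather than whole samples, can be illustrated with a small sketch. It is not the authors' procedure: it assumes a plain linear (Gaussian) factor model, fits each candidate rank by iterative truncated-SVD imputation, and omits the penalty term described in the abstract. The function name `es_cv_errors` and all of its parameters are hypothetical.

```python
import numpy as np


def es_cv_errors(X, max_rank, holdout_frac=0.2, n_iter=50, seed=0):
    """Held-out squared error for each candidate number of factors.

    Entries (not rows) are split at random: a fraction `holdout_frac` is hidden,
    a rank-k approximation is fitted to the remaining entries by iterative
    truncated-SVD imputation, and the error is measured on the hidden entries.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < holdout_frac  # True marks a held-out entry
    start_value = X[~mask].mean()              # initial guess for hidden entries
    errors = {}
    for k in range(1, max_rank + 1):
        filled = np.where(mask, start_value, X)
        for _ in range(n_iter):
            U, s, Vt = np.linalg.svd(filled, full_matrices=False)
            low_rank = (U[:, :k] * s[:k]) @ Vt[:k]
            # Keep observed entries fixed; refresh only the hidden ones.
            filled = np.where(mask, low_rank, X)
        errors[k] = float(np.mean((X[mask] - low_rank[mask]) ** 2))
    return errors


# Simulated data with two true factors plus small noise (illustrative only).
rng = np.random.default_rng(1)
factors = rng.normal(size=(60, 2))
loadings = rng.normal(size=(40, 2))
X = factors @ loadings.T + 0.1 * rng.normal(size=(60, 40))
errs = es_cv_errors(X, max_rank=4)
```

In this simulated example, fitting a single factor leaves a whole factor unexplained, so its held-out error is far larger than the two-factor error. Held-out error alone tends to separate ranks above the true one much less sharply, which is consistent with the abstract's remark that the unpenalized criterion mainly guards against underestimation.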

Citations: 0