
Biometrics: Latest Articles

Maximized sequential probability ratio test regression.
IF 1.7 | CAS Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-10-08 | DOI: 10.1093/biomtc/ujaf170
Ivair R Silva, Joselito Montalban, Fernando L P de Oliveira

Ideally, the sequential monitoring of adverse events following post-licensed drugs and vaccines is correctly adjusted for confounding variables, such as gender and age, that may have an effect on the quality of the events. This is the idea behind the usual fully randomized, the placebo-control, and the self-control designs. Two prominent methods for conducting sequential analysis of the safety of post-market drugs and vaccines are the maximized sequential probability ratio test (MaxSPRT), and its conditional version, the CMaxSPRT. However, even when the assumption of sample homogeneity is realistic prior to the drug/vaccine administration, the effects caused by the drugs and vaccines on the risk of an adverse event, if any, can still vary according to observable covariates. For binomial and Poisson data, a straightforward sequential test method is introduced in order to accommodate a regression structure in the MaxSPRT. The proposed sequential regression test is also applicable for the CMaxSPRT, that is, the regression works for comparing historical and surveillance Poisson data with unknown heterogeneous baseline rates, taking into account seasonality and any other observable confounding covariates. To illustrate the usefulness of such a regression method, we describe the potential applications of the method to monitor vaccine-adverse events in Manitoba, Canada. The numeric results and examples were executed with the R Sequential package.
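For orientation, here is a minimal sketch of the standard (non-regression) Poisson MaxSPRT that the paper extends. With cumulative observed count c and cumulative expected count μ under the null, the one-sided log-likelihood ratio is (μ - c) + c·ln(c/μ) when c > μ and 0 otherwise; monitoring signals at the first look where it reaches a critical value. The critical value below is a made-up placeholder (in practice it is chosen to control the overall type I error, e.g. with the R Sequential package), and the data are hypothetical. This does not implement the paper's regression structure.

```python
import math


def poisson_maxsprt_llr(observed: float, expected: float) -> float:
    """One-sided Poisson MaxSPRT log-likelihood ratio.

    Returns 0 when the observed count does not exceed the expected count,
    because the test only looks for an elevated adverse-event rate.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    if observed <= expected:
        return 0.0
    return (expected - observed) + observed * math.log(observed / expected)


def first_signal(observed_cum, expected_cum, critical_value):
    """Index of the first look whose LLR reaches critical_value, else None.

    observed_cum / expected_cum are cumulative counts at successive looks.
    """
    for look, (c, mu) in enumerate(zip(observed_cum, expected_cum)):
        if poisson_maxsprt_llr(c, mu) >= critical_value:
            return look
    return None


# Hypothetical monitoring data: cumulative expected and observed counts.
expected = [1.0, 2.0, 3.0, 4.0, 5.0]
observed = [1, 3, 5, 8, 10]
CV = 1.5  # placeholder critical value, NOT derived from any design

print(first_signal(observed, expected, CV))  # prints 3
```

The LLRs at the five looks are about 0, 0.22, 0.55, 1.55 and 1.93, so the placeholder boundary is first crossed at the fourth look (index 3).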

Journal: Biometrics 81(4) | PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12745959/pdf/
Citations: 0
Censoring-robust estimation in fixed sample time-to-event clinical trials with adaptive randomization.
IF 1.7 | CAS Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-10-08 | DOI: 10.1093/biomtc/ujaf161
Navneet R Hakhu, Daniel L Gillen

Adaptive randomization is a clinical trial design feature used to modify treatment allocation probabilities during accrual. In time-to-event trials, the impact of adaptive randomization is less well understood for estimating treatment efficacy in the presence of time-varying effects [e.g., relative risk of progression to acquired immunodeficiency syndrome (AIDS) or death changes over time]. Here, we focus on time-to-event trials where the scientific estimand is a marginal hazard ratio in the absence of intermittent censoring over the support of observed times. We analytically show that adaptive randomization alters censoring patterns and illustrate via Monte Carlo simulations that the Cox proportional hazards estimator can yield biased estimates. As a remedy, we propose a censoring-robust estimator based on reweighting the partial likelihood score by treatment-specific censoring distributions that account for adaptive randomization. We derive the asymptotic properties of the proposed estimator and evaluate its finite sample operating characteristics via simulation. Finally, we apply our proposed method using data from the Community Programs for Clinical Research on AIDS Trial 002.
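A common building block for censoring weights of this kind is the "reverse" Kaplan-Meier estimate of the censoring survival function G_a(t) = P(C > t) within treatment arm a, obtained by treating censorings as the events. The sketch below fits one such curve per arm. The function names, the tie convention (everyone with follow-up at or beyond t counts as at risk at t) and the data are assumptions for illustration; how the paper plugs these curves into the partial likelihood score is not reproduced here.

```python
from collections import defaultdict


def reverse_km(times, events):
    """Kaplan-Meier estimate of the censoring survival G(t) = P(C > t).

    times:  observed follow-up times
    events: 1 if the event was observed, 0 if the subject was censored
    Returns a list of (time, G(time)) steps at each distinct censoring time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    steps = []
    g = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        n_t = 0      # subjects with follow-up ending at t
        n_cens = 0   # of those, how many were censored
        while i < len(data) and data[i][0] == t:
            n_t += 1
            if data[i][1] == 0:
                n_cens += 1
            i += 1
        if n_cens > 0:
            g *= 1.0 - n_cens / n_at_risk
            steps.append((t, g))
        n_at_risk -= n_t
    return steps


def eval_step(steps, t):
    """Evaluate the right-continuous step function G at time t."""
    value = 1.0
    for s, g in steps:
        if s <= t:
            value = g
        else:
            break
    return value


def censoring_survival_by_arm(times, events, arms):
    """Fit a separate reverse Kaplan-Meier curve within each treatment arm."""
    groups = defaultdict(lambda: ([], []))
    for t, e, a in zip(times, events, arms):
        groups[a][0].append(t)
        groups[a][1].append(e)
    return {a: reverse_km(ts, es) for a, (ts, es) in groups.items()}


# Hypothetical data: arm 0 has two censored subjects, arm 1 has none.
times = [1, 2, 3, 4, 1, 2]
events = [1, 0, 1, 0, 1, 1]
arms = [0, 0, 0, 0, 1, 1]
curves = censoring_survival_by_arm(times, events, arms)
print(eval_step(curves[0], 2.5))  # about 0.667
```

One common use is to weight a subject's contribution at time t by 1 / G_a(t) for that subject's arm, usually with some truncation where G_a is near zero; the paper's exact weighting may differ.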

Journal: Biometrics 81(4) | PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12715681/pdf/
Citations: 0
Letter to the Editors: Comments on "Statistical inference on change points in generalized semiparametric segmented models" by Yang et al. (2025).
IF 1.7 | CAS Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-10-08 | DOI: 10.1093/biomtc/ujaf147
Vito M R Muggeo

We provide some comments about the recent paper by Yang et al. related to model estimation and hypothesis testing in segmented regression.

Journal: Biometrics 81(4)
Citations: 0
Joint Bayesian additive regression trees for multiple nonlinear dependency networks.
IF 1.7 | CAS Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-10-08 | DOI: 10.1093/biomtc/ujaf158
Licai Huang, Christine B Peterson, Min Jin Ha

Identifying protein-protein interaction networks can reveal therapeutic targets in cancer; however, for heterogeneous cancers such as colorectal cancer (CRC), a pooled analysis of the entire dataset may miss subtype-specific mechanisms, whereas separate analyses of each subgroup's data may reduce the power to identify shared relations. To address this limitation, we propose a hierarchical Bayesian model for the inference of dependency networks that encourages the common selection of edges across subgroups while allowing subtype-specific connections. To allow for nonlinear dependence relations, we rely on Bayesian Additive Regression Trees (BART) to characterize the key mechanisms for each subgroup. Because BART is a flexible model that allows nonlinear effects and interactions, it is more suitable for genomic data than classical models that assume linearity. To connect the subgroups, we place a Markov random field prior on the probability of utilizing a feature in a splitting rule; this allows us to borrow strength across subgroups in identifying shared dependence relations. We illustrate the model using both simulated data and a real data application on the estimation of protein-protein interaction networks across CRC subtypes.
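To see how an MRF prior lets subgroups borrow strength, consider a generic Ising-type prior on binary indicators γ_k (1 if a given feature is used in subgroup k): log p(γ) = const + a·Σ_k γ_k + b·Σ_{k<l} G_kl γ_k γ_l. The conditional log-odds that subgroup k uses the feature is then a + b·Σ_{l≠k} G_kl γ_l, so it rises with the number of linked subgroups that already use it. This is an illustrative stand-in, not the paper's specification (which places the prior on splitting probabilities inside BART); a, b and the subgroup graph G are hypothetical.

```python
import math


def conditional_inclusion_prob(k, gamma, graph, a, b):
    """P(gamma_k = 1 | all other gamma) under an Ising-type MRF prior.

    log p(gamma) = const + a * sum_k gamma_k + b * sum_{k<l} G[k][l] gamma_k gamma_l
    a: baseline log-odds of using the feature in any one subgroup
    b: sharing strength between linked subgroups (b > 0 encourages sharing)
    """
    neighbour_sum = sum(graph[k][l] * gamma[l] for l in range(len(gamma)) if l != k)
    logit = a + b * neighbour_sum
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical example: four CRC subtypes, every pair linked in the graph.
G = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
a, b = -2.0, 1.0

for others_using in range(4):
    gamma = [0] + [1] * others_using + [0] * (3 - others_using)
    p = conditional_inclusion_prob(0, gamma, G, a, b)
    print(f"{others_using} other subtypes use the feature -> P(use in subtype 0) = {p:.3f}")
```

With these values the conditional probability climbs from about 0.12 (no other subtype uses the feature) to about 0.73 (all three do), which is the "shared selection" behaviour described above.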

Journal: Biometrics 81(4)
Citations: 0
Variable importance measures for heterogeneous treatment effects.
IF 1.7 | CAS Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-10-08 | DOI: 10.1093/biomtc/ujaf140
Oliver J Hines, Karla Diaz-Ordaz, Stijn Vansteelandt

Motivated by applications in precision medicine and treatment effect heterogeneity, recent research has focused on estimating conditional average treatment effects (CATEs) using machine learning (ML). CATE estimates may represent complicated functions that provide little insight into the key drivers of heterogeneity. Therefore, we introduce nonparametric treatment effect variable importance measures (TE-VIMs), based on the mean-squared error (MSE) in predicting the individual treatment effect. More precisely, TE-VIMs represent the increase in MSE when variables are removed from the CATE conditioning set. We derive efficient TE-VIM estimators that can be used with any CATE estimation strategy and are amenable to ML estimation. We propose several strategies to calculate these VIMs (eg, leave-one-out or keep-one-in), using popular meta-learners for the CATE. We study their finite sample performance through a simulation study and illustrate their application using clinical trial data.
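A short derivation shows why an MSE-based importance measure depends only on the CATE. Write D = Y(1) - Y(0) for the individual treatment effect, τ(x) = E[D | X = x] for the CATE, and τ_s(x_{-s}) = E[τ(X) | X_{-s} = x_{-s}] for the CATE after removing the variables in s. The notation is ours and is a sketch of the idea, not necessarily the paper's exact definition.

```latex
\begin{aligned}
\theta_s
&= E\big[\{D - \tau_s(X_{-s})\}^2\big] - E\big[\{D - \tau(X)\}^2\big] \\
&= E\big[\{\tau(X) - \tau_s(X_{-s})\}^2\big]
   + 2\,E\big[\{D - \tau(X)\}\{\tau(X) - \tau_s(X_{-s})\}\big] \\
&= E\big[\{\tau(X) - \tau_s(X_{-s})\}^2\big]
 \;=\; \operatorname{Var}\{\tau(X)\} - \operatorname{Var}\{\tau_s(X_{-s})\}.
\end{aligned}
```

The cross term vanishes because E[D - τ(X) | X] = 0 and τ(X) - τ_s(X_{-s}) is a function of X; the last equality uses E[τ(X) τ_s(X_{-s})] = E[τ_s(X_{-s})^2] and E[τ(X)] = E[τ_s(X_{-s})]. So the increase in MSE never involves the unidentifiable joint distribution of Y(1) and Y(0), and it is zero exactly when τ(X) = τ_s(X_{-s}) almost surely.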

Journal: Biometrics 81(4) | PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7618827/pdf/
Citations: 0
Inverse-intensity weighted generalized estimating equations for longitudinal data subject to irregular observation: which variables should be included in the visit rate model?
IF 1.7 | CAS Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-10-08 | DOI: 10.1093/biomtc/ujaf128
Eleanor M Pullenayegum, Di Shan

Longitudinal data are often subject to irregular and informative visit times. Weighting generalized estimating equations by the inverse of the visit rate yields asymptotically unbiased estimates of regression coefficients provided that outcomes and visit times are conditionally independent, given the covariates in the visit model. Adding other covariates has no impact on the asymptotic bias of estimated regression coefficients, provided that conditional independence is maintained, but the impact on their variances is unknown. We show that variances are unchanged on adding variables associated with neither outcome nor visit process, and decrease on adding variables associated with outcome but not visit process. Adding variables associated with visits but not outcome may either increase or decrease variances of estimated outcome model regression coefficients, depending on the correlation structure of the covariates and the outcome. Application to a study of major depressive disorder found that the variances of estimated regression coefficients were of a similar magnitude when predictors of outcome but not visits were added to the visit rate model but consistently larger, in some cases by a factor of 2, on adding predictors of visits but not outcome. We recommend that visit process models include variables associated with outcome, but that those unassociated with the outcome be treated with caution.
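With an identity link and an independence working correlation, an inverse-intensity weighted GEE reduces to weighted least squares over all observed visits, each weighted by 1/λ̂, the estimated visit intensity for that subject at that time. The sketch below assumes the intensities have already been estimated from some visit-rate model; which covariates enter that model changes the weights, and hence the variance of the estimates, which is the question the abstract addresses. Variable names and data are hypothetical, and robust (sandwich) standard errors are not computed.

```python
import numpy as np


def iiw_gee_identity(X, y, visit_intensity):
    """Inverse-intensity weighted GEE, identity link, independence working correlation.

    Solves sum_i w_i x_i (y_i - x_i' beta) = 0 with w_i = 1 / visit_intensity_i,
    i.e. weighted least squares over all observed visits (one row per visit).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(visit_intensity, dtype=float)
    XtW = X.T * w  # multiplies the column of X.T for visit i by w_i
    return np.linalg.solve(XtW @ X, XtW @ y)


# Hypothetical visit-level data.
rng = np.random.default_rng(0)
n = 500
age = rng.normal(size=n)
X = np.column_stack([np.ones(n), age])
y = 1.0 + 0.5 * age + rng.normal(scale=0.1, size=n)
lam = np.exp(0.3 * age)  # hypothetical estimated visit intensities
beta = iiw_gee_identity(X, y, lam)
print(beta)
```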

Journal: Biometrics 81(4)
Citations: 0
Adaptive stratified sampling design in two-phase studies for average causal effect estimation.
IF 1.7 | CAS Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-10-08 | DOI: 10.1093/biomtc/ujaf143
Min Zeng, Qiyu Wang, Zijian Sui, Hong Zhang, Jinfeng Xu

Causal inference using observational data often suffers from numerous confounding effects, with greatly distorted average causal effect (ACE) estimates if the confounders are ignored. Information on some confounders, such as genetic biomarkers and medical imaging, is prohibitively expensive to obtain in practice. Two-phase studies are resource-efficient solutions to this problem. In such studies, outcome, treatment, and inexpensive confounders are measured for a large number of subjects in the first phase; costly confounder measurements are then collected for a limited number of subjects in the second phase. An efficient statistical design is essential in controlling the cost arising in the second phase. In this paper, we propose an adaptive stratified sampling design (AdaStrat), which minimizes the variance of the ACE estimator for a given second-phase sample size. AdaStrat begins by gathering costly confounder measurements for a randomly selected pilot sample, which is used to develop a stratification strategy and determine the sampling probabilities of strata. The resulting stratification and sampling strategy is applied to all first-phase subjects to select the second-phase subjects for costly confounder measurement. We rigorously show that AdaStrat produces a more efficient ACE estimator than existing sampling designs with prespecified strata. Finite sample properties of AdaStrat were evaluated through simulation studies, demonstrating its superiority over the fixed stratified sampling design (FixStrat), with relative efficiencies ranging from 20% to 30% in our simulation situations. The desired finite sample properties of AdaStrat were further confirmed through an application to UK Biobank data.
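AdaStrat learns both the strata and their sampling probabilities from pilot data. A classical reference point for the second step is Neyman allocation: for fixed strata and a fixed total second-phase size n, the variance of a stratified mean is minimized by sampling stratum h in proportion to N_h·S_h (stratum size times within-stratum standard deviation). The sketch below implements that textbook rule with largest-remainder rounding, taking pilot-based standard deviations as input. It is a stand-in for illustration, not the AdaStrat algorithm, and it does not cap n_h at N_h.

```python
import math


def neyman_allocation(stratum_sizes, stratum_sds, n_total):
    """Allocate n_total second-phase samples with n_h proportional to N_h * S_h.

    Fractional allocations are rounded by the largest-remainder method so the
    integer sizes sum exactly to n_total. No cap at N_h is applied.
    """
    products = [N * s for N, s in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    if total <= 0:
        raise ValueError("need at least one stratum with positive N_h * S_h")
    raw = [n_total * p / total for p in products]
    alloc = [math.floor(r) for r in raw]
    # Give the leftover units to the strata with the largest fractional parts.
    remainder = n_total - sum(alloc)
    order = sorted(range(len(raw)), key=lambda h: raw[h] - alloc[h], reverse=True)
    for h in order[:remainder]:
        alloc[h] += 1
    return alloc


# Hypothetical first-phase stratum sizes and pilot standard deviations.
sizes = [5000, 3000, 2000]
sds = [1.0, 2.0, 4.0]
print(neyman_allocation(sizes, sds, 300))  # prints [79, 95, 126]
```

The smallest stratum receives the most second-phase samples here because its outcome variability is highest, which is the intuition behind variance-minimizing designs.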

Journal: Biometrics 81(4)
Citations: 0
Federated double machine learning for high-dimensional semiparametric models.
IF 1.7 | CAS Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-10-08 | DOI: 10.1093/biomtc/ujaf150
Kai Kang, Zhihao Wu, Xinjie Qian, Xinyuan Song, Hongtu Zhu

Federated learning enables the training of a global model while keeping data localized; however, current methods face challenges with high-dimensional semiparametric models that involve complex nuisance parameters. This paper proposes a federated double machine learning framework designed to address high-dimensional nuisance parameters of semiparametric models in multicenter studies. Our approach leverages double machine learning (Chernozhukov et al., 2018a) to estimate center-specific parameters, extends the surrogate efficient score method within a Neyman-orthogonal framework, and applies density ratio tilting to create a federated estimator that combines local individual-level data with summary statistics from other centers. This methodology mitigates regularization bias and overfitting in high-dimensional nuisance parameter estimation. We establish the estimator's limiting distribution under minimal assumptions, validate its performance through extensive simulations, and demonstrate its effectiveness in analyzing multiphase data from the Alzheimer's Disease Neuroimaging Initiative study.
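The single-center double machine learning step can be illustrated with the partially linear model Y = θD + g(X) + ε, D = m(X) + v. The nuisance regressions E[Y | X] and E[D | X] are fitted on the other folds and used to residualize each held-out fold (cross-fitting), and θ solves the Neyman-orthogonal moment Σ ṽ_i(ũ_i - θṽ_i) = 0 in the outcome residuals ũ and treatment residuals ṽ. In this sketch ordinary least squares is a hypothetical stand-in for the machine-learning nuisance estimators, and the paper's surrogate efficient score and density-ratio tilting across centers are not implemented.

```python
import numpy as np


def ols_fit_predict(X_train, y_train, X_test):
    """Stand-in nuisance learner: linear regression with intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def dml_plr(y, d, X, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimator of theta in Y = theta*D + g(X) + eps."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    y_res = np.empty(n)
    d_res = np.empty(n)
    for k in range(n_folds):
        test, train = folds == k, folds != k
        # Nuisances are fitted on the other folds only, then used to residualize fold k.
        y_res[test] = y[test] - ols_fit_predict(X[train], y[train], X[test])
        d_res[test] = d[test] - ols_fit_predict(X[train], d[train], X[test])
    # Solve the orthogonal moment sum d_res * (y_res - theta * d_res) = 0.
    return np.sum(d_res * y_res) / np.sum(d_res ** 2)


# Hypothetical single-center data with true theta = 2.
rng = np.random.default_rng(1)
n, p = 2000, 5
X = rng.normal(size=(n, p))
d = X @ np.full(p, 0.5) + rng.normal(size=n)
y = 2.0 * d + X @ np.arange(1, p + 1) + rng.normal(size=n)
theta_hat = dml_plr(y, d, X)
print(round(theta_hat, 2))
```

A naive federated analogue would combine such center-specific estimates by inverse-variance weighting; the abstract describes a different combination that also uses local individual-level data.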

Journal: Biometrics 81(4)
Citations: 0
Bridging the gap between design and analysis: randomization inference and sensitivity analysis for matched observational studies with treatment doses.
IF 1.7 | CAS Zone 4 (Mathematics) | Q3 BIOLOGY | Pub Date: 2025-10-08 | DOI: 10.1093/biomtc/ujaf156
Jeffrey Zhang, Siyu Heng

Matching is a commonly used causal inference study design in observational studies. Through matching on measured confounders between different treatment groups, valid randomization inferences can be conducted under the assumption of no unmeasured confounding, and sensitivity analysis can be further performed to assess the robustness of results to potential unmeasured confounding. However, for many common matched designs, there is still a lack of valid downstream randomization inference and sensitivity analysis methods. Specifically, in matched observational studies with treatment doses (eg, continuous or ordinal treatments), with the exception of some special cases such as pair matching, there is no existing randomization inference or sensitivity analysis method for studying analogs of the sample average treatment effect (ie, Neyman-type weak nulls), and no existing valid sensitivity analysis approach for testing the sharp null of no treatment effect for any subject (ie, Fisher's sharp null) when the outcome is nonbinary. To fill these important gaps, we propose new methods for randomization inference and sensitivity analysis that work for general matched designs with treatment doses, apply to general types of outcome variables (eg, binary, ordinal, or continuous), and cover both Fisher's sharp null and Neyman-type weak nulls. We illustrate our methods via comprehensive simulation studies and a real data application. All the proposed methods have been incorporated into the R package doseSens.
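Assuming no unmeasured confounding, randomization inference for Fisher's sharp null in a matched design with doses treats the outcomes as fixed and permutes the observed doses within each matched set. The sketch below computes a Monte Carlo p-value for the one-sided statistic Σ dose × outcome. It is a generic illustration with hypothetical data, not the doseSens implementation, and it omits both the sensitivity analysis for unmeasured confounding and the Neyman-type weak-null methods the paper develops.

```python
import numpy as np


def sharp_null_pvalue(sets, n_draws=10000, seed=0):
    """Monte Carlo randomization p-value for Fisher's sharp null with doses.

    sets: list of (doses, outcomes) pairs, one per matched set.
    Under the sharp null the outcomes are fixed and, with no unmeasured
    confounding, every permutation of doses within a set is equally likely.
    Test statistic: sum of dose * outcome over all subjects (large = effect).
    """
    rng = np.random.default_rng(seed)
    sets = [(np.asarray(d, float), np.asarray(y, float)) for d, y in sets]
    observed = sum(d @ y for d, y in sets)
    exceed = 0
    for _ in range(n_draws):
        stat = sum(rng.permutation(d) @ y for d, y in sets)
        if stat >= observed:
            exceed += 1
    # Count the observed assignment itself so the p-value is never exactly zero.
    return (exceed + 1) / (n_draws + 1)


# Hypothetical matched sets of three subjects with doses 0, 1, 2.
rng = np.random.default_rng(42)
sets = []
for _ in range(30):
    doses = np.array([0.0, 1.0, 2.0])
    outcomes = 1.5 * doses + rng.normal(size=3)  # strong dose effect
    sets.append((doses, outcomes))
p = sharp_null_pvalue(sets, n_draws=2000)
print(p)
```

Permuting within sets, rather than across the whole sample, is what respects the matched design: each set's dose assignment is randomized only among its own members.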

Citations: 0
Estimating heterogeneous treatment effects for general responses.
IF 1.7 Zone 4 (Mathematics) Q3 BIOLOGY Pub Date: 2025-10-08 DOI: 10.1093/biomtc/ujaf162
Zijun Gao, Trevor Hastie

Heterogeneous treatment effect models allow us to compare treatments at subgroup levels and are becoming increasingly popular in applications such as personalized medicine, advertising, and education. Regardless of the type of response (continuous, binary, count, survival), most causal estimands focus on the difference between the treatment and control conditional means. In this paper, we propose an alternative estimand, DINA (the DIfference in NAtural parameters), to quantify heterogeneous treatment effects, motivated by exponential families and the Cox model. Whatever the response type, DINA is both convenient and more practical for modeling the influence of covariates on the treatment effect. Additionally, we introduce a meta-algorithm for DINA, enabling practitioners to use powerful off-the-shelf machine learning tools to estimate nuisance functions. This meta-algorithm is also statistically robust to errors in nuisance function estimation. We demonstrate the efficacy of our method in combination with various machine learning base learners on both simulated and real datasets.
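As a small illustration of the estimand, here is a sketch that computes DINA from given treated and control conditional means at a single covariate value. It does not implement the paper's meta-algorithm. It assumes canonical links: identity for Gaussian, logit for Bernoulli, and log for Poisson responses. The Cox case is omitted, and the function and dictionary names are hypothetical.

```python
import math

# Canonical link (assumed): maps a conditional mean to the family's natural parameter.
CANONICAL_LINK = {
    "gaussian": lambda mu: mu,                          # identity
    "bernoulli": lambda mu: math.log(mu / (1.0 - mu)),  # logit
    "poisson": lambda mu: math.log(mu),                 # log
}


def dina(mu_treated, mu_control, family):
    """Difference in natural parameters, treated minus control."""
    link = CANONICAL_LINK[family]
    return link(mu_treated) - link(mu_control)


def mean_difference(mu_treated, mu_control):
    """The usual estimand: difference in conditional means."""
    return mu_treated - mu_control


# Two hypothetical subgroups sharing an odds ratio of 3 but with different baseline risks.
for mu0, mu1 in [(0.5, 0.75), (0.1, 0.25)]:
    print(f"baseline={mu0}: DINA={dina(mu1, mu0, 'bernoulli'):.4f}, "
          f"risk difference={mean_difference(mu1, mu0):.2f}")
```

In this example the two subgroups have the same DINA, log 3, but different risk differences: 0.25 and 0.15. This shows one way a natural-parameter scale can make covariate effects on the treatment effect simpler to describe.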

Citations: 0