
Biometrika: Latest Articles

Post-selection inference for causal effects after causal discovery.
IF 2.8 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2025-10-29 | DOI: 10.1093/biomet/asaf073
Ting-Hsuan Chang, Zijian Guo, Daniel Malinsky

Algorithms for constraint-based causal discovery select graphical causal models among a space of possible candidates (e.g., all directed acyclic graphs) by executing a sequence of conditional independence tests. These may be used to inform the estimation of causal effects (e.g., average treatment effects) when there is uncertainty about which covariates ought to be adjusted for, or which variables act as confounders versus mediators. However, naively using the data twice, for model selection and estimation, would lead to invalid confidence intervals. Moreover, if the selected graph is incorrect, the inferential claims may apply to a selected functional that is distinct from the actual causal effect. We propose an approach to post-selection inference that is based on a resampling and screening procedure, which essentially performs causal discovery multiple times with randomly varying intermediate test statistics. Then, an estimate of the target causal effect and corresponding confidence sets are constructed from a union of individual graph-based estimates and intervals. We show that this construction has asymptotically correct coverage for the true causal effect parameter. Importantly, the guarantee holds for a fixed population-level effect, not a data-dependent or selection-dependent quantity. Most of our exposition focuses on the PC-algorithm for learning directed acyclic graphs and the multivariate Gaussian case for simplicity, but the approach is general and modular, so it may be used with other conditional independence based discovery algorithms and distributional families.

Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12849794/pdf/
Citations: 0
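The final step of the post-selection procedure above, forming a confidence set from a union of per-graph intervals, can be illustrated with a minimal Python sketch. It assumes the resampling and screening steps have already produced a set of distinct candidate graphs, each with a point estimate and standard error for the target effect. The labels `G1` to `G3` and their numbers are hypothetical. The sketch only builds a Wald interval for each graph and merges them into a union. It does not reproduce the screening itself or any level adjustment the authors apply per graph.

```python
from statistics import NormalDist


def wald_interval(estimate, se, alpha=0.05):
    """Two-sided Wald interval: estimate +/- z_{1 - alpha/2} * se."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (estimate - z * se, estimate + z * se)


def union_of_intervals(intervals):
    """Merge possibly overlapping intervals into sorted, disjoint pieces."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            # Overlaps (or touches) the previous piece: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def post_selection_set(graph_estimates, alpha=0.05):
    """Union of per-graph Wald intervals.

    graph_estimates: hypothetical mapping from a label for each distinct
    graph kept after resampling/screening to (estimate, standard error).
    """
    intervals = [wald_interval(est, se, alpha) for est, se in graph_estimates.values()]
    return union_of_intervals(intervals)


# Hypothetical per-graph results for three retained graphs.
fits = {"G1": (1.0, 0.1), "G2": (1.1, 0.1), "G3": (2.0, 0.1)}
print(post_selection_set(fits))
```

When the candidate graphs imply very different adjustment sets, their intervals may not overlap. In that case the reported set is a union of disjoint intervals rather than a single interval.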
On the asymptotic validity of confidence sets for linear functionals of solutions to integral equations.
IF 2.8 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2025-10-06 | eCollection Date: 2025-01-01 | DOI: 10.1093/biomet/asaf067
E Smucler, J M Robins, A Rotnitzky

This paper examines the construction of confidence sets for parameters defined as linear functionals of a function of [Formula: see text] and [Formula: see text] whose conditional mean given [Formula: see text] and [Formula: see text] equals the conditional mean of another variable [Formula: see text] given [Formula: see text] and [Formula: see text]. Many estimands of interest in causal inference can be expressed in this form, including the average treatment effect in proximal causal inference and treatment effect contrasts in instrumental variable models. We derive a necessary condition for a confidence set to be uniformly valid over a model that allows for the dependence between [Formula: see text] and [Formula: see text] given [Formula: see text] to be arbitrarily weak. We show that, for any such confidence set, there must exist some laws in the model under which, with high probability, the confidence set has a diameter greater than or equal to the diameter of the parameter's range. In particular, consistent with the weak instrument literature, Wald confidence intervals are not uniformly valid over the aforementioned model when the parameter's range is infinite. Furthermore, we argue that inverting the score test, a successful approach in that literature, generally fails for the broader class of parameters considered here. We present a method for constructing uniformly valid confidence sets when all variables, but possibly [Formula: see text], are binary, discuss its limitations and emphasize that developing valid confidence sets for the class of parameters considered here remains an open problem.

Biometrika, vol. 112, no. 4, asaf067 | Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12614171/pdf/
Citations: 0
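A toy simulation can make the weak-instrument point concrete. The sketch below assumes a just-identified linear instrumental variable model with Gaussian errors. This is a much narrower setting than the integral-equation framework of the paper, and all parameter values are illustrative. It estimates the coverage of the nominal 95% Wald interval for a strong instrument (`pi=1.0`) and for a nearly irrelevant one (`pi=0.02`).

```python
import math
import random
from statistics import NormalDist


def wald_iv_coverage(pi, n=300, reps=300, beta=1.0, rho=0.9, alpha=0.05, seed=0):
    """Monte Carlo coverage of the nominal (1 - alpha) Wald interval for the
    just-identified IV estimator beta_hat = S_zy / S_zd.

    Assumed model: z ~ N(0, 1), d = pi * z + u, y = beta * d + e,
    with corr(u, e) = rho (endogeneity).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    covered = 0
    for _ in range(reps):
        z, d, y = [], [], []
        for _ in range(n):
            zi = rng.gauss(0, 1)
            u = rng.gauss(0, 1)
            e = rho * u + math.sqrt(1 - rho**2) * rng.gauss(0, 1)
            di = pi * zi + u
            z.append(zi)
            d.append(di)
            y.append(beta * di + e)
        zb, db, yb = sum(z) / n, sum(d) / n, sum(y) / n
        s_zz = sum((zi - zb) ** 2 for zi in z)
        s_zd = sum((zi - zb) * (di - db) for zi, di in zip(z, d))
        s_zy = sum((zi - zb) * (yi - yb) for zi, yi in zip(z, y))
        b_hat = s_zy / s_zd
        # Structural residual variance and the usual homoscedastic IV standard error.
        sigma2 = sum(((yi - yb) - b_hat * (di - db)) ** 2 for yi, di in zip(y, d)) / n
        se = math.sqrt(sigma2 * s_zz) / abs(s_zd)
        covered += abs(b_hat - beta) <= z_crit * se
    return covered / reps


print("strong instrument:", wald_iv_coverage(pi=1.0))
print("weak instrument:  ", wald_iv_coverage(pi=0.02))
```

With strong endogeneity (`rho=0.9`), the weak-instrument coverage typically falls well below nominal, while the strong-instrument coverage stays near 95%. The exact figures depend on the seed and the simulation size.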
Resampling methods with multiply imputed data.
IF 2.8 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2025-07-30 | eCollection Date: 2025-01-01 | DOI: 10.1093/biomet/asaf059
Michael W Robbins, Lane Burgette

Resampling techniques have become increasingly popular for estimation of uncertainty. However, data are often fraught with missing values that are commonly imputed to facilitate analysis. This article addresses the issue of using resampling methods such as a jackknife or bootstrap in conjunction with imputations that have been sampled stochastically, in the vein of multiple imputation. We derive the theory needed to illustrate two key points regarding the use of resampling methods in lieu of traditional combining rules. First, imputations should be independently generated multiple times within each replicate group of a jackknife or bootstrap. Second, the number of multiply imputed datasets per replicate group must dramatically exceed the number of replicate groups for a jackknife; however, this is not the case in a bootstrap approach. We also discuss bias-adjusted analogues of the jackknife and bootstrap that are argued to require fewer imputed datasets. A simulation study is provided to support these theoretical conclusions.

Biometrika, vol. 112, no. 4, asaf059 | Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12614170/pdf/
Citations: 0
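The paper's first point is that imputations should be generated afresh within each bootstrap replicate. The following sketch illustrates this under several assumptions. The estimand is a simple mean. Imputation is a stochastic hot-deck draw from the observed values, which is a crude stand-in for a proper imputation model. The values of `n_boot` and `n_imp`, and the data, are hypothetical. The jackknife version and the bias-adjusted variants are not shown.

```python
import random
import statistics


def impute_once(data, rng):
    """One stochastic hot-deck imputation: each None gets a random observed value."""
    observed = [x for x in data if x is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    return [x if x is not None else rng.choice(observed) for x in data]


def bootstrap_mi_variance(data, n_boot=200, n_imp=5, seed=0):
    """Bootstrap variance of the mean, with n_imp independent imputations
    generated inside every bootstrap replicate (not once up front)."""
    rng = random.Random(seed)
    n = len(data)
    replicate_estimates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        # Fresh imputations for this replicate; average the completed-data means.
        means = [statistics.fmean(impute_once(resample, rng)) for _ in range(n_imp)]
        replicate_estimates.append(statistics.fmean(means))
    return statistics.variance(replicate_estimates)


data = [2.0, None, 3.5, 4.0, None, 5.5, 1.0, 2.5, None, 3.0]
print(bootstrap_mi_variance(data))
```

According to the abstract, the bootstrap does not need the number of imputations per replicate to dramatically exceed the number of replicates. The analogous jackknife construction, however, would need many more imputations per replicate group.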
A More Robust Approach to Multivariable Mendelian Randomization.
IF 2.8 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2025-07-21 | DOI: 10.1093/biomet/asaf053
Yinxiang Wu, Hyunseung Kang, Ting Ye

Multivariable Mendelian randomization (MVMR) uses genetic variants as instrumental variables to infer the direct effects of multiple exposures on an outcome. However, unlike univariable Mendelian randomization, MVMR often faces greater challenges with many weak instruments, which can lead to bias not necessarily toward zero and inflation of type I errors. In this work, we introduce a new asymptotic regime that allows exposures to have varying degrees of instrument strength, providing a more accurate theoretical framework for studying MVMR estimators. Under this regime, our analysis of the widely used multivariable inverse-variance weighted method shows that it is often biased and tends to produce misleadingly narrow confidence intervals in the presence of many weak instruments. To address this, we propose a simple, closed-form modification to the multivariable inverse-variance weighted estimator to reduce bias from weak instruments, and additionally introduce a novel spectral regularization technique to improve finite-sample performance. We show that the resulting spectral-regularized estimator remains consistent and asymptotically normal under many weak instruments. Through simulations and real data applications, we demonstrate that our proposed estimator and asymptotic framework can enhance the robustness of MVMR analyses.

Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12335017/pdf/
Citations: 0
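The multivariable inverse-variance weighted (MV-IVW) estimator solves a weighted least-squares problem on summary statistics. It has the form β̂ = (Σ_j w_j γ̂_j γ̂_jᵀ)⁻¹ Σ_j w_j γ̂_j Γ̂_j, with w_j = 1/σ²_Yj. A common closed-form weak-instrument correction replaces γ̂_j γ̂_jᵀ by γ̂_j γ̂_jᵀ − V_j, where V_j is the covariance of the SNP-exposure estimates. The two-exposure sketch below shows both versions. It assumes uncorrelated exposure estimation errors, so V_j is diagonal. It follows the spirit of the modification described above but does not reproduce the authors' estimator, and it omits their spectral regularization.

```python
def solve2(m, r):
    """Solve the 2x2 system m @ b = r by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("singular system")
    return [(m[1][1] * r[0] - m[0][1] * r[1]) / det,
            (m[0][0] * r[1] - m[1][0] * r[0]) / det]


def mv_ivw(gx, gy, se_y, se_x=None, debias=False):
    """Two-exposure MV-IVW estimate of the direct effects.

    gx:   list of (g1, g2) SNP-exposure estimates, one pair per variant
    gy:   SNP-outcome estimates
    se_y: standard errors of gy (weights are 1 / se_y**2)
    se_x: list of (s1, s2) standard errors of gx; used only when debias=True,
          assuming the two exposure estimation errors are uncorrelated
    """
    if debias and se_x is None:
        raise ValueError("debias=True needs se_x")
    m = [[0.0, 0.0], [0.0, 0.0]]
    r = [0.0, 0.0]
    for j, (g, y, sy) in enumerate(zip(gx, gy, se_y)):
        w = 1.0 / sy**2
        for a in range(2):
            r[a] += w * g[a] * y
            for b in range(2):
                m[a][b] += w * g[a] * g[b]
            if debias:
                # E[g g^T] = gamma gamma^T + V_j, so remove the noise term V_j.
                m[a][a] -= w * se_x[j][a] ** 2
    return solve2(m, r)


beta = (0.5, -0.3)
gx = [(1.0, 0.2), (0.3, 1.0), (0.5, 0.5), (0.8, -0.1)]
gy = [beta[0] * g1 + beta[1] * g2 for g1, g2 in gx]  # noiseless toy data
se_y = [0.1] * len(gx)
print(mv_ivw(gx, gy, se_y))
```

With exact inputs and zero exposure standard errors, both versions recover `beta`. When the exposure standard errors are large relative to the associations, the uncorrected version is biased. This is the many-weak-instruments problem the paper studies.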
Aggregating dependent signals with heavy-tailed combination tests.
IF 2.8 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2025-05-30 | eCollection Date: 2025-01-01 | DOI: 10.1093/biomet/asaf038
Lin Gui, Yuchao Jiang, Jingshu Wang

Combining dependent p-values poses a long-standing challenge in statistical inference, particularly when aggregating findings from multiple methods to enhance signal detection. Recently, p-value combination tests based on regularly-varying-tailed distributions, such as the Cauchy combination test and harmonic mean p-value, have attracted attention for their robustness to unknown dependence. This paper provides a theoretical and empirical evaluation of these methods under an asymptotic regime where the number of p-values is fixed and the global test significance level approaches zero. We examine two types of dependence among the p-values. First, when p-values are pairwise asymptotically independent, such as with bivariate normal test statistics with no perfect correlation, we prove that these combination tests are asymptotically valid. However, they become equivalent to the Bonferroni test as the significance level tends to zero for both one-sided and two-sided p-values. Empirical investigations suggest that this equivalence can emerge at moderately small significance levels. Second, under pairwise quasi-asymptotic dependence, such as with bivariate t-distributed test statistics, our simulations suggest that these combination tests can remain valid and exhibit notable power gains over the Bonferroni test, even as the significance level diminishes. These findings highlight the potential advantages of these combination tests in scenarios where p-values exhibit substantial dependence. Our simulations also examine how test performance depends on the support and tail heaviness of the underlying distributions.

Biometrika, vol. 112, no. 4, asaf038 | Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12570179/pdf/
Citations: 0
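The combination rules compared in the heavy-tailed combination test paper are short enough to write out. This Python sketch uses equal weights by default. It reports the raw harmonic mean p-value without further calibration, though the raw value is often read directly as an approximate combined p-value when it is small. The code illustrates the formulas only and is not the authors' implementation.

```python
import math


def cauchy_combination(pvals, weights=None):
    """Cauchy combination test: T = sum_i w_i * tan((0.5 - p_i) * pi), weights
    summing to one; the combined p-value is 0.5 - arctan(T) / pi."""
    k = len(pvals)
    weights = weights or [1.0 / k] * k
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvals))
    return 0.5 - math.atan(t) / math.pi


def harmonic_mean_p(pvals, weights=None):
    """Weighted harmonic mean of the p-values (raw HMP, no extra calibration)."""
    k = len(pvals)
    weights = weights or [1.0 / k] * k
    return 1.0 / sum(w / p for w, p in zip(weights, pvals))


def bonferroni(pvals):
    """Bonferroni-adjusted minimum p-value."""
    return min(1.0, len(pvals) * min(pvals))


p = [0.01, 0.5]
print(cauchy_combination(p), harmonic_mean_p(p), bonferroni(p))
```

For the two p-values 0.01 and 0.5, all three combined values land near 0.02. This is a small numerical echo of the near-equivalence with Bonferroni at small significance levels described in the abstract.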
Integer programming for learning directed acyclic graphs from nonidentifiable Gaussian models.
IF 2.8 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2025-04-28 | eCollection Date: 2025-01-01 | DOI: 10.1093/biomet/asaf032
Tong Xu, Armeen Taeb, Simge Küçükyavuz, Ali Shojaie

We study the problem of learning directed acyclic graphs from continuous observational data, generated according to a linear Gaussian structural equation model. State-of-the-art structure learning methods for this setting have at least one of the following shortcomings: (i) they cannot provide optimality guarantees and can suffer from learning suboptimal models; (ii) they rely on the stringent assumption that the noise is homoscedastic, and hence the underlying model is fully identifiable. We overcome these shortcomings and develop a computationally efficient mixed-integer programming framework for learning medium-sized problems that accounts for arbitrary heteroscedastic noise. We present an early stopping criterion under which we can terminate the branch-and-bound procedure to achieve an asymptotically optimal solution and establish the consistency of this approximate solution. In addition, we show via numerical experiments that our method outperforms state-of-the-art algorithms and is robust to noise heteroscedasticity, whereas the performance of some competing methods deteriorates under strong violations of the identifiability assumption. The software implementation of our method is available as the Python package micodag.

Biometrika, vol. 112, no. 3, asaf032 | Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12368277/pdf/
Citations: 0
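To make the optimization target concrete, the following Python sketch scores a DAG by an ℓ0-penalized Gaussian negative log-likelihood with a separate noise variance for each node. It then finds the best graph by brute force over node orderings and parent sets. This exhaustive search stands in for the branch-and-bound mixed-integer program and is only practical for about three nodes. It does not use the micodag package. The penalty `log(n)` and the simulated chain X → Y → Z with node-specific noise levels are assumptions for illustration.

```python
import itertools
import math
import random


def resid_var(y, xs):
    """Mean squared residual from least squares of y on the columns in xs
    (at most two), with the intercept handled by centering."""
    n = len(y)

    def center(v):
        m = sum(v) / n
        return [vi - m for vi in v]

    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    yc = center(y)
    cols = [center(x) for x in xs]
    syy = dot(yc, yc)
    if not cols:
        rss = syy
    elif len(cols) == 1:
        sxy, sxx = dot(cols[0], yc), dot(cols[0], cols[0])
        rss = syy - sxy * sxy / sxx
    else:
        a, b = cols
        s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
        s1y, s2y = dot(a, yc), dot(b, yc)
        det = s11 * s22 - s12 * s12
        b1 = (s22 * s1y - s12 * s2y) / det
        b2 = (s11 * s2y - s12 * s1y) / det
        rss = syy - b1 * s1y - b2 * s2y
    return rss / n


def best_dag(data, penalty):
    """Brute-force minimizer of sum_j [(n/2) log sigma_j^2 + penalty * |pa(j)|]
    over all DAGs on len(data) <= 3 nodes; data[j] is the sample for node j."""
    p, n = len(data), len(data[0])
    if p > 3:
        raise ValueError("this sketch handles at most 3 nodes")
    best_score, best_edges = math.inf, None
    # Every DAG is consistent with some ordering; given an ordering, each node
    # independently picks its best parent subset among its predecessors.
    for order in itertools.permutations(range(p)):
        score, edges = 0.0, set()
        for pos, j in enumerate(order):
            preds = order[:pos]
            candidates = [
                (0.5 * n * math.log(resid_var(data[j], [data[k] for k in cand]))
                 + penalty * len(cand), cand)
                for r in range(len(preds) + 1)
                for cand in itertools.combinations(preds, r)
            ]
            s, pa = min(candidates)
            score += s
            edges |= {(k, j) for k in pa}
        if score < best_score:
            best_score, best_edges = score, edges
    return best_edges, best_score


rng = random.Random(1)
n = 2000
x = [rng.gauss(0, 1.0) for _ in range(n)]
y = [xi + rng.gauss(0, 0.5) for xi in x]  # noise sd differs by node
z = [yi + rng.gauss(0, 2.0) for yi in y]
edges, score = best_dag([x, y, z], penalty=math.log(n))
print(sorted(edges))
```

Because the noise variances differ across nodes, Markov-equivalent graphs tie under this score. Only the skeleton (plus any v-structures) is determined, so the check below compares skeletons rather than edge directions.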
A general form of covariate adjustment in clinical trials under covariate-adaptive randomization.
IF 2.8 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2025-04-12 | eCollection Date: 2025-01-01 | DOI: 10.1093/biomet/asaf029
Marlena S Bannick, Jun Shao, Jingyi Liu, Yu Du, Yanyao Yi, Ting Ye

In randomized clinical trials, adjusting for baseline covariates can improve credibility and efficiency for demonstrating and quantifying treatment effects. This article studies the augmented inverse propensity weighted estimator, which is a general form of covariate adjustment that uses linear, generalized linear and nonparametric or machine learning models for the conditional mean of the response given covariates. Under covariate-adaptive randomization, we establish general theorems that show a complete picture of the asymptotic normality, efficiency gain and applicability of augmented inverse propensity weighted estimators. In particular, we provide for the first time a rigorous theoretical justification of using machine learning methods with cross-fitting for dependent data under covariate-adaptive randomization. Based on the general theorems, we offer insights on the conditions for guaranteed efficiency gain and universal applicability under different randomization schemes, which also motivate a joint calibration strategy using some constructed covariates after applying augmented inverse propensity weighted estimators.

Biometrika, vol. 112, no. 3, asaf029 | Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12264724/pdf/
Citations: 0
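Once outcome-model predictions are available, the augmented inverse propensity weighted estimator of the average treatment effect has a simple closed form. The sketch assumes a known assignment probability `pi`. It takes per-subject predictions `mu1` and `mu0` from any working model as inputs; these would ideally be cross-fitted, which is not shown. The paper's variance results under covariate-adaptive randomization are not reproduced here.

```python
def aipw_ate(y, a, pi, mu1, mu0):
    """Augmented inverse propensity weighted estimate of E[Y(1)] - E[Y(0)].

    y:   outcomes
    a:   treatment indicators (0/1)
    pi:  known probability of assignment to treatment
    mu1: per-subject predicted outcome under treatment (working model)
    mu0: per-subject predicted outcome under control (working model)
    """
    n = len(y)
    total = 0.0
    for yi, ai, m1, m0 in zip(y, a, mu1, mu0):
        # Model-based contrast plus inverse-probability-weighted residual corrections.
        total += (m1 - m0
                  + ai * (yi - m1) / pi
                  - (1 - ai) * (yi - m0) / (1 - pi))
    return total / n


print(aipw_ate([3, 1, 4, 2], [1, 0, 1, 0], 0.5, [3] * 4, [1] * 4))
```

Setting both outcome predictions to zero reduces the estimator to inverse probability weighting. Under simple randomization with fixed predictions, the known `pi` keeps the augmentation terms mean-zero, so the choice of working model affects efficiency rather than unbiasedness.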
Bayesian inference for generalized linear models via quasi-posteriors.
IF 2.4 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2025-03-27 | eCollection Date: 2025-01-01 | DOI: 10.1093/biomet/asaf022
D Agnoletto, T Rigon, D B Dunson

Generalized linear models are routinely used for modelling relationships between a response variable and a set of covariates. The simple form of a generalized linear model comes with easy interpretability, but also leads to concerns about model misspecification impacting inferential conclusions. A popular semiparametric solution adopted in the frequentist literature is quasilikelihood, which improves robustness by only requiring correct specification of the first two moments. We develop a robust approach to Bayesian inference in generalized linear models through quasi-posterior distributions. We show that quasi-posteriors provide a coherent generalized Bayes inference method, while also approximating so-called coarsened posteriors. In so doing, we obtain new insights into the choice of coarsening parameter. Asymptotically, the quasi-posterior converges in total variation to a normal distribution and has important connections with the loss-likelihood bootstrap posterior. We demonstrate that it is also well calibrated in terms of frequentist coverage. Moreover, the loss-scale parameter has a clear interpretation as a dispersion, and this leads to a consolidated method-of-moments estimator.

Biometrika, vol. 112, no. 2, asaf022 | Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12206450/pdf/
Citations: 0
Sample splitting and assessing goodness-of-fit of time series.
IF 2.8 | Zone 2, Mathematics | Q2 BIOLOGY | Pub Date: 2025-03-05 | eCollection Date: 2025-01-01 | DOI: 10.1093/biomet/asaf017
Richard A Davis, Leon Fernandes

A fundamental and often final step in time series modelling is to assess the quality of fit of a proposed model to the data. Since the underlying distribution of the innovations that generate a model is often not prescribed, goodness-of-fit tests typically take the form of testing the fitted residuals for serial independence. However, these fitted residuals are intrinsically dependent since they are based on the same parameter estimates, and thus standard tests of serial independence, such as those based on the autocorrelation function or auto-distance correlation function of the fitted residuals, need to be adjusted. The sample-splitting procedure of Pfister et al. (2018) is one such fix for the case of models for independent data, but fails to work in the dependent setting. In this article, sample splitting is leveraged in the time series setting to perform tests of serial dependence of fitted residuals using the autocorrelation function and auto-distance correlation function. The first [Formula: see text] of the data points are used to estimate the parameters of the model and then, using these parameter estimates, the last [Formula: see text] of the data points are used to compute the estimated residuals. Tests for serial independence are then based on these [Formula: see text] residuals. As long as the overlap between the [Formula: see text] and [Formula: see text] data splits is asymptotically [Formula: see text], the autocorrelation function and auto-distance correlation function tests of serial independence often have the same limit distributions as when the underlying residuals are indeed independent and identically distributed. 
In particular, if the first half of the data is used to estimate the parameters and the residuals are then computed for the entire dataset from these estimates, the autocorrelation function and auto-distance correlation function can have the same limit distributions as if the residuals were independent and identically distributed. This procedure reduces the need to adjust the confidence bounds for both the autocorrelation function and the auto-distance correlation function in goodness-of-fit testing.
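A minimal sketch of the half/whole split described above, assuming a zero-mean AR(1) model fitted by least squares (a hypothetical choice; the paper covers more general models and also the auto-distance correlation function, which is omitted here). The coefficient is estimated on the first half of the series, residuals are computed on the whole series, and their sample autocorrelations are compared with the usual unadjusted ±1.96/√n bounds for independent and identically distributed data.

```python
import math
import random


def simulate_ar1(n, phi, burn=200, seed=0):
    """Zero-mean Gaussian AR(1) series x_t = phi * x_{t-1} + e_t."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(n + burn):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn:
            out.append(x)
    return out


def fit_ar1(x):
    """Least-squares AR(1) coefficient without intercept."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den


def sample_acf(z, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(z)
    m = sum(z) / n
    c0 = sum((v - m) ** 2 for v in z) / n
    return [
        sum((z[t] - m) * (z[t + h] - m) for t in range(n - h)) / n / c0
        for h in range(1, max_lag + 1)
    ]


def split_residual_acf(x, max_lag=10):
    """Estimate on the first half of x, then compute residuals on the whole series."""
    n = len(x)
    phi_hat = fit_ar1(x[: n // 2])
    resid = [x[t] - phi_hat * x[t - 1] for t in range(1, n)]
    acf = sample_acf(resid, max_lag)
    bound = 1.96 / math.sqrt(len(resid))  # unadjusted iid 95% bound
    return phi_hat, acf, bound


x = simulate_ar1(2000, phi=0.6, seed=42)
phi_hat, acf, bound = split_residual_acf(x)
flagged = [h for h, r in enumerate(acf, start=1) if abs(r) > bound]
print(f"phi_hat = {phi_hat:.3f}; lags outside +/-{bound:.3f}: {flagged}")
```

Whether the unadjusted bounds are asymptotically valid for a particular split is exactly what the paper's theory addresses; the sketch only shows the mechanics of estimating on one part of the series and testing residuals on another.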

Biometrika 112(2), asaf017. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12206451/pdf/
Citations: 0
Consistency of common spatial estimators under spatial confounding.
IF 2.8 | Zone 2, Mathematics | Q2 BIOLOGY | Pub Date: 2025-01-01 | Epub Date: 2024-12-23 | DOI: 10.1093/biomet/asae070
Brian Gilbert, Elizabeth L Ogburn, Abhirup Datta

This article addresses the asymptotic performance of popular spatial regression estimators of the linear effect of an exposure on an outcome under spatial confounding, that is, in the presence of an unmeasured spatially structured variable influencing both the exposure and the outcome. We first show that the estimators from ordinary least squares and restricted spatial regression are asymptotically biased under spatial confounding. We then prove a novel result on the infill consistency of the generalized least squares estimator using a working covariance matrix from a Matérn or squared exponential kernel, in the presence of spatial confounding. The result holds under very mild assumptions, accommodating any exposure with some nonspatial variation, any spatially continuous fixed confounder function, and non-Gaussian errors in both the exposure and the outcome. Finally, we prove that spatial estimators from generalized least squares, Gaussian process regression and spline models that are consistent under confounding by a fixed function are also consistent under endogeneity or confounding by a random function, i.e., a stochastic process. We conclude that, contrary to some claims in the literature on spatial confounding, traditional spatial estimators can estimate linear exposure effects under spatial confounding as long as there is some noise in the exposure. We support our theoretical arguments with simulation studies.
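The contrast between ordinary and generalized least squares can be illustrated with a small simulation, sketched here under hypothetical choices that are not the authors' design: a one-dimensional infill grid on [0, 1], a fixed smooth confounder sin(2πs), an exposure equal to the confounder plus independent Gaussian noise, and a GLS working covariance from the exponential kernel (the Matérn kernel with smoothness 1/2) with a fixed, not estimated, range of 0.1 and no nugget. A hand-written Cholesky factorization keeps the sketch dependency-free.

```python
import math
import random


def exp_kernel(s, t, range_=0.1):
    """Matern covariance with smoothness 1/2 (exponential kernel), unit variance."""
    return math.exp(-abs(s - t) / range_)


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for lower-triangular L, i.e. whiten b."""
    z = []
    for i, row in enumerate(L):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def slope(c1, c2, y):
    """Coefficient on c2 when regressing y on the two columns c1 and c2."""
    s11, s12, s22 = dot(c1, c1), dot(c1, c2), dot(c2, c2)
    t1, t2 = dot(c1, y), dot(c2, y)
    return (s11 * t2 - s12 * t1) / (s11 * s22 - s12 ** 2)


def simulate(n=150, beta=1.0, seed=7):
    rng = random.Random(seed)
    s = [(i + 0.5) / n for i in range(n)]          # infill grid on [0, 1]
    u = [math.sin(2 * math.pi * si) for si in s]   # unmeasured smooth confounder
    x = [ui + rng.gauss(0.0, 1.0) for ui in u]     # exposure with nonspatial noise
    y = [beta * xi + ui + rng.gauss(0.0, 0.5) for xi, ui in zip(x, u)]
    return s, x, y


def ols_and_gls(s, x, y, range_=0.1):
    ones = [1.0] * len(x)
    ols = slope(ones, x, y)
    cov = [[exp_kernel(si, sj, range_) for sj in s] for si in s]
    L = cholesky(cov)
    # GLS equals OLS on data whitened by the working covariance's Cholesky factor.
    gls = slope(forward_solve(L, ones), forward_solve(L, x), forward_solve(L, y))
    return ols, gls


s, x, y = simulate()
ols, gls = ols_and_gls(s, x, y)
print(f"OLS slope: {ols:.3f}  GLS slope: {gls:.3f}  (true value 1.0)")
```

With these settings one would expect the OLS slope to absorb a bias of roughly var(U)/var(X), about one third, while whitening by the working covariance acts like differencing neighbouring observations, which strips out most of the smooth confounder and leaves the GLS slope near 1.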

Biometrika 112(2). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12411883/pdf/
Citations: 0