Biometrika: Latest Articles

Functional hybrid factor regression model for handling heterogeneity in imaging studies.
IF 2.7 | Zone 2 (Mathematics) | Q1 Mathematics | Pub Date: 2022-12-01 | DOI: 10.1093/biomet/asac007
C Huang, H Zhu

This paper develops a functional hybrid factor regression modelling framework to handle the heterogeneity of many large-scale imaging studies, such as the Alzheimer's Disease Neuroimaging Initiative study. Despite the numerous successes of these imaging studies, such heterogeneity, which may be caused by differences in study environment, population, design, protocols or other hidden factors, has posed major challenges to the integrative analysis of imaging data collected from multiple centres or multiple studies. Under the new model, we propose estimation and inference procedures for estimating unknown parameters and detecting unknown factors, and we systematically investigate their asymptotic properties. The finite-sample performance of the proposed procedures is assessed using Monte Carlo simulations and a real-data example on hippocampal surface data from the Alzheimer's disease study.

Citations: 2
A proximal distance algorithm for likelihood-based sparse covariance estimation.
IF 2.7 | Zone 2 (Mathematics) | Q1 Mathematics | Pub Date: 2022-12-01 | Epub Date: 2022-02-16 | DOI: 10.1093/biomet/asac011
Jason Xu, Kenneth Lange

This paper addresses the task of estimating a covariance matrix under a patternless sparsity assumption. In contrast to existing approaches based on thresholding or shrinkage penalties, we propose a likelihood-based method that regularizes the distance from the covariance estimate to a symmetric sparsity set. This formulation avoids the unwanted shrinkage induced by more common norm penalties, and it enables optimization of the resulting nonconvex objective by solving a sequence of smooth, unconstrained subproblems. These subproblems are generated and solved via the proximal distance version of the majorization-minimization principle. The resulting algorithm executes rapidly, gracefully handles settings where the number of parameters exceeds the number of cases, yields a positive-definite solution, and enjoys desirable convergence properties. In a suite of simulated experiments, our approach outperforms competing methods across several metrics. Its merits are illustrated on international migration data and in a case study on flow cytometry. Our findings suggest that the marginal and conditional dependency networks for the cell signalling data are more similar than previously concluded.
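The central ingredient of a distance penalty is the projection onto the sparsity set, which is needed to evaluate dist(Σ, C). A minimal sketch follows. It assumes that the "symmetric sparsity set" means symmetric matrices with at most k nonzero off-diagonal pairs and an unpenalized diagonal. That definition and the helper names `project_symmetric_sparse` and `squared_distance_to_sparse_set` are hypothetical illustrations. The sketch does not reproduce the paper's algorithm or its likelihood objective.

```python
import numpy as np

def project_symmetric_sparse(S, k):
    """Frobenius projection of a symmetric matrix onto the (assumed) set of
    symmetric matrices with at most k nonzero off-diagonal pairs.
    The diagonal is always kept. Hypothetical helper, not the paper's code."""
    S = np.asarray(S, dtype=float)
    p = S.shape[0]
    iu = np.triu_indices(p, k=1)            # strictly upper-triangular positions
    vals = S[iu]
    keep = np.zeros(vals.shape, dtype=bool)
    if k > 0:
        # each off-diagonal pair costs 2 * v**2 if zeroed, so keep the k largest |v|
        keep[np.argsort(-np.abs(vals))[:k]] = True
    P = np.diag(np.diag(S))                 # diagonal entries are never removed
    P[iu[0][keep], iu[1][keep]] = vals[keep]
    return P + np.triu(P, k=1).T            # mirror the upper triangle for symmetry

def squared_distance_to_sparse_set(S, k):
    """dist(S, C_k)^2 in Frobenius norm, the kind of quantity a proximal
    distance penalty (rho / 2) * dist^2 would use."""
    S = np.asarray(S, dtype=float)
    return float(np.sum((S - project_symmetric_sparse(S, k)) ** 2))
```

For example, with k = 1 the 3 × 3 matrix [[2, 0.9, 0.1], [0.9, 2, -0.5], [0.1, -0.5, 2]] keeps only the 0.9 pair. The squared distance is 2 × (0.1² + 0.5²) = 0.52.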

Citations: 2
Graphical Gaussian Process Models for Highly Multivariate Spatial Data. 高多元空间数据的图形高斯过程模型。
IF 2.7 | Zone 2 (Mathematics) | Q1 Mathematics | Pub Date: 2022-12-01 | DOI: 10.1093/biomet/asab061
Debangan Dey, Abhirup Datta, Sudipto Banerjee

For multivariate spatial Gaussian process (GP) models, customary specifications of cross-covariance functions do not exploit relational inter-variable graphs to ensure process-level conditional independence among the variables. This is undesirable, especially in highly multivariate settings, where popular cross-covariance functions such as the multivariate Matérn suffer from a "curse of dimensionality": the number of parameters and the number of floating-point operations scale quadratically and cubically, respectively, with the number of variables. We propose a class of multivariate "Graphical Gaussian Processes" using a general construction called "stitching" that crafts cross-covariance functions from graphs and ensures process-level conditional independence among variables. For the Matérn family of functions, stitching yields a multivariate GP whose univariate components are Matérn GPs and which conforms to the process-level conditional independence specified by the graphical model. For highly multivariate settings and decomposable graphical models, stitching offers massive computational gains and parameter dimension reduction. We demonstrate the utility of the graphical Matérn GP for jointly modelling highly multivariate spatial data using simulation examples and an application to air-pollution modelling.
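The univariate components of the graphical Matérn GP are ordinary Matérn GPs. As a reference point, the sketch below evaluates the standard closed-form Matérn covariance for half-integer smoothness ν ∈ {1/2, 3/2, 5/2} and assembles a covariance matrix over a set of locations. The function names and default parameters are illustrative assumptions. The stitching construction itself is not shown.

```python
import numpy as np

def matern(d, sigma2=1.0, length=1.0, nu=1.5):
    """Standard Matérn covariance at distance(s) d, closed forms only for
    nu in {0.5, 1.5, 2.5}. Illustrative helper, not from the paper."""
    r = np.abs(np.asarray(d, dtype=float)) / length
    if nu == 0.5:                           # exponential covariance
        return sigma2 * np.exp(-r)
    if nu == 1.5:
        s = np.sqrt(3.0) * r
        return sigma2 * (1.0 + s) * np.exp(-s)
    if nu == 2.5:
        s = np.sqrt(5.0) * r
        return sigma2 * (1.0 + s + s**2 / 3.0) * np.exp(-s)
    raise ValueError("only nu in {0.5, 1.5, 2.5} is implemented here")

def matern_cov_matrix(coords, **kwargs):
    """Covariance matrix of a univariate Matérn GP at locations coords (n x dim)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt(np.sum(diff**2, axis=-1))  # pairwise Euclidean distances
    return matern(dist, **kwargs)
```

Building one such n × n matrix per variable is cheap. The quadratic parameter growth mentioned above comes from the cross-covariances between every pair of variables in a full multivariate Matérn specification.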

Citations: 17
Thresholded Graphical Lasso Adjusts for Latent Variables
IF 2.7 | Zone 2 (Mathematics) | Q1 Mathematics | Pub Date: 2022-11-10 | DOI: 10.1093/biomet/asac060
Minjie Wang, Genevera I. Allen
Structural learning of Gaussian graphical models in the presence of latent variables has long been a challenging problem. Chandrasekaran et al. (2012) proposed a convex program to estimate a sparse graph plus a low-rank term that adjusts for latent variables; however, this approach poses challenges from both a computational and a statistical perspective. We propose an alternative and remarkably simple solution: apply a hard thresholding operator to existing graph selection methods. We show that thresholding the graphical lasso, which is conceptually simple and computationally attractive, is graph selection consistent in the presence of latent variables under a simpler minimum edge strength condition and at an improved statistical rate. We also extend these results to thresholded neighbourhood selection and CLIME estimators. We show that our simple thresholded graph estimators achieve stronger empirical results than existing approaches for the latent variable graphical model problem, and we conclude with a neuroscience case study that estimates functional neural connections.
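The hard-thresholding step itself is easy to sketch. The code below assumes a precision-matrix estimate is already available, for instance the `precision_` attribute of scikit-learn's `GraphicalLasso`. It zeroes every off-diagonal entry with magnitude at most a user-chosen threshold `tau` and reads off the edge set. The names `hard_threshold_offdiag` and `edge_set` are hypothetical. The paper's rule for choosing the threshold is not reproduced.

```python
import numpy as np

def hard_threshold_offdiag(Theta, tau):
    """Hard-threshold an estimated precision matrix: off-diagonal entries with
    |value| <= tau are set to zero, diagonal entries are left unchanged."""
    Theta = np.asarray(Theta, dtype=float)
    T = np.where(np.abs(Theta) > tau, Theta, 0.0)
    np.fill_diagonal(T, np.diag(Theta))     # restore the (unthresholded) diagonal
    return T

def edge_set(Theta, tol=0.0):
    """Edges (i, j), i < j, whose precision entry is nonzero (above tol)."""
    p = Theta.shape[0]
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(Theta[i, j]) > tol}
```

Because the operator acts on the output of an existing estimator, the same two lines apply unchanged to neighbourhood-selection or CLIME estimates.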
Citations: 4
Additive Models for Symmetric Positive-Definite Matrices and Lie Groups
IF 2.7 | Zone 2 (Mathematics) | Q1 Mathematics | Pub Date: 2022-09-29 | DOI: 10.1093/biomet/asac055
Z. Lin, H. Müller, B. U. Park
We propose and investigate an additive regression model for symmetric positive-definite matrix-valued responses and multiple scalar predictors. The model exploits the abelian group structure inherited from either the log-Cholesky or the log-Euclidean framework for symmetric positive-definite matrices, and it extends naturally to general abelian Lie groups. The proposed additive model is shown to be connected to an additive model on a tangent space. This connection not only yields an efficient algorithm for estimating the component functions but also allows the proposed additive model to be generalized to general Riemannian manifolds. Optimal asymptotic convergence rates and asymptotic normality of the estimated component functions are established, and numerical studies show that the proposed model performs well numerically and does not suffer from the curse of dimensionality when there are multiple predictors. The practical merits of the proposed model are demonstrated through an analysis of brain diffusion tensor imaging data.
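The abelian group structure referred to above can be made concrete for the log-Euclidean framework. The group operation S₁ ⊙ S₂ = exp(log S₁ + log S₂) is commutative, whereas the ordinary matrix product of two SPD matrices generally is not. The sketch below computes the matrix logarithm and exponential through an eigendecomposition and checks this property. The function names are illustrative. The log-Cholesky variant and the additive model itself are not shown.

```python
import numpy as np

def spd_log(S):
    """Matrix logarithm of a symmetric positive-definite matrix via eigh."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T            # V diag(log w) V^T

def sym_exp(X):
    """Matrix exponential of a symmetric matrix via eigh."""
    w, V = np.linalg.eigh(X)
    return (V * np.exp(w)) @ V.T            # V diag(exp w) V^T

def log_euclidean_product(S1, S2):
    """Log-Euclidean group operation S1 (.) S2 = exp(log S1 + log S2).
    Identity is I and the inverse of S is S^{-1}."""
    return sym_exp(spd_log(S1) + spd_log(S2))
```

The design choice here is that everything reduces to addition in the space of symmetric matrices (the tangent space at I). An additive model on that flat space maps back to SPD matrices through `sym_exp`.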
Citations: 8
Propensity Scores in the Design of Observational Studies for Causal Effects
IF 2.7 | Zone 2 (Mathematics) | Q1 Mathematics | Pub Date: 2022-09-28 | DOI: 10.1093/biomet/asac054
P. Rosenbaum, D. Rubin
The design of any study, whether experimental or observational, that is intended to estimate the causal effects of a treatment condition relative to a control condition, refers to those activities that precede any examination of outcome variables. As defined in our 1983 article (Rosenbaum & Rubin, 1983), the propensity score is the unit-level conditional probability of assignment to treatment versus control given the observed covariates; so, the propensity score explicitly does not involve any outcome variables, in contrast to other summaries of variables sometimes used in observational studies. Balancing the distributions of covariates in the treatment and control groups by matching or balancing on the propensity score is therefore an aspect of the design of the observational study. In this invited comment on our 1983 article, we review the situation in the early 1980s, and we recall some apparent paradoxes that propensity scores helped to resolve. We demonstrate that it is possible to balance an enormous number of low-dimensional summaries of a high-dimensional covariate, even though it is generally impossible to match individuals closely for all of the components of a high-dimensional covariate. In a sense, there is only one crucial observed covariate, the propensity score, and there is one crucial unobserved covariate, the ‘principal unobserved covariate’. The propensity score and the principal unobserved covariate are equal when treatment assignment is strongly ignorable, that is, unconfounded. Controlling for observed covariates is a prelude to the crucial step from association to causation, the step that addresses potential biases from unmeasured covariates.
The design of an observational study also prepares for the step to causation: by selecting comparisons to increase the design sensitivity, by seeking opportunities to detect bias, by seeking mutually supportive evidence affected by different biases, by incorporating quasi-experimental devices such as multiple control groups, and by including the economist’s instruments. All of these considerations reflect the formal development of sensitivity analyses that were largely informal prior to the 1980s.
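Matching on the propensity score is a design-stage step: it uses treatment indicators and estimated scores but never outcomes. The sketch below shows one simple variant, greedy 1:1 nearest-neighbour matching without replacement. It assumes the scores have already been estimated, for example by logistic regression. The function name `greedy_match` is hypothetical. Greedy matching is only one choice, and optimal matching is a common alternative.

```python
import numpy as np

def greedy_match(ps, treated):
    """Greedy 1:1 nearest-neighbour matching without replacement on
    (already estimated) propensity scores. Returns (treated, control) index
    pairs. No outcome variable is used anywhere."""
    ps = np.asarray(ps, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    available = list(np.flatnonzero(~treated))   # unmatched control units
    pairs = []
    for i in np.flatnonzero(treated):
        if not available:
            break
        # closest remaining control on the propensity-score scale
        j = min(available, key=lambda c: abs(ps[c] - ps[i]))
        pairs.append((int(i), int(j)))
        available.remove(j)
    return pairs
```

Greedy results depend on the order in which treated units are processed. After matching, covariate balance between the matched groups would be checked before any outcome is examined.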
Citations: 6
Correction to: ‘Valid sequential inference on probability forecast performance’
IF 2.7 | Zone 2 (Mathematics) | Q1 Mathematics | Pub Date: 2022-09-13 | DOI: 10.1093/biomet/asac043
A. Henzi, Johanna F. Ziegel
Citations: 1
Multi-scale Fisher's independence test for multivariate dependence.
IF 2.4 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2022-09-01 | Epub Date: 2022-02-21 | DOI: 10.1093/biomet/asac013
S Gorsky, L Ma

Identifying dependency in multivariate data is a common inference task that arises in numerous applications. However, existing nonparametric independence tests typically require computation that scales at least quadratically with the sample size, which makes them difficult to apply to massive samples. Moreover, resampling is usually necessary to evaluate the statistical significance of the resulting test statistics at finite sample sizes, further worsening the computational burden. We introduce a scalable, resampling-free approach to testing the independence between two random vectors. It breaks the task down into simple univariate tests of independence on a collection of 2 × 2 contingency tables, constructed through sequential coarse-to-fine discretization of the sample space. This transforms the inference task into a multiple testing problem that can be completed with almost linear complexity in the sample size. To address increasing dimensionality, we introduce a coarse-to-fine sequential adaptive procedure that exploits the spatial features of dependency structures. We derive a finite-sample theory that guarantees the inferential validity of our adaptive procedure at any given sample size. We show that our approach achieves strong control of the level of the testing procedure at any sample size without resampling or asymptotic approximation, and we establish its large-sample consistency. Through an extensive simulation study, we demonstrate its substantial computational advantage over existing approaches while achieving robust statistical power under various dependency scenarios. We also illustrate how its divide-and-conquer nature can be exploited not only to test independence but also to learn the nature of the underlying dependency. Finally, we demonstrate the use of our method by analysing a dataset from a flow cytometry experiment.
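Each building block of the procedure is a univariate independence test on a 2 × 2 contingency table. The sketch below shows only the coarsest such table. It splits each of two univariate samples at its median, an illustrative assumption, and computes a two-sided Fisher exact p-value from the hypergeometric distribution. The helper names are hypothetical. The multi-scale partitioning, the adaptive selection of tables and the multiplicity control described above are not reproduced. `scipy.stats.fisher_exact` offers an equivalent library routine for a single table.

```python
from math import comb
import numpy as np

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]:
    sum of hypergeometric probabilities no larger than the observed one."""
    r1, c1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, c1 - (n - r1)), min(r1, c1)   # support of the (1,1) cell
    denom = comb(n, c1)
    probs = {x: comb(r1, x) * comb(n - r1, c1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # small relative tolerance guards against floating-point ties
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))

def coarsest_split_test(x, y):
    """Median-split both samples and test the resulting 2 x 2 table."""
    x, y = np.asarray(x), np.asarray(y)
    bx, by = x > np.median(x), y > np.median(y)
    a, b = int(np.sum(~bx & ~by)), int(np.sum(~bx & by))
    c, d = int(np.sum(bx & ~by)), int(np.sum(bx & by))
    return (a, b, c, d), fisher_exact_2x2(a, b, c, d)
```

The finer levels of the procedure would repeat this kind of test within sub-regions of the sample space. Each test touches only a 2 × 2 table, which is what keeps the overall cost nearly linear in the sample size.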

Citations: 0
Generalized infinite factorization models. 广义无限分解模型。
IF 2.7 | Zone 2 (Mathematics) | Q1 Mathematics | Pub Date: 2022-09-01 | Epub Date: 2022-01-19 | DOI: 10.1093/biomet/asab056
L Schiavon, A Canale, D B Dunson

Factorization models express a statistical object of interest in terms of a collection of simpler objects. For example, a matrix or tensor can be expressed as a sum of rank-one components. In practice, however, it can be challenging to infer the number of components as well as their relative impact. A popular idea is to include infinitely many components whose impact decreases with the component index. This article is motivated by two limitations of existing methods: (1) lack of careful consideration of the within-component sparsity structure; and (2) no accommodation for grouped variables and other non-exchangeable structures. We propose a general class of infinite factorization models that address these limitations. Theoretical support is provided, practical gains are shown in simulation studies, and an ecology application focusing on modelling bird species occurrence is discussed.
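One well-known instance of "infinitely many components whose impact decreases with the component index" is the multiplicative gamma process shrinkage prior on factor loadings. Under it, column h has precision τ_h = δ_1 ⋯ δ_h with δ_1 ~ Ga(a_1, 1) and δ_l ~ Ga(a_2, 1) for l ≥ 2. The sketch below draws from a simplified, truncated version with H columns and no local precisions. It illustrates the existing idea that the article builds on. It is not the generalized class proposed in the article, and the function names and default hyperparameters are assumptions.

```python
import numpy as np

def mgp_precisions(H, a1=2.0, a2=3.0, rng=None):
    """Column precisions tau_h = prod_{l<=h} delta_l of a (truncated)
    multiplicative gamma process: delta_1 ~ Gamma(a1, 1),
    delta_l ~ Gamma(a2, 1) for l >= 2. With a2 > 1 the precisions tend to
    grow with h, so later columns are shrunk more strongly."""
    rng = np.random.default_rng(rng)
    delta = np.concatenate([rng.gamma(a1, 1.0, size=1),
                            rng.gamma(a2, 1.0, size=H - 1)])
    return np.cumprod(delta)

def draw_loadings(p, H, a1=2.0, a2=3.0, rng=None):
    """Draw a p x H loadings matrix with lambda_jh ~ N(0, 1 / tau_h)."""
    rng = np.random.default_rng(rng)
    tau = mgp_precisions(H, a1, a2, rng)
    return rng.normal(size=(p, H)) / np.sqrt(tau)   # column h scaled by tau_h^{-1/2}
```

Because the variance shrinks multiplicatively across columns, truncating at a moderate H discards little. Note that this prior treats all rows (variables) exchangeably, which is one of the limitations the article targets.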

Citations: 14
Searching for robust associations with a multi-environment knockoff filter.
IF 2.7 · Zone 2 (Mathematics) · Q1 Mathematics · Pub Date: 2022-09-01 · Epub Date: 2021-11-02 · DOI: 10.1093/biomet/asab055
S Li, M Sesia, Y Romano, E Candès, C Sabatti

This paper develops a method based on model-X knockoffs to find conditional associations that are consistent across environments, controlling the false discovery rate. The motivation for this problem is that large data sets may contain numerous associations that are statistically significant and yet misleading, as they are induced by confounders or sampling imperfections. However, associations replicated under different conditions may be more interesting. In fact, consistency sometimes provably leads to valid causal inferences even if conditional associations do not. While the proposed method is widely applicable, this paper highlights its relevance to genome-wide association studies, in which robustness across populations with diverse ancestries mitigates confounding due to unmeasured variants. The effectiveness of this approach is demonstrated by simulations and applications to the UK Biobank data.
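Knockoff methods select variables by thresholding feature statistics W_j, where large positive values suggest a genuine association. The sketch below shows the standard single-environment knockoff+ threshold, which is the FDR-controlling selection rule that such filters build on. It is not the multi-environment filter proposed in the paper. It assumes the statistics W have already been computed from the data and their model-X knockoff copies, and both function names are hypothetical.

```python
import math

def knockoff_plus_threshold(W, q):
    """Knockoff+ threshold for feature statistics W at target FDR level q.

    tau = min{ t in {|W_j| : W_j != 0} :
               (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q },
    or +infinity (select nothing) if no candidate t satisfies the bound.
    """
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:  # ascending, so the first hit is the minimum
        # Negative statistics beyond -t estimate the false selections above t
        false_est = 1 + sum(1 for w in W if w <= -t)
        n_selected = sum(1 for w in W if w >= t)
        if false_est / max(1, n_selected) <= q:
            return t
    return math.inf

def knockoff_select(W, q):
    """Indices of the features whose statistic clears the knockoff+ threshold."""
    tau = knockoff_plus_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= tau]

W = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -0.5, 1.5, 0.0, 1.0]
print(knockoff_select(W, 0.2))  # [0, 1, 2, 3, 4, 5, 7, 9]
```

The "+1" in the numerator makes the rule conservative. The estimated ratio is at least 1/(number selected), so knockoff+ returns a non-empty set only when at least 1/q statistics clear the threshold. With the W above and q = 0.1, nothing is selected.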

Citations: 0
Journal: Biometrika