
Biometrika: Latest Articles

A subsampling perspective for extending the validity of state-of-the-art bootstraps in the frequency domain
Zone 2 Mathematics Q2 BIOLOGY Pub Date: 2023-01-30 DOI: 10.1093/biomet/asad006
Haihan Yu, Mark S Kaiser, Daniel J Nordman
Bootstrapping spectral mean statistics has been a notoriously difficult problem over the past 25 years. Many frequency domain bootstraps are valid only for certain time series structures, e.g., linear processes, or for special types of statistics, i.e., ratio statistics, because such bootstraps fail to capture the limiting variance of spectral statistics in general settings. We address this issue with a different form of resampling, namely, subsampling. While not considered previously, subsampling provides consistent variance estimation under much weaker conditions than any existing bootstrap in the frequency domain. Mixing is not used, as is often standard with subsampling. Rather, subsampling can be generally justified under the same conditions needed for original spectral mean statistics to have distributional limits in the first place. This result has impacts for other bootstrap methods. Subsampling then applies to extending the validity of recent state-of-the-art bootstraps in the frequency domain. We nontrivially link subsampling to such bootstraps, which broadens their range, as moment and block assumptions needed for these are cut by more than half. Essentially, state-of-the-art bootstraps then require no more stringent assumptions than those needed for a target limit distribution to exist, which is unusual in the bootstrap world. We also close a gap in the theory of subsampling for time series with distributional approximations, in addition to variance estimation, for frequency domain statistics.
Citations: 0
Correction to: Optimal row-column designs
IF 2.7 Zone 2 Mathematics Q2 BIOLOGY Pub Date: 2023-01-27 DOI: 10.1093/biomet/asad003
Citations: 0
High Dimensional Analysis of Variance in Multivariate Linear Regression
IF 2.7 Zone 2 Mathematics Q2 BIOLOGY Pub Date: 2023-01-10 DOI: 10.1093/biomet/asad001
Zhipeng Lou, Xianyang Zhang, Weichi Wu
In this paper, we develop a systematic theory for high dimensional analysis of variance in multivariate linear regression, where the dimension and the number of coefficients can both grow with the sample size. We propose a new U type test statistic to test linear hypotheses and establish a high dimensional Gaussian approximation result under fairly mild moment assumptions. Our general framework and theory can be applied to deal with the classical one-way multivariate analysis of variance and the nonparametric one-way multivariate analysis of variance in high dimensions. To implement the test procedure, we introduce a sample-splitting based estimator of the second moment of the error covariance and discuss its properties. A simulation study shows that our proposed test outperforms some existing tests in various settings.
Citations: 1
An instrumental variable method for point processes: generalized Wald estimation based on deconvolution
IF 2.7 Zone 2 Mathematics Q2 BIOLOGY Pub Date: 2023-01-09 DOI: 10.1093/biomet/asad005
Zhichao Jiang, Shizhe Chen, Peng Ding
Point processes are probabilistic tools for modelling event data. While there exists a fast-growing literature studying the relationships between point processes, it remains unexplored how such relationships connect to causal effects. In the presence of unmeasured confounders, parameters from point process models do not necessarily have causal interpretations. We propose an instrumental variable method for causal inference with point process treatment and outcome. We define causal quantities based on potential outcomes and establish nonparametric identification results with a binary instrumental variable. We extend the traditional Wald estimation to deal with point process treatment and outcome, showing that it should be performed after a Fourier transform of the intention-to-treat effects on the treatment and outcome and thus takes the form of deconvolution. We term this generalized Wald estimation and propose an estimation strategy based on well-established deconvolution methods.
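To illustrate why a ratio taken after a Fourier transform amounts to deconvolution, here is a toy sketch. It is not the authors' estimator: it assumes, purely for illustration, that a discretized outcome effect is the circular convolution of a treatment effect with an unknown kernel, and all names (`dft`, `circular_convolve`, `deconvolve`, the toy sequences) are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), enough for a toy example."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[s] * b[(t - s) % n] for s in range(n)) for t in range(n)]


def deconvolve(outcome_effect, treatment_effect):
    """Recover the kernel h from outcome = treatment (*) h.

    In the Fourier domain convolution becomes multiplication, so h is
    obtained from a frequency-wise ratio, the analogue of the Wald ratio.
    Assumes the treatment transform has no zeros (no regularization here).
    """
    num = dft(outcome_effect)
    den = dft(treatment_effect)
    ratio = [a / b for a, b in zip(num, den)]
    return [v.real for v in dft(ratio, inverse=True)]


# Made-up numbers for illustration only.
treatment_effect = [1.0, 0.5, 0.0, 0.0]  # effect of the instrument on treatment
kernel = [0.6, 0.3, 0.1, 0.0]            # kernel to be recovered
outcome_effect = circular_convolve(treatment_effect, kernel)
recovered = deconvolve(outcome_effect, treatment_effect)
print([round(v, 6) for v in recovered])
```

In practice the frequency-wise division is unstable wherever the denominator is near zero, which is one reason established deconvolution methods add regularization.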
Citations: 4
Significance testing for canonical correlation analysis in high dimensions.
IF 2.4 Zone 2 Mathematics Q2 BIOLOGY Pub Date: 2022-12-01 Epub Date: 2022-11-18 DOI: 10.1093/biomet/asab059
Ian W McKeague, Xin Zhang

We consider the problem of testing for the presence of linear relationships between large sets of random variables based on a post-selection inference approach to canonical correlation analysis. The challenge is to adjust for the selection of subsets of variables having linear combinations with maximal sample correlation. To this end, we construct a stabilized one-step estimator of the euclidean-norm of the canonical correlations maximized over subsets of variables of pre-specified cardinality. This estimator is shown to be consistent for its target parameter and asymptotically normal, provided the dimensions of the variables do not grow too quickly with sample size. We also develop a greedy search algorithm to accurately compute the estimator, leading to a computationally tractable omnibus test for the global null hypothesis that there are no linear relationships between any subsets of variables having the pre-specified cardinality. We further develop a confidence interval that takes the variable selection into account.

Biometrika, volume 109, issue 4, pages 1067-1083. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9857302/pdf/nihms-1771870.pdf
Citations: 0
Functional hybrid factor regression model for handling heterogeneity in imaging studies.
IF 2.7 Zone 2 Mathematics Q2 BIOLOGY Pub Date: 2022-12-01 DOI: 10.1093/biomet/asac007
C Huang, H Zhu

This paper develops a functional hybrid factor regression modelling framework to handle the heterogeneity of many large-scale imaging studies, such as the Alzheimer's disease neuroimaging initiative study. Despite the numerous successes of those imaging studies, such heterogeneity may be caused by the differences in study environment, population, design, protocols or other hidden factors, and it has posed major challenges in integrative analysis of imaging data collected from multicentres or multistudies. We propose both estimation and inference procedures for estimating unknown parameters and detecting unknown factors under our new model. The asymptotic properties of both estimation and inference procedures are systematically investigated. The finite-sample performance of our proposed procedures is assessed by using Monte Carlo simulations and a real data example on hippocampal surface data from the Alzheimer's disease study.

Biometrika, volume 109, issue 4, pages 1133-1148. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9754099/pdf/
Citations: 2
A proximal distance algorithm for likelihood-based sparse covariance estimation.
IF 2.7 Zone 2 Mathematics Q2 BIOLOGY Pub Date: 2022-12-01 Epub Date: 2022-02-16 DOI: 10.1093/biomet/asac011
Jason Xu, Kenneth Lange

This paper addresses the task of estimating a covariance matrix under a patternless sparsity assumption. In contrast to existing approaches based on thresholding or shrinkage penalties, we propose a likelihood-based method that regularizes the distance from the covariance estimate to a symmetric sparsity set. This formulation avoids unwanted shrinkage induced by more common norm penalties, and enables optimization of the resulting nonconvex objective by solving a sequence of smooth, unconstrained subproblems. These subproblems are generated and solved via the proximal distance version of the majorization-minimization principle. The resulting algorithm executes rapidly, gracefully handles settings where the number of parameters exceeds the number of cases, yields a positive-definite solution, and enjoys desirable convergence properties. Empirically, we demonstrate that our approach outperforms competing methods across several metrics, for a suite of simulated experiments. Its merits are illustrated on international migration data and a case study on flow cytometry. Our findings suggest that the marginal and conditional dependency networks for the cell signalling data are more similar than previously concluded.

Biometrika, pages 1047-1066. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10716840/pdf/
Citations: 2
Graphical Gaussian Process Models for Highly Multivariate Spatial Data.
IF 2.7 Zone 2 Mathematics Q2 BIOLOGY Pub Date: 2022-12-01 DOI: 10.1093/biomet/asab061
Debangan Dey, Abhirup Datta, Sudipto Banerjee

For multivariate spatial Gaussian process (GP) models, customary specifications of cross-covariance functions do not exploit relational inter-variable graphs to ensure process-level conditional independence among the variables. This is undesirable, especially for highly multivariate settings, where popular cross-covariance functions such as the multivariate Matérn suffer from a "curse of dimensionality" as the number of parameters and floating point operations scale up in quadratic and cubic order, respectively, in the number of variables. We propose a class of multivariate "Graphical Gaussian Processes" using a general construction called "stitching" that crafts cross-covariance functions from graphs and ensures process-level conditional independence among variables. For the Matérn family of functions, stitching yields a multivariate GP whose univariate components are Matérn GPs, and conforms to process-level conditional independence as specified by the graphical model. For highly multivariate settings and decomposable graphical models, stitching offers massive computational gains and parameter dimension reduction. We demonstrate the utility of the graphical Matérn GP to jointly model highly multivariate spatial data using simulation examples and an application to air-pollution modelling.

Biometrika, volume 109, issue 4, pages 993-1014. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9838617/pdf/nihms-1786615.pdf
Citations: 17
Thresholded Graphical Lasso Adjusts for Latent Variables
IF 2.7 Zone 2 Mathematics Q2 BIOLOGY Pub Date: 2022-11-10 DOI: 10.1093/biomet/asac060
Minjie Wang, Genevera I. Allen
Structural learning of Gaussian graphical models in the presence of latent variables has long been a challenging problem. Chandrasekaran et al. (2012) proposed a convex program to estimate a sparse graph plus low-rank term that adjusts for latent variables; but, this approach poses challenges from both a computational and statistical perspective. We propose an alternative and incredibly simple solution: apply a hard thresholding operator to existing graph selection methods. Conceptually simple and computationally attractive, we show that thresholding the graphical lasso is graph selection consistent in the presence of latent variables under a simpler minimum edge strength condition and at an improved statistical rate. We also extend results to thresholded neighbourhood selection and CLIME estimators as well. We show that our simple thresholded graph estimators enjoy stronger empirical results than existing approaches for the latent variable graphical model problem and conclude with a neuroscience case study to estimate functional neural connections.
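The hard-thresholding step described in this abstract can be illustrated with a minimal sketch. It assumes a precision-matrix estimate is already available from some graph selection method (the graphical lasso fit itself is not shown), and the function names `hard_threshold` and `edge_set`, the threshold `tau`, and the toy matrix are hypothetical, not taken from the paper.

```python
def hard_threshold(theta, tau):
    """Zero out off-diagonal entries of theta whose magnitude is below tau.

    theta: square matrix (list of lists), e.g. an estimated precision matrix
    tau:   hypothetical threshold level chosen by the user
    """
    p = len(theta)
    return [
        [theta[i][j] if i == j or abs(theta[i][j]) >= tau else 0.0
         for j in range(p)]
        for i in range(p)
    ]


def edge_set(theta):
    """Return the undirected edges (i, j), i < j, with nonzero entries."""
    p = len(theta)
    return {(i, j) for i in range(p) for j in range(i + 1, p) if theta[i][j] != 0.0}


# Toy precision-matrix estimate (made-up numbers): the small 0.05 entries
# stand in for spurious edges, such as those induced by latent variables.
theta_hat = [
    [2.0, -0.8, 0.05],
    [-0.8, 2.0, -0.6],
    [0.05, -0.6, 2.0],
]
thresholded = hard_threshold(theta_hat, tau=0.1)
print(sorted(edge_set(thresholded)))
```

Unlike a soft-thresholding or shrinkage step, entries that survive the threshold are left unchanged; only the small ones are set to zero.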
Citations: 4
Additive Models for Symmetric Positive-Definite Matrices and Lie Groups
IF 2.7 Zone 2 Mathematics Q2 BIOLOGY Pub Date: 2022-09-29 DOI: 10.1093/biomet/asac055
Z. Lin, H. Müller, B. U. Park
We propose and investigate an additive regression model for symmetric positive-definite matrix valued responses and multiple scalar predictors. The model exploits the abelian group structure inherited from either of the log-Cholesky and log-Euclidean frameworks for symmetric positive-definite matrices and naturally extends to general abelian Lie groups. The proposed additive model is shown to connect to an additive model on a tangent space. This connection not only entails an efficient algorithm to estimate the component functions but also allows one to generalize the proposed additive model to general Riemannian manifolds. Optimal asymptotic convergence rates and normality of the estimated component functions are established and numerical studies show that the proposed model enjoys good numerical performance and is not subject to the curse of dimensionality when there are multiple predictors. The practical merits of the proposed model are demonstrated through an analysis of brain diffusion tensor imaging data.
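As a brief illustration of the abelian group structure mentioned in this abstract, the log-Euclidean operation on symmetric positive-definite matrices can be written as below. The group operation is the standard log-Euclidean definition; the final displayed model, including the symbols \mu, f_k and \varepsilon, is only an assumed illustrative form of an additive model on the tangent space and is not quoted from the paper.

```latex
% Log-Euclidean group operation on SPD matrices A, B (log and exp are the
% matrix logarithm and matrix exponential; I is the identity matrix)
A \oplus B = \exp\bigl(\log A + \log B\bigr), \qquad
A^{-1_{\oplus}} = \exp(-\log A), \qquad
A \oplus I = A .

% Commutativity follows from that of matrix addition
A \oplus B = \exp(\log B + \log A) = B \oplus A .

% Illustrative additive model for an SPD response Y and scalar predictors
% X_1, ..., X_q (notation assumed): after taking logs, the model is additive
% in the space of symmetric matrices
\log Y = \log \mu + \sum_{k=1}^{q} f_k(X_k) + \varepsilon .
```

Because the matrix logarithm maps symmetric positive-definite matrices to symmetric matrices, which form a vector space, component functions can in principle be fitted there with standard additive-model tools.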
Citations: 8