Decomposing Gaussians with Unknown Covariance

arXiv - STAT - Methodology, Pub Date: 2024-09-17, DOI: arxiv-2409.11497

Ameer Dharamshi, Anna Neufeld, Lucy L. Gao, Jacob Bien, Daniela Witten


Citations: 0

Abstract

Common workflows in machine learning and statistics rely on the ability to partition the information in a data set into independent portions. Recent work has shown that this may be possible even when conventional sample splitting is not (e.g., when the number of samples $n=1$, or when observations are not independent and identically distributed). However, the approaches that are currently available to decompose multivariate Gaussian data require knowledge of the covariance matrix. In many important problems (such as in spatial or longitudinal data analysis, and graphical modeling), the covariance matrix may be unknown and even of primary interest. Thus, in this work we develop new approaches to decompose Gaussians with unknown covariance. First, we present a general algorithm that encompasses all previous decomposition approaches for Gaussian data as special cases, and can further handle the case of an unknown covariance. It yields a new and more flexible alternative to sample splitting when $n>1$. When $n=1$, we prove that it is impossible to partition the information in a multivariate Gaussian into independent portions without knowing the covariance matrix. Thus, we use the general algorithm to decompose a single multivariate Gaussian with unknown covariance into dependent parts with tractable conditional distributions, and demonstrate their use for inference and validation. The proposed decomposition strategy extends naturally to Gaussian processes. In simulation and on electroencephalography data, we apply these decompositions to the tasks of model selection and post-selection inference in settings where alternative strategies are unavailable.
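
To make concrete why existing decompositions need the covariance matrix, here is a minimal sketch of the standard known-covariance construction, not the paper's new algorithm. For $X \sim N(\mu, \Sigma)$ and independent noise $W \sim N(0, \Sigma)$, the pieces $X + \tau W$ and $X - W/\tau$ have cross-covariance $\Sigma - \Sigma = 0$ and are therefore independent. If the noise is drawn with a wrong covariance (the identity below), the cross-covariance becomes $\Sigma - I$ and the pieces stay dependent. The function names, `tau`, the simulated `sigma`, and the use of repeated draws to estimate correlations are all illustrative assumptions.

```python
import numpy as np


def fission_gaussian(x, noise_cov, tau=1.0, rng=None):
    """Split each row of x into (x + tau*w, x - w/tau), with w ~ N(0, noise_cov).

    The two pieces are independent only when noise_cov equals the true
    covariance of the rows of x. Hypothetical helper for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    reps, p = x.shape
    w = rng.multivariate_normal(np.zeros(p), noise_cov, size=reps)
    return x + tau * w, x - w / tau


def max_cross_corr(a, b):
    """Largest absolute sample correlation between a column of a and a column of b."""
    p = a.shape[1]
    corr = np.corrcoef(a.T, b.T)  # (2p, 2p) matrix; the top-right block is cross-correlation
    return float(np.abs(corr[:p, p:]).max())


rng = np.random.default_rng(0)
p, reps = 3, 50_000
mu = np.array([1.0, -2.0, 0.5])
root = rng.standard_normal((p, p))
sigma = root @ root.T + p * np.eye(p)  # an arbitrary positive-definite covariance

# Each row is an independent replicate of a single (n = 1) Gaussian observation;
# the replicates exist only so that correlations can be estimated empirically.
x = rng.multivariate_normal(mu, sigma, size=reps)

# Noise drawn with the true covariance: the pieces are uncorrelated, hence independent.
x1, x2 = fission_gaussian(x, sigma, rng=rng)
corr_known = max_cross_corr(x1, x2)

# Noise drawn with a wrong covariance (identity): dependence remains.
y1, y2 = fission_gaussian(x, np.eye(p), rng=rng)
corr_wrong = max_cross_corr(y1, y2)

print(f"max |cross-correlation|, true covariance:  {corr_known:.3f}")
print(f"max |cross-correlation|, wrong covariance: {corr_wrong:.3f}")
```

The first figure should be near zero up to Monte Carlo error, while the second stays clearly away from zero, which is consistent with the abstract's statement that existing decompositions rely on knowing the covariance.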
