{"title":"关于分离良好高斯分布的学习混合","authors":"O. Regev, Aravindan Vijayaraghavan","doi":"10.1109/FOCS.2017.17","DOIUrl":null,"url":null,"abstract":"We consider the problem of efficiently learning mixtures of a large number of spherical Gaussians, when the components of the mixture are well separated. In the most basic form of this problem, we are given samples from a uniform mixture of k standard spherical Gaussians with means mu_1,...,mu_k in R^d, and the goal is to estimate the means up to accuracy δ using poly(k,d, 1/δ) samples.In this work, we study the following question: what is the minimum separation needed between the means for solving this task? The best known algorithm due to Vempala and Wang [JCSS 2004] requires a separation of roughly min{k,d}^{1/4}. On the other hand, Moitra and Valiant [FOCS 2010] showed that with separation o(1), exponentially many samples are required. We address the significant gap between these two bounds, by showing the following results.• We show that with separation o(√log k), super-polynomially many samples are required. In fact, this holds even when the k means of the Gaussians are picked at random in d=O(log k) dimensions.• We show that with separation Ω(√log k), poly(k,d,1/δ) samples suffice. Notice that the bound on the separation is independent of δ. This result is based on a new and efficient accuracy boosting algorithm that takes as input coarse estimates of the true means and in time (and samples) poly(k,d, 1δ) outputs estimates of the means up to arbitrarily good accuracy δ assuming the separation between the means is Ωmin √(log k),√d) (independently of δ). The idea of the algorithm is to iteratively solve a diagonally dominant system of non-linear equations.We also (1) present a computationally efficient algorithm in d=O(1) dimensions with only Ω(√{d}) separation, and (2) extend our results to the case that components might have different weights and variances. These results together essentially characterize the optimal order of separation between components that is needed to learn a mixture of k spherical Gaussians with polynomial samples.","PeriodicalId":311592,"journal":{"name":"2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"67","resultStr":"{\"title\":\"On Learning Mixtures of Well-Separated Gaussians\",\"authors\":\"O. Regev, Aravindan Vijayaraghavan\",\"doi\":\"10.1109/FOCS.2017.17\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider the problem of efficiently learning mixtures of a large number of spherical Gaussians, when the components of the mixture are well separated. In the most basic form of this problem, we are given samples from a uniform mixture of k standard spherical Gaussians with means mu_1,...,mu_k in R^d, and the goal is to estimate the means up to accuracy δ using poly(k,d, 1/δ) samples.In this work, we study the following question: what is the minimum separation needed between the means for solving this task? The best known algorithm due to Vempala and Wang [JCSS 2004] requires a separation of roughly min{k,d}^{1/4}. On the other hand, Moitra and Valiant [FOCS 2010] showed that with separation o(1), exponentially many samples are required. 
Abstract: We consider the problem of efficiently learning mixtures of a large number of spherical Gaussians, when the components of the mixture are well separated. In the most basic form of this problem, we are given samples from a uniform mixture of k standard spherical Gaussians with means μ_1, …, μ_k in R^d, and the goal is to estimate the means up to accuracy δ using poly(k, d, 1/δ) samples.

In this work, we study the following question: what is the minimum separation needed between the means for solving this task? The best known algorithm, due to Vempala and Wang [JCSS 2004], requires a separation of roughly min{k, d}^{1/4}. On the other hand, Moitra and Valiant [FOCS 2010] showed that with separation o(1), exponentially many samples are required. We address the significant gap between these two bounds by showing the following results.

• We show that with separation o(√(log k)), super-polynomially many samples are required. In fact, this holds even when the k means of the Gaussians are picked at random in d = O(log k) dimensions.

• We show that with separation Ω(√(log k)), poly(k, d, 1/δ) samples suffice. Notice that the bound on the separation is independent of δ. This result is based on a new and efficient accuracy-boosting algorithm that takes as input coarse estimates of the true means and, in time (and samples) poly(k, d, 1/δ), outputs estimates of the means up to arbitrarily good accuracy δ, assuming the separation between the means is Ω(min(√(log k), √d)) (independently of δ). The idea of the algorithm is to iteratively solve a diagonally dominant system of non-linear equations.

We also (1) present a computationally efficient algorithm in d = O(1) dimensions with only Ω(√d) separation, and (2) extend our results to the case where the components may have different weights and variances. Together, these results essentially characterize the optimal order of separation between components that is needed to learn a mixture of k spherical Gaussians with polynomially many samples.
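The abstract only states that the boosting step iteratively solves a diagonally dominant system of non-linear equations; the paper's actual procedure is not reproduced here. The following is a minimal NumPy sketch of the setting under stated assumptions: it draws samples from a uniform mixture of k standard spherical Gaussians whose means are separated by roughly a constant times √(log k), and refines coarse mean estimates with an EM-style fixed-point iteration. The separation constant, sample size, and helper names (sample_mixture, refine_means) are assumptions made for illustration, not part of the paper.

```python
# Illustrative sketch only, not the paper's algorithm: simulate a uniform
# mixture of k standard spherical Gaussians with pairwise separation around
# 3*sqrt(log k), then refine coarse mean estimates by a fixed-point iteration.
import numpy as np

def sample_mixture(means, n, rng):
    """Draw n samples from a uniform mixture of standard spherical Gaussians."""
    k, d = means.shape
    labels = rng.integers(0, k, size=n)
    return means[labels] + rng.standard_normal((n, d))

def refine_means(samples, coarse_means, iters=25):
    """EM-style refinement: each step softly assigns samples to components and
    re-estimates every mean as a weighted average of the samples. With well-
    separated components the cross-terms are small, so (heuristically) the
    update map is a small perturbation of the identity and the iteration
    contracts toward accurate means."""
    means = coarse_means.copy()
    for _ in range(iters):
        # responsibilities under unit-variance spherical Gaussians, uniform weights
        sq_dists = ((samples[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        logits = -0.5 * sq_dists
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)
        means = (resp.T @ samples) / resp.sum(axis=0)[:, None]
    return means

rng = np.random.default_rng(0)
k, d = 20, 10
sep = 3.0 * np.sqrt(np.log(k))                        # assumed separation constant
# Random Gaussian means scaled so the typical pairwise distance is about sep.
true_means = rng.standard_normal((k, d)) * (sep / np.sqrt(2 * d))
samples = sample_mixture(true_means, n=5000, rng=rng)
coarse = true_means + 0.3 * rng.standard_normal((k, d))  # stand-in for a coarse estimator
boosted = refine_means(samples, coarse)
print("max coarse error :", np.abs(coarse - true_means).max())
print("max boosted error:", np.abs(boosted - true_means).max())
```

Intuitively, once the separation is large enough, each sample's soft assignment concentrates on its own component, so the refinement behaves like a contracting iteration and drives the error down to the statistical limit; with separation o(√(log k)), the components overlap too much for any method to succeed with polynomially many samples, which is the content of the lower bound above.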