We consider the following fundamental problems: (1) constructing k-independent hash functions with a space-time tradeoff close to Siegel's lower bound, and (2) constructing representations of unbalanced expander graphs having small size and allowing fast computation of the neighbor function. It is not hard to show that these problems are intimately connected, in the sense that a good solution to one of them leads to a good solution to the other. In this paper we exploit this connection to present efficient, recursive constructions of k-independent hash functions (and hence expanders with a small representation). While the previously most efficient construction (Thorup, FOCS 2013) needed time quasipolynomial in Siegel's lower bound, our time bound is just a logarithmic factor from the lower bound.
{"title":"From Independence to Expansion and Back Again","authors":"Tobias Christiani, R. Pagh, M. Thorup","doi":"10.1145/2746539.2746620","DOIUrl":"https://doi.org/10.1145/2746539.2746620","url":null,"abstract":"We consider the following fundamental problems: Constructing k-independent hash functions with a space-time tradeoff close to Siegel's lower bound. Constructing representations of unbalanced expander graphs having small size and allowing fast computation of the neighbor function. It is not hard to show that these problems are intimately connected in the sense that a good solution to one of them leads to a good solution to the other one. In this paper we exploit this connection to present efficient, recursive constructions of k-independent hash functions (and hence expanders with a small representation). While the previously most efficient construction (Thorup, FOCS 2013) needed time quasipolynomial in Siegel's lower bound, our time bound is just a logarithmic factor from the lower bound.","PeriodicalId":20566,"journal":{"name":"Proceedings of the forty-seventh annual ACM symposium on Theory of Computing","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84442655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We show that every language in NP has a PCP verifier that tosses O(log n) random coins, has perfect completeness, and a soundness error of at most 1/poly(n), while making O(poly log log n) queries into a proof over an alphabet of size at most n^{1/poly log log n}. Previous constructions that obtain 1/poly(n) soundness error used either poly log n queries or an exponential alphabet, i.e., of size 2^{n^c} for some c > 0. Our result is an exponential improvement in both parameters simultaneously. Our result can be phrased as polynomial-gap hardness for approximate CSPs with arity poly log log n and alphabet size n^{1/poly log n}. The ultimate goal in this direction would be to prove polynomial hardness for CSPs with constant arity and polynomial alphabet size (a.k.a. the sliding scale conjecture for inverse polynomial soundness error). Our construction is based on a modular generalization of previous PCP constructions in this parameter regime, which involves a composition theorem that uses an extra 'consistency' query but maintains the inverse polynomial relation between the soundness error and the alphabet size. Our main technical/conceptual contribution is a new notion of soundness, which we refer to as distributional soundness, that replaces the previous notion of "list decoding soundness" and allows us to invoke composition a super-constant number of times without incurring a blow-up in the soundness error.
{"title":"Polynomially Low Error PCPs with polyloglog n Queries via Modular Composition","authors":"Irit Dinur, P. Harsha, Guy Kindler","doi":"10.1145/2746539.2746630","DOIUrl":"https://doi.org/10.1145/2746539.2746630","url":null,"abstract":"We show that every language in NP has a PCP verifier that tosses O(log n) random coins, has perfect completeness, and a soundness error of at most 1/poly(n), while making O(poly log log n) queries into a proof over an alphabet of size at most n1/poly log log n. Previous constructions that obtain 1/poly(n) soundness error used either poly log n queries or an exponential alphabet, i.e. of size 2nc for some c> 0. Our result is an exponential improvement in both parameters simultaneously. Our result can be phrased as polynomial-gap hardness for approximate CSPs with arity poly log log n and alphabet size n1/poly log n. The ultimate goal, in this direction, would be to prove polynomial hardness for CSPs with constant arity and polynomial alphabet size (aka the sliding scale conjecture for inverse polynomial soundness error). Our construction is based on a modular generalization of previous PCP constructions in this parameter regime, which involves a composition theorem that uses an extra 'consistency' query but maintains the inverse polynomial relation between the soundness error and the alphabet size. Our main technical/conceptual contribution is a new notion of soundness, which we refer to as distributional soundness, that replaces the previous notion of \"list decoding soundness\", and allows us to invoke composition a super-constant number of times without incurring a blow-up in the soundness error.","PeriodicalId":20566,"journal":{"name":"Proceedings of the forty-seventh annual ACM symposium on Theory of Computing","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79688281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Given an undirected graph with edge costs and node weights, the minimum bisection problem asks for a partition of the nodes into two parts of equal weight such that the sum of edge costs between the parts is minimized. We give a polynomial-time bicriteria approximation scheme for bisection on planar graphs. Specifically, let W be the total weight of all nodes in a planar graph G. For any constant ε > 0, our algorithm outputs a bipartition of the nodes such that each part weighs at most W/2 + εW and the total cost of edges crossing the partition is at most (1+ε) times the total cost of the optimal bisection. The previously best known approximation for planar minimum bisection, even with unit node weights, was ~O(log n). Our algorithm actually solves a more general problem where the input may include a target weight for the smaller side of the bipartition.
{"title":"A Polynomial-time Bicriteria Approximation Scheme for Planar Bisection","authors":"K. Fox, P. Klein, S. Mozes","doi":"10.1145/2746539.2746564","DOIUrl":"https://doi.org/10.1145/2746539.2746564","url":null,"abstract":"Given an undirected graph with edge costs and node weights, the minimum bisection problem asks for a partition of the nodes into two parts of equal weight such that the sum of edge costs between the parts is minimized. We give a polynomial time bicriteria approximation scheme for bisection on planar graphs. Specifically, let W be the total weight of all nodes in a planar graph G. For any constant ε > 0, our algorithm outputs a bipartition of the nodes such that each part weighs at most W/2 + ε and the total cost of edges crossing the partition is at most (1+ε) times the total cost of the optimal bisection. The previously best known approximation for planar minimum bisection, even with unit node weights, was ~O(log n). Our algorithm actually solves a more general problem where the input may include a target weight for the smaller side of the bipartition.","PeriodicalId":20566,"journal":{"name":"Proceedings of the forty-seventh annual ACM symposium on Theory of Computing","volume":"55 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87687089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider the maximum independent set problem on graphs with maximum degree d. We show that the SDP based on the Lovász theta function has an integrality gap of O~(d/log^{3/2} d). This improves on the previous best bound of O~(d/log d), and narrows the gap between this basic SDP and the integrality gap of O~(d/log^2 d) recently shown for stronger SDPs, namely those obtained using poly log(d) levels of the SA^+ semidefinite hierarchy. The improvement comes from an improved Ramsey-theoretic bound on the independence number of K_r-free graphs for large values of r. We also show how to obtain an algorithmic version of the above-mentioned SA^+-based integrality gap result, via a coloring algorithm of Johansson. The resulting approximation guarantee of O~(d/log^2 d) matches the best unique-games-based hardness result up to lower-order poly(log log d) factors.
{"title":"On the Lovász Theta function for Independent Sets in Sparse Graphs","authors":"N. Bansal, Anupam Gupta, Guru Guruganesh","doi":"10.1145/2746539.2746607","DOIUrl":"https://doi.org/10.1145/2746539.2746607","url":null,"abstract":"We consider the maximum independent set problem on graphs with maximum degree d. We show that the integrality gap of the Lovasz Theta function-based SDP has an integrality gap of O~(d/log3/2 d). This improves on the previous best result of O~(d/log d), and narrows the gap of this basic SDP to the integrality gap of O~(d/log2 d) recently shown for stronger SDPs, namely those obtained using poly log(d) levels of the SA+ semidefinite hierarchy. The improvement comes from an improved Ramsey-theoretic bound on the independence number of Kr-free graphs for large values of r. We also show how to obtain an algorithmic version of the above-mentioned SAplus-based integrality gap result, via a coloring algorithm of Johansson. The resulting approximation guarantee of O~(d/log2 d) matches the best unique-games-based hardness result up to lower-order poly (log log d) factors.","PeriodicalId":20566,"journal":{"name":"Proceedings of the forty-seventh annual ACM symposium on Theory of Computing","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83498187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We give efficient protocols and matching accuracy lower bounds for frequency estimation in the local model for differential privacy. In this model, individual users randomize their data themselves, sending differentially private reports to an untrusted server that aggregates them. We study protocols that produce a succinct histogram representation of the data. A succinct histogram is a list of the most frequent items in the data (often called "heavy hitters") along with estimates of their frequencies; the frequency of all other items is implicitly estimated as 0. If there are n users whose items come from a universe of size d, our protocols run in time polynomial in n and log(d). With high probability, they estimate the frequency of every item up to error O(√(log(d)/(ε^2 n))). Moreover, we show that this much error is necessary, regardless of computational efficiency, and even for the simple setting where only one item appears with significant frequency in the data set. Previous protocols (Mishra and Sandler, 2006; Hsu, Khanna and Roth, 2012) for this task either ran in time Ω(d) or had much worse error (about (log(d)/(ε^2 n))^{1/6}), and the only known lower bound on error was Ω(1/√n). We also adapt a result of McGregor et al. (2010) to the local setting. In a model with public coins, we show that each user need only send 1 bit to the server. For all known local protocols (including ours), this transformation preserves computational efficiency.
{"title":"Local, Private, Efficient Protocols for Succinct Histograms","authors":"Raef Bassily, Adam D. Smith","doi":"10.1145/2746539.2746632","DOIUrl":"https://doi.org/10.1145/2746539.2746632","url":null,"abstract":"We give efficient protocols and matching accuracy lower bounds for frequency estimation in the local model for differential privacy. In this model, individual users randomize their data themselves, sending differentially private reports to an untrusted server that aggregates them. We study protocols that produce a succinct histogram representation of the data. A succinct histogram is a list of the most frequent items in the data (often called \"heavy hitters\") along with estimates of their frequencies; the frequency of all other items is implicitly estimated as 0. If there are n users whose items come from a universe of size d, our protocols run in time polynomial in n and log(d). With high probability, they estimate the accuracy of every item up to error O(√{log(d)/(ε2n)}). Moreover, we show that this much error is necessary, regardless of computational efficiency, and even for the simple setting where only one item appears with significant frequency in the data set. Previous protocols (Mishra and Sandler, 2006; Hsu, Khanna and Roth, 2012) for this task either ran in time Ω(d) or had much worse error (about √[6]{log(d)/(ε2n)}), and the only known lower bound on error was Ω(1/√{n}). We also adapt a result of McGregor et al (2010) to the local setting. In a model with public coins, we show that each user need only send 1 bit to the server. For all known local protocols (including ours), the transformation preserves computational efficiency.","PeriodicalId":20566,"journal":{"name":"Proceedings of the forty-seventh annual ACM symposium on Theory of Computing","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91087294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We study the problem of recognizing the cluster structure of a graph in the framework of property testing in the bounded-degree model. Given a parameter ε, a d-bounded-degree graph is defined to be (k, φ)-clusterable if it can be partitioned into no more than k parts such that the (inner) conductance of the induced subgraph on each part is at least φ and the (outer) conductance of each part is at most c_{d,k} ε^4 φ^2, where c_{d,k} depends only on d and k. Our main result is a sublinear algorithm with running time ~O(√n ⋅ poly(φ, k, 1/ε)) that takes as input a graph with maximum degree bounded by d and parameters k, φ, ε, and with probability at least 2/3, accepts the graph if it is (k, φ)-clusterable and rejects the graph if it is ε-far from (k, φ*)-clusterable for φ* = c'_{d,k} φ^2 ε^4 / log n, where c'_{d,k} depends only on d and k. By the Ω(√n) lower bound on the number of queries needed for testing graph expansion, which corresponds to k=1 in our problem, our algorithm is asymptotically optimal up to polylogarithmic factors.
{"title":"Testing Cluster Structure of Graphs","authors":"A. Czumaj, Pan Peng, C. Sohler","doi":"10.1145/2746539.2746618","DOIUrl":"https://doi.org/10.1145/2746539.2746618","url":null,"abstract":"We study the problem of recognizing the cluster structure of a graph in the framework of property testing in the bounded degree model. Given a parameter ε, a d-bounded degree graph is defined to be (k, φ)-clusterable, if it can be partitioned into no more than k parts, such that the (inner) conductance of the induced subgraph on each part is at least φ and the (outer) conductance of each part is at most cd,kε4φ2, where cd,k depends only on d,k. Our main result is a sublinear algorithm with the running time ~O(√n ⋅ poly(φ,k,1/ε)) that takes as input a graph with maximum degree bounded by d, parameters k, φ, ε, and with probability at least 2/3, accepts the graph if it is (k,φ)-clusterable and rejects the graph if it is ε-far from (k, φ*)-clusterable for φ* = c'd,kφ2 ε4}/log n, where c'd,k depends only on d,k. By the lower bound of Ω(√n) on the number of queries needed for testing graph expansion, which corresponds to k=1 in our problem, our algorithm is asymptotically optimal up to polylogarithmic factors.","PeriodicalId":20566,"journal":{"name":"Proceedings of the forty-seventh annual ACM symposium on Theory of Computing","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85959933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We study the problem of learning, from unlabeled samples, very general statistical mixture models on large finite sets. Specifically, the model to be learned, mix, is a probability distribution over probability distributions p, where each such p is a probability distribution over [n] = {1, 2, ..., n}. When we sample from mix, we do not observe p directly, but only indirectly and in a very noisy fashion, by drawing K independent samples from [n] distributed according to p. The problem is to infer mix to high accuracy in transportation (earthmover) distance. We give the first efficient algorithms for learning this mixture model without making any restricting assumptions on the structure of the distribution mix. We bound the quality of the solution as a function of the sample size K and the number of samples used. Our model and results have applications to a variety of unsupervised learning scenarios, including learning topic models and collaborative filtering.
{"title":"Learning Arbitrary Statistical Mixtures of Discrete Distributions","authors":"Jian Li, Y. Rabani, L. Schulman, Chaitanya Swamy","doi":"10.1145/2746539.2746584","DOIUrl":"https://doi.org/10.1145/2746539.2746584","url":null,"abstract":"We study the problem of learning from unlabeled samples very general statistical mixture models on large finite sets. Specifically, the model to be learned, mix, is a probability distribution over probability distributions p, where each such p is a probability distribution over [n] = {1,2,...,n}. When we sample from mix, we do not observe p directly, but only indirectly and in very noisy fashion, by sampling from [n] repeatedly, independently K times from the distribution p. The problem is to infer mix to high accuracy in transportation (earthmover) distance. We give the first efficient algorithms for learning this mixture model without making any restricting assumptions on the structure of the distribution $mix$. We bound the quality of the solution as a function of the size of the samples K and the number of samples used. Our model and results have applications to a variety of unsupervised learning scenarios, including learning topic models and collaborative filtering.","PeriodicalId":20566,"journal":{"name":"Proceedings of the forty-seventh annual ACM symposium on Theory of Computing","volume":"65 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90171759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We provide the first protocol that solves Byzantine agreement with optimal early stopping (min{f+2,t+1} rounds) and optimal resilience (n>3t) using polynomial message size and computation. All previous approaches obtained sub-optimal results and used resolve rules that looked only at the immediate children in the EIG (Exponential Information Gathering) tree. At the heart of our solution are new resolve rules that look at multiple layers of the EIG tree.
{"title":"Byzantine Agreement with Optimal Early Stopping, Optimal Resilience and Polynomial Complexity","authors":"Ittai Abraham, D. Dolev","doi":"10.1145/2746539.2746581","DOIUrl":"https://doi.org/10.1145/2746539.2746581","url":null,"abstract":"We provide the first protocol that solves Byzantine agreement with optimal early stopping (min{f+2,t+1} rounds) and optimal resilience (n>3t) using polynomial message size and computation. All previous approaches obtained sub-optimal results and used resolve rules that looked only at the immediate children in the EIG (Exponential Information Gathering) tree. At the heart of our solution are new resolve rules that look at multiple layers of the EIG tree.","PeriodicalId":20566,"journal":{"name":"Proceedings of the forty-seventh annual ACM symposium on Theory of Computing","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76471587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
While in many graph mining applications it is crucial to handle a stream of updates efficiently in terms of both time and space, little was known about how to achieve this. In this paper we study this issue for the densest subgraph problem, which lies at the core of many graph mining applications. We develop an algorithm that achieves time- and space-efficiency for this problem simultaneously; to the best of our knowledge, it is one of the first of its kind for graph problems. Given an input graph, the densest subgraph is the subgraph that maximizes the ratio between the number of edges and the number of nodes. For any ε > 0, our algorithm can, with high probability, maintain a (4+ε)-approximate solution under edge insertions and deletions using ~O(n) space and ~O(1) amortized time per update; here, n is the number of nodes in the graph and ~O hides an O(polylog_{1+ε} n) factor. The approximation ratio can be improved to (2+ε) with more time. The approach can also be extended to a (2+ε)-approximation sublinear-time algorithm and a distributed-streaming algorithm. Our algorithm is the first streaming algorithm that can maintain the densest subgraph in one pass. Prior to this, no algorithm could do so even in the special case of an incremental stream and even when there is no time restriction; the previously best algorithm in this setting required O(log n) passes [BahmaniKV12]. The space required by our algorithm is tight up to a polylogarithmic factor.
{"title":"Space- and Time-Efficient Algorithm for Maintaining Dense Subgraphs on One-Pass Dynamic Streams","authors":"Sayan Bhattacharya, M. Henzinger, Danupon Nanongkai, Charalampos E. Tsourakakis","doi":"10.1145/2746539.2746592","DOIUrl":"https://doi.org/10.1145/2746539.2746592","url":null,"abstract":"While in many graph mining applications it is crucial to handle a stream of updates efficiently in terms of both time and space, not much was known about achieving such type of algorithm. In this paper we study this issue for a problem which lies at the core of many graph mining applications called densest subgraph problem. We develop an algorithm that achieves time- and space-efficiency for this problem simultaneously. It is one of the first of its kind for graph problems to the best of our knowledge. Given an input graph, the densest subgraph is the subgraph that maximizes the ratio between the number of edges and the number of nodes. For any ε>0, our algorithm can, with high probability, maintain a (4+ε)-approximate solution under edge insertions and deletions using ~O(n) space and ~O(1) amortized time per update; here, $n$ is the number of nodes in the graph and ~O hides the O(polylog_{1+ε} n) term. The approximation ratio can be improved to (2+ε) with more time. It can be extended to a (2+ε)-approximation sublinear-time algorithm and a distributed-streaming algorithm. Our algorithm is the first streaming algorithm that can maintain the densest subgraph in one pass. Prior to this, no algorithm could do so even in the special case of an incremental stream and even when there is no time restriction. The previously best algorithm in this setting required O(log n) passes [BahmaniKV12]. The space required by our algorithm is tight up to a polylogarithmic factor.","PeriodicalId":20566,"journal":{"name":"Proceedings of the forty-seventh annual ACM symposium on Theory of Computing","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73985031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficiently learning mixtures of Gaussians is a fundamental problem in statistics and learning theory. Given samples, each drawn from one of k Gaussian distributions in R^n chosen at random, the learning problem asks to estimate the means and the covariance matrices of these Gaussians. This learning problem arises in many areas ranging from the natural sciences to the social sciences, and has also found many machine learning applications. Unfortunately, learning mixtures of Gaussians is an information-theoretically hard problem: in order to learn the parameters up to a reasonable accuracy, the number of samples required is exponential in the number of Gaussian components in the worst case. In this work, we show that provided we are in high enough dimensions, the class of Gaussian mixtures is learnable in its most general form under a smoothed analysis framework, where the parameters are randomly perturbed from an adversarial starting point. In particular, given samples from a mixture of Gaussians with randomly perturbed parameters, when n ≥ Ω(k^2), we give an algorithm that learns the parameters with polynomial running time and using a polynomial number of samples. The central algorithmic ideas consist of new ways to decompose the moment tensor of the Gaussian mixture by exploiting its structural properties. The symmetries of this tensor are derived from the combinatorial structure of higher-order moments of Gaussian distributions (sometimes referred to as Isserlis' theorem or Wick's theorem). We also develop new tools for bounding the smallest singular values of structured random matrices, which could be useful in other smoothed analysis settings.
{"title":"Learning Mixtures of Gaussians in High Dimensions","authors":"Rong Ge, Qingqing Huang, S. Kakade","doi":"10.1145/2746539.2746616","DOIUrl":"https://doi.org/10.1145/2746539.2746616","url":null,"abstract":"Efficiently learning mixture of Gaussians is a fundamental problem in statistics and learning theory. Given samples coming from a random one out of k Gaussian distributions in Rn, the learning problem asks to estimate the means and the covariance matrices of these Gaussians. This learning problem arises in many areas ranging from the natural sciences to the social sciences, and has also found many ma- chine learning applications. Unfortunately, learning mixture of Gaussians is an information theoretically hard problem: in order to learn the parameters up to a reasonable accuracy, the number of samples required is exponential in the number of Gaussian components in the worst case. In this work, we show that provided we are in high enough dimensions, the class of Gaussian mixtures is learnable in its most general form under a smoothed analysis framework, where the parameters are randomly perturbed from an adversarial starting point. In particular, given samples from a mixture of Gaussians with randomly perturbed parameters, when n ≥ Ω(k2), we give an algorithm that learns the parameters with polynomial running time and using polynomial number of samples. The central algorithmic ideas consist of new ways to de- compose the moment tensor of the Gaussian mixture by exploiting its structural properties. The symmetries of this tensor are derived from the combinatorial structure of higher order moments of Gaussian distributions (sometimes referred to as Isserlis' theorem or Wick's theorem). We also develop new tools for bounding smallest singular values of structured random matrices, which could be useful in other smoothed analysis settings.","PeriodicalId":20566,"journal":{"name":"Proceedings of the forty-seventh annual ACM symposium on Theory of Computing","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74555554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}