Alexandr Andoni, Robert Krauthgamer, Krzysztof Onak
We present a near-linear time algorithm that approximates the edit distance between two strings within a polylogarithmic factor. For strings of length $n$ and every fixed $\epsilon>0$, the algorithm computes a $(\log n)^{O(1/\epsilon)}$ approximation in $n^{1+\epsilon}$ time. This is an \emph{exponential} improvement over the previously known approximation factor, $2^{\tilde{O}(\sqrt{\log n})}$, with a comparable running time [Ostrovsky and Rabani, J. ACM 2007; Andoni and Onak, STOC 2009]. This result arises naturally in the study of a new \emph{asymmetric query} model. In this model, the input consists of two strings $x$ and $y$, and an algorithm can access $y$ in an unrestricted manner, while being charged for querying every symbol of $x$. Indeed, we obtain our main result by designing an algorithm that makes a small number of queries in this model. We then provide a nearly-matching lower bound on the number of queries. Our lower bound is the first to expose hardness of edit distance stemming from the input strings being "repetitive", which means that many of their substrings are approximately identical. Consequently, our lower bound provides the first rigorous separation between edit distance and Ulam distance.
{"title":"Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity","authors":"Alexandr Andoni, Robert Krauthgamer, Krzysztof Onak","doi":"10.1109/FOCS.2010.43","DOIUrl":"https://doi.org/10.1109/FOCS.2010.43","url":null,"abstract":"We present a near-linear time algorithm that approximates the edit distance between two strings within a polylogarithmic factor. For strings of length $n$ and every fixed $eps>0$, the algorithm computes a $(log n)^{O(1/eps)}$ approximation in $n^{1+eps}$ time. This is an {em exponential} improvement over the previously known approximation factor, $2^{tilde O(sqrt{log n})}$, with a comparable running time [Ostrovsky and Rabani, J. ACM 2007, Andoni and Onak, STOC 2009]. This result arises naturally in the study of a new emph{asymmetric query} model. In this model, the input consists of two strings $x$ and $y$, and an algorithm can access $y$ in an unrestricted manner, while being charged for querying every symbol of $x$. Indeed, we obtain our main result by designing an algorithm that makes a small number of queries in this model. We then provide a nearly-matching lower bound on the number of queries. Our lower bound is the first to expose hardness of edit distance stemming from the input strings being ``repetitive'', which means that many of their substrings are approximately identical. Consequently, our lower bound provides the first rigorous separation between edit distance and Ulam distance.","PeriodicalId":228365,"journal":{"name":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115722193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we show how the complexity of performing nearest neighbor search (NNS) on a metric space is related to the expansion of the metric space. Given a metric space, we look at the graph obtained by connecting every pair of points within a certain distance $r$. We then look at various notions of expansion in this graph, relating them to the cell probe complexity of NNS for randomized and deterministic, exact and approximate algorithms. For example, if the graph has node expansion $\Phi$, then we show that any deterministic $t$-probe data structure for $n$ points must use space $S$ where $(St/n)^t > \Phi$. We show similar results for randomized algorithms as well. These relationships can be used to derive most of the known lower bounds in well-known metric spaces such as $\ell_1$, $\ell_2$, $\ell_\infty$, and some new ones, by simply computing their expansion. In the process, we strengthen and generalize our previous results~\cite{PTW08}. Additionally, we unify the approach in~\cite{PTW08} and the communication complexity based approach. Our work reduces the problem of proving cell probe lower bounds for near neighbor search to computing the appropriate expansion parameter. In our results, as in all previous results, the dependence on $t$ is weak, that is, the bound drops exponentially in $t$. We show a much stronger (tight) time-space tradeoff for the class of \emph{dynamic low contention} data structures. These are data structures that support updates to the data set and do not look up any single cell too often. A full version of the paper can be found in [19].
{"title":"Lower Bounds on Near Neighbor Search via Metric Expansion","authors":"R. Panigrahy, Kunal Talwar, Udi Wieder","doi":"10.1109/FOCS.2010.82","DOIUrl":"https://doi.org/10.1109/FOCS.2010.82","url":null,"abstract":"In this paper we show how the complexity of performing nearest neighbor (NNS) search on a metric space is related to the expansion of the metric space. Given a metric space we look at the graph obtained by connecting every pair of points within a certain distance $r$ . We then look at various notions of expansion in this graph relating them to the cell probe complexity of NNS for randomized and deterministic, exact and approximate algorithms. For example if the graph has node expansion $Phi$ then we show that any deterministic $t$-probe data structure for $n$ points must use space $S$ where $(St/n)^t > Phi$. We show similar results for randomized algorithms as well. These relationships can be used to derive most of the known lower bounds in the well known metric spaces such as $l_1$, $l_2$, $l_infty$, and some new ones, by simply computing their expansion. In the process, we strengthen and generalize our previous results~cite{PTW08}. Additionally, we unify the approach in~cite{PTW08} and the communication complexity based approach. Our work reduces the problem of proving cell probe lower bounds of near neighbor search to computing the appropriate expansion parameter. In our results, as in all previous results, the dependence on $t$ is weak, that is, the bound drops exponentially in $t$. We show a much stronger (tight) time-space tradeoff for the class of emph{dynamic} emph{low contention} data structures. These are data structures that supports updates in the data set and that do not look up any single cell too often. A full version of the paper could be found in [19].","PeriodicalId":228365,"journal":{"name":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124595660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The question of polynomial learnability of probability distributions, particularly Gaussian mixture distributions, has recently received significant attention in theoretical computer science and machine learning. However, despite major progress, the general question of polynomial learnability of Gaussian mixture distributions remained open. The current work resolves the question of polynomial learnability for Gaussian mixtures in high dimension with an arbitrary fixed number of components. Specifically, we show that parameters of a Gaussian mixture distribution with a fixed number of components can be learned using a sample whose size is polynomial in the dimension and all other parameters. The result on learning Gaussian mixtures relies on an analysis of distributions belonging to what we call "polynomial families" in low dimension. These families are characterized by their moments being polynomial in the parameters and include almost all common probability distributions as well as their mixtures and products. Using tools from real algebraic geometry, we show that parameters of any distribution belonging to such a family can be learned in polynomial time and using a polynomial number of sample points. The result on learning polynomial families is quite general and is of independent interest. To estimate parameters of a Gaussian mixture distribution in high dimensions, we provide a deterministic algorithm for dimensionality reduction. This allows us to reduce learning a high-dimensional mixture to a polynomial number of parameter estimations in low dimension. Combining this reduction with the results on polynomial families yields our result on learning arbitrary Gaussian mixtures in high dimensions.
{"title":"Polynomial Learning of Distribution Families","authors":"M. Belkin, Kaushik Sinha","doi":"10.1137/13090818X","DOIUrl":"https://doi.org/10.1137/13090818X","url":null,"abstract":"The question of polynomial learn ability of probability distributions, particularly Gaussian mixture distributions, has recently received significant attention in theoretical computer science and machine learning. However, despite major progress, the general question of polynomial learn ability of Gaussian mixture distributions still remained open. The current work resolves the question of polynomial learn ability for Gaussian mixtures in high dimension with an arbitrary fixed number of components. Specifically, we show that parameters of a Gaussian mixture distribution with fixed number of components can be learned using a sample whose size is polynomial in dimension and all other parameters. The result on learning Gaussian mixtures relies on an analysis of distributions belonging to what we call “polynomial families” in low dimension. These families are characterized by their moments being polynomial in parameters and include almost all common probability distributions as well as their mixtures and products. Using tools from real algebraic geometry, we show that parameters of any distribution belonging to such a family can be learned in polynomial time and using a polynomial number of sample points. The result on learning polynomial families is quite general and is of independent interest. To estimate parameters of a Gaussian mixture distribution in high dimensions, we provide a deterministic algorithm for dimensionality reduction. This allows us to reduce learning a high-dimensional mixture to a polynomial number of parameter estimations in low dimension. Combining this reduction with the results on polynomial families yields our result on learning arbitrary Gaussian mixtures in high dimensions.","PeriodicalId":228365,"journal":{"name":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127138423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Given data drawn from a mixture of multivariate Gaussians, a basic problem is to accurately estimate the mixture parameters. We give an algorithm for this problem that has running time and data requirements polynomial in the dimension and the inverse of the desired accuracy, with provably minimal assumptions on the Gaussians. As a simple consequence of our learning algorithm, we give the first polynomial time algorithm for proper density estimation for mixtures of k Gaussians that needs no assumptions on the mixture. It was open whether proper density estimation was even statistically possible (with no assumptions) given only polynomially many samples, let alone whether it could be computationally efficient. The building blocks of our algorithm are based on the work of Kalai et al. (STOC 2010), which gives an efficient algorithm for learning mixtures of two Gaussians by considering a series of projections down to one dimension and applying the method of moments to each univariate projection. A major technical hurdle in the previous work is showing that one can efficiently learn univariate mixtures of two Gaussians. In contrast, because pathological scenarios can arise when considering projections of mixtures of more than two Gaussians, the bulk of the work in this paper concerns how to leverage a weaker algorithm for learning univariate mixtures (of many Gaussians) to learn in high dimensions. Our algorithm employs hierarchical clustering and rescaling, together with methods for backtracking and recovering from the failures that can occur in our univariate algorithm. Finally, while the running time and data requirements of our algorithm depend exponentially on the number of Gaussians in the mixture, we prove that such a dependence is necessary.
{"title":"Settling the Polynomial Learnability of Mixtures of Gaussians","authors":"Ankur Moitra, G. Valiant","doi":"10.1109/FOCS.2010.15","DOIUrl":"https://doi.org/10.1109/FOCS.2010.15","url":null,"abstract":"Given data drawn from a mixture of multivariate Gaussians, a basic problem is to accurately estimate the mixture parameters. We give an algorithm for this problem that has running time and data requirements polynomial in the dimension and the inverse of the desired accuracy, with provably minimal assumptions on the Gaussians. As a simple consequence of our learning algorithm, we we give the first polynomial time algorithm for proper density estimation for mixtures of k Gaussians that needs no assumptions on the mixture. It was open whether proper density estimation was even statistically possible (with no assumptions) given only polynomially many samples, let alone whether it could be computationally efficient. The building blocks of our algorithm are based on the work (Kalai emph{et al}, STOC 2010) that gives an efficient algorithm for learning mixtures of two Gaussians by considering a series of projections down to one dimension, and applying the method of moments to each univariate projection. A major technical hurdle in the previous work is showing that one can efficiently learn univariate mixtures of two Gaussians. In contrast, because pathological scenarios can arise when considering projections of mixtures of more than two Gaussians, the bulk of the work in this paper concerns how to leverage a weaker algorithm for learning univariate mixtures (of many Gaussians) to learn in high dimensions. Our algorithm employs hierarchical clustering and rescaling, together with methods for backtracking and recovering from the failures that can occur in our univariate algorithm. Finally, while the running time and data requirements of our algorithm depend exponentially on the number of Gaussians in the mixture, we prove that such a dependence is necessary.","PeriodicalId":228365,"journal":{"name":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129517446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we consider coding schemes for computationally bounded channels, which can introduce an arbitrary set of errors as long as (a) the fraction of errors is bounded with high probability by a parameter p and (b) the process which adds the errors can be described by a sufficiently "simple" circuit. Codes for such channel models are attractive since, like codes for standard adversarial errors, they can handle channels whose true behavior is unknown or varying over time. For three classes of channels, we provide explicit, efficiently encodable/decodable codes of optimal rate where only inefficiently decodable codes were previously known. In each case, we provide one encoder/decoder that works for every channel in the class. Unique decoding for additive errors: We give the first construction of a poly-time encodable/decodable code for additive (a.k.a. oblivious) channels that achieves the Shannon capacity 1-H(p). List-decoding for online log-space channels: A space-S(N) bounded channel reads and modifies the transmitted codeword as a stream, using at most S(N) bits of workspace on transmissions of N bits. For constant S, this captures many models from the literature, including "discrete channels with finite memory" and "arbitrarily varying channels". We give an efficient code with optimal rate (arbitrarily close to 1-H(p)) that, for channels using at most O(log N) bits of workspace on transmissions of N bits, recovers with high probability a short list containing the correct message. List-decoding for poly-time channels: For any constant c we give a similar list-decoding result for channels describable by circuits of size at most N^c, assuming the existence of pseudorandom generators.
{"title":"Codes for Computationally Simple Channels: Explicit Constructions with Optimal Rate","authors":"V. Guruswami, Adam D. Smith","doi":"10.1109/FOCS.2010.74","DOIUrl":"https://doi.org/10.1109/FOCS.2010.74","url":null,"abstract":"In this paper, we consider coding schemes for computationally bounded channels, which can introduce an arbitrary set of errors as long as (a) the fraction of errors is bounded with high probability by a parameter p and (b) the process which adds the errors can be described by a sufficiently \"simple\" circuit. Codes for such channel models are attractive since, like codes for standard adversarial errors, they can handle channels whose true behavior is unknown or varying over time. For three classes of channels, we provide explicit, efficiently encodable/decodable codes of optimal rate where only inefficiently decodable codes were previously known. In each case, we provide one encoder/decoder that works for every channel in the class. Unique decoding for additive errors: We give the first construction of a poly-time encodable/decodable code for additive (a.k.a. oblivious) channels that achieve the Shannon capacity 1-H(p). List-decoding for online log-space channels: A space-S(N) bounded channel reads and modifies the transmitted codeword as a stream, using at most S(N) bits of workspace on transmissions of N bits. For constant S, this captures many models from the literature, including \"discrete channels with finite memory\" and \"arbitrarily varying channels\". We give an efficient code with optimal rate (arbitrarily close to 1-H(p)) that recovers a short list containing the correct message with high probability for channels which read and modify the transmitted codeword as a stream, using at most O(log N) bits of workspace on transmissions of N bits. List-decoding for poly-time channels: For any constant c we give a similar list-decoding result for channels describable by circuits of size at most N^c, assuming the existence of pseudorandom generators.","PeriodicalId":228365,"journal":{"name":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132877267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We give efficient algorithms for volume sampling, i.e., for picking $k$-subsets of the rows of any given matrix with probabilities proportional to the squared volumes of the simplices defined by them and the origin (or the squared volumes of the parallelepipeds defined by these subsets of rows). In other words, we can efficiently sample $k$-subsets of $[m]$ with probabilities proportional to the corresponding $k \times k$ principal minors of any given $m \times m$ positive semidefinite matrix. This solves an open problem from the monograph on spectral algorithms by Kannan and Vempala (see Section 7.4 of \cite{KV}, also implicit in \cite{BDM, DRVW}). Our first algorithm for volume sampling $k$-subsets of rows from an $m$-by-$n$ matrix runs in $O(kmn^{\omega} \log n)$ arithmetic operations (where $\omega$ is the exponent of matrix multiplication), and a second variant of it for $(1+\epsilon)$-approximate volume sampling runs in $O(mn \log m \cdot k^{2}/\epsilon^{2} + m \log^{\omega} m \cdot k^{2\omega+1}/\epsilon^{2\omega} \cdot \log(k \epsilon^{-1} \log m))$ arithmetic operations, which is almost linear in the size of the input (i.e., the number of entries) for small $k$. Our efficient volume sampling algorithms imply the following results for low-rank matrix approximation: (1) Given $A \in \mathbb{R}^{m \times n}$, in $O(kmn^{\omega} \log n)$ arithmetic operations we can find $k$ of its rows such that projecting onto their span gives a $\sqrt{k+1}$-approximation to the matrix of rank $k$ closest to $A$ under the Frobenius norm. This improves the $O(k \sqrt{\log k})$-approximation of Boutsidis, Drineas and Mahoney \cite{BDM} and matches the lower bound shown in \cite{DRVW}. The method of conditional expectations gives a \emph{deterministic} algorithm with the same complexity. The running time can be improved to $O(mn \log m \cdot k^{2}/\epsilon^{2} + m \log^{\omega} m \cdot k^{2\omega+1}/\epsilon^{2\omega} \cdot \log(k \epsilon^{-1} \log m))$ at the cost of losing an extra $(1+\epsilon)$ in the approximation factor. (2) The same rows and projection as in the previous point give a $\sqrt{(k+1)(n-k)}$-approximation to the matrix of rank $k$ closest to $A$ under the spectral norm. In this paper, we show an almost matching lower bound of $\sqrt{n}$, even for $k=1$.
{"title":"Efficient Volume Sampling for Row/Column Subset Selection","authors":"A. Deshpande, Luis Rademacher","doi":"10.1109/FOCS.2010.38","DOIUrl":"https://doi.org/10.1109/FOCS.2010.38","url":null,"abstract":"We give efficient algorithms for volume sampling, i.e., for picking $k$-subsets of the rows of any given matrix with probabilities proportional to the squared volumes of the simplices defined by them and the origin (or the squared volumes of the parallelepipeds defined by these subsets of rows). %In other words, we can efficiently sample $k$-subsets of $[m]$ with probabilities proportional to the corresponding $k$ by $k$ principal minors of any given $m$ by $m$ positive semi definite matrix. This solves an open problem from the monograph on spectral algorithms by Kannan and Vempala (see Section $7.4$ of cite{KV}, also implicit in cite{BDM, DRVW}). Our first algorithm for volume sampling $k$-subsets of rows from an $m$-by-$n$ matrix runs in $O(kmn^omega log n)$ arithmetic operations (where $omega$ is the exponent of matrix multiplication) and a second variant of it for $(1+eps)$-approximate volume sampling runs in $O(mn log m cdot k^{2}/eps^{2} + m log^{omega} m cdot k^{2omega+1}/eps^{2omega} cdot log(k eps^{-1} log m))$ arithmetic operations, which is almost linear in the size of the input (i.e., the number of entries) for small $k$. Our efficient volume sampling algorithms imply the following results for low-rank matrix approximation: (1) Given $A in reals^{m times n}$, in $O(kmn^{omega} log n)$ arithmetic operations we can find $k$ of its rows such that projecting onto their span gives a $sqrt{k+1}$-approximation to the matrix of rank $k$ closest to $A$ under the Frobenius norm. This improves the $O(k sqrt{log k})$-approximation of Boutsidis, Drineas and Mahoney cite{BDM} and matches the lower bound shown in cite{DRVW}. The method of conditional expectations gives a emph{deterministic} algorithm with the same complexity. The running time can be improved to $O(mn log m cdot k^{2}/eps^{2} + m log^{omega} m cdot k^{2omega+1}/eps^{2omega} cdot log(k eps^{-1} log m))$ at the cost of losing an extra $(1+eps)$ in the approximation factor. (2) The same rows and projection as in the previous point give a $sqrt{(k+1)(n-k)}$-approximation to the matrix of rank $k$ closest to $A$ under the spectral norm. In this paper, we show an almost matching lower bound of $sqrt{n}$, even for $k=1$.","PeriodicalId":228365,"journal":{"name":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122498802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Amit Chakrabarti, Graham Cormode, Ranganath Kondapally, A. Mcgregor
This paper makes three main contributions to the theory of communication complexity and stream computation. First, we present new bounds on the information complexity of AUGMENTED-INDEX. In contrast to analogous results for INDEX by Jain, Radhakrishnan and Sen [J. ACM, 2009], we have to overcome the significant technical challenge that protocols for AUGMENTED-INDEX may violate the ``rectangle property'' due to the inherent input sharing. Second, we use these bounds to resolve an open problem of Magniez, Mathieu and Nayak [STOC, 2010] on the multi-pass complexity of recognizing Dyck languages. This results in a natural separation between the standard multi-pass model and the multi-pass model that permits reverse passes. Third, we present the first passive memory checkers that verify the interaction transcripts of priority queues, stacks, and double-ended queues. We obtain tight upper and lower bounds for these problems, thereby addressing an important sub-class of the memory checking framework of Blum et al. [Algorithmica, 1994].
{"title":"Information Cost Tradeoffs for Augmented Index and Streaming Language Recognition","authors":"Amit Chakrabarti, Graham Cormode, Ranganath Kondapally, A. Mcgregor","doi":"10.1109/FOCS.2010.44","DOIUrl":"https://doi.org/10.1109/FOCS.2010.44","url":null,"abstract":"This paper makes three main contributions to the theory of communication complexity and stream computation. First, we present new bounds on the information complexity of AUGMENTED-INDEX. In contrast to analogous results for INDEX by Jain, Radhakrishnan and Sen [J. ACM, 2009], we have to overcome the significant technical challenge that protocols for AUGMENTED-INDEX may violate the ``rectangle property'' due to the inherent input sharing. Second, we use these bounds to resolve an open problem of Magniez, Mathieu and Nayak [STOC, 2010] on the multi-pass complexity of recognizing Dyck languages. This results in a natural separation between the standard multi-pass model and the multi-pass model that permits reverse passes. Third, we present the first passive memory checkers that verify the interaction transcripts of priority queues, stacks, and double-ended queues. We obtain tight upper and lower bounds for these problems, thereby addressing an important sub-class of the memory checking framework of Blum et al. [Algorithmica, 1994].","PeriodicalId":228365,"journal":{"name":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114240139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2010-04-13. DOI: 10.4086/toc.2012.v008a015
Ashish Goel, Ian Post
We study the single-sink buy-at-bulk problem with an unknown cost function. We wish to route flow from a set of demand nodes to a root node, where the cost of routing x total flow along an edge is proportional to f(x) for some concave, non-decreasing function f satisfying f(0)=0. We present a simple, fast, combinatorial algorithm that takes a set of demands and constructs a single tree T such that for all f the cost f(T) is a 47.45-approximation of the optimal cost for that f. This is within a factor of 2.33 of the best approximation ratio currently achievable when the tree can be optimized for a specific function. Trees achieving simultaneous O(1)-approximations for all concave functions were previously not known to exist regardless of computation time.
{"title":"One Tree Suffices: A Simultaneous O(1)-Approximation for Single-Sink Buy-at-Bulk","authors":"Ashish Goel, Ian Post","doi":"10.4086/toc.2012.v008a015","DOIUrl":"https://doi.org/10.4086/toc.2012.v008a015","url":null,"abstract":"We study the single-sink buy-at-bulk problem with an unknown cost function. We wish to route flow from a set of demand nodes to a root node, where the cost of routing x total flow along an edge is proportional to f(x) for some concave, non-decreasing function f satisfying f(0)=0. We present a simple, fast, combinatorial algorithm that takes a set of demands and constructs a single tree T such that for all f the cost f(T) is a 47.45-approximation of the optimal cost for that f. This is within a factor of 2.33 of the best approximation ratio currently achievable when the tree can be optimized for a specific function. Trees achieving simultaneous O(1)-approximations for all concave functions were previously not known to exist regardless of computation time.","PeriodicalId":228365,"journal":{"name":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114238332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
There has been much progress on efficient algorithms for clustering data points generated by a mixture of k probability distributions under the assumption that the means of the distributions are well-separated, i.e., the distance between the means of any two distributions is at least Omega(k) standard deviations. These results generally make heavy use of the generative model and particular properties of the distributions. In this paper, we show that a simple clustering algorithm works without assuming any generative (probabilistic) model. Our only assumption is what we call a "proximity condition": the projection of any data point onto the line joining its cluster center to any other cluster center is Omega(k) standard deviations closer to its own center than to the other center. Here the notion of standard deviations is based on the spectral norm of the matrix whose rows represent the difference between a point and the mean of the cluster to which it belongs. We show that in the generative models studied, our proximity condition is satisfied, and so we are able to derive most known results for generative models as corollaries of our main result. We also prove some new results for generative models - e.g., we can cluster all but a small fraction of points only assuming a bound on the variance. Our algorithm relies on the well-known k-means algorithm, and along the way, we prove a result of independent interest: the k-means algorithm converges to the "true centers" even in the presence of spurious points, provided the initial (estimated) centers are close enough to the corresponding actual centers and all but a small fraction of the points satisfy the proximity condition. Finally, we present a new technique for boosting the ratio of inter-center separation to standard deviation. This allows us to prove results for learning certain mixtures of distributions under weaker separation conditions.
{"title":"Clustering with Spectral Norm and the k-Means Algorithm","authors":"Amit Kumar, R. Kannan","doi":"10.1109/FOCS.2010.35","DOIUrl":"https://doi.org/10.1109/FOCS.2010.35","url":null,"abstract":"There has been much progress on efficient algorithms for clustering data points generated by a mixture of k probability distributions under the assumption that the means of the distributions are well-separated, i.e., the distance between the means of any two distributions is at least Omega(k) standard deviations. These results generally make heavy use of the generative model and particular properties of the distributions. In this paper, we show that a simple clustering algorithm works without assuming any generative (probabilistic) model. Our only assumption is what we call a \"proximity condition'': the projection of any data point onto the line joining its cluster center to any other cluster center is Omega(k) standard deviations closer to its own center than the other center. Here the notion of standard deviations is based on the spectral norm of the matrix whose rows represent the difference between a point and the mean of the cluster to which it belongs. We show that in the generative models studied, our proximity condition is satisfied and so we are able to derive most known results for generative models as corollaries of our main result. We also prove some new results for generative models - e.g., we can cluster all but a small fraction of points only assuming a bound on the variance. Our algorithm relies on the well known k-means algorithm, and along the way, we prove a result of independent interest – that the k-means algorithm converges to the \"true centers'' even in the presence of spurious points provided the initial (estimated) centers are close enough to the corresponding actual centers and all but a small fraction of the points satisfy the proximity condition. Finally, we present a new technique for boosting the ratio of inter-center separation to standard deviation. This allows us to prove results for learning certain mixture of distributions under weaker separation conditions.","PeriodicalId":228365,"journal":{"name":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131593020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It has been shown by Indyk and Sidiropoulos \cite{indyk_genus} that any graph of genus $g>0$ can be stochastically embedded into a distribution over planar graphs with distortion $2^{O(g)}$. This bound was later improved to $O(g^2)$ by Borradaile, Lee and Sidiropoulos \cite{BLS09}. We give an embedding with distortion $O(\log g)$, which is asymptotically optimal. Apart from the improved distortion, another advantage of our embedding is that it can be computed in polynomial time. In contrast, the algorithm of \cite{BLS09} requires solving an NP-hard problem. Our result implies in particular a reduction for a large class of geometric optimization problems from instances on genus-$g$ graphs to corresponding ones on planar graphs, with an $O(\log g)$ loss factor in the approximation guarantee.
{"title":"Optimal Stochastic Planarization","authors":"Anastasios Sidiropoulos","doi":"10.1109/FOCS.2010.23","DOIUrl":"https://doi.org/10.1109/FOCS.2010.23","url":null,"abstract":"It has been shown by Indyk and Sidiropoulos cite{indyk_genus} that any graph of genus $g>0$ can be stochastically embedded into a distribution over planar graphs with distortion $2^{O(g)}$. This bound was later improved to $O(g^2)$ by Borradaile, Lee and Sidiropoulos cite{BLS09}. We give an embedding with distortion $O(log g)$, which is asymptotically optimal. Apart from the improved distortion, another advantage of our embedding is that it can be computed in polynomial time. In contrast, the algorithm of cite{BLS09} requires solving an NP-hard problem. Our result implies in particular a reduction for a large class of geometric optimization problems from instances on genus-$g$ graphs, to corresponding ones on planar graphs, with a $O(log g)$ loss factor in the approximation guarantee.","PeriodicalId":228365,"journal":{"name":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115004082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}