On Fully Dynamic Graph Sparsifiers
Ittai Abraham, D. Durfee, I. Koutis, Sebastian Krinninger, Richard Peng (FOCS 2016, doi:10.1109/FOCS.2016.44)
We initiate the study of fast dynamic algorithms for graph sparsification problems and obtain fully dynamic algorithms, allowing both edge insertions and edge deletions, that take polylogarithmic time after each update in the graph. Our three main results are as follows. First, we give a fully dynamic algorithm for maintaining a (1 ± ϵ)-spectral sparsifier with amortized update time poly(log n, ϵ^-1). Second, we give a fully dynamic algorithm for maintaining a (1 ± ϵ)-cut sparsifier with worst-case update time poly(log n, ϵ^-1). Both sparsifiers have size n · poly(log n, ϵ^-1). Third, we apply our dynamic sparsifier algorithm to obtain a fully dynamic algorithm for maintaining a (1 - ϵ)-approximation to the value of the maximum flow in an unweighted, undirected, bipartite graph with amortized update time poly(log n, ϵ^-1).
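The dynamic data structures in the paper are involved; as background, here is a hedged static sketch of spectral sparsification by effective-resistance sampling (the classical Spielman-Srivastava primitive that dynamic spectral sparsifiers maintain under updates). This is an illustrative toy, not the paper's algorithm; the sampling count and function names are assumptions.

```python
import numpy as np

def spectral_sparsify(n, edges, eps, seed=0):
    """Toy static (1 ± eps)-spectral sparsifier via effective-resistance
    sampling. edges is a list of (u, v, weight) with 0 <= u, v < n."""
    rng = np.random.default_rng(seed)
    # Graph Laplacian L = sum_e w_e (e_u - e_v)(e_u - e_v)^T.
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    Lpinv = np.linalg.pinv(L)  # pseudo-inverse is fine at toy scale
    # Effective resistance of e = (u, v): R_e = (e_u - e_v)^T L^+ (e_u - e_v).
    R = [Lpinv[u, u] + Lpinv[v, v] - 2.0 * Lpinv[u, v] for u, v, _ in edges]
    # Sample q = O(n log n / eps^2) edges with probability ~ w_e * R_e,
    # reweighting kept edges so the Laplacian is preserved in expectation.
    p = np.array([w * r for (_, _, w), r in zip(edges, R)])
    p /= p.sum()
    q = int(9 * n * np.log(n) / eps**2) + 1
    counts = rng.multinomial(q, p)
    return [(u, v, w * c / (q * p_e))
            for (u, v, w), c, p_e in zip(edges, counts, p) if c > 0]

# Example: sparsify a small complete graph (at this size the output is
# not actually sparse; the bound only bites for large n).
edges = [(u, v, 1.0) for u in range(8) for v in range(u + 1, 8)]
print(len(spectral_sparsify(8, edges, eps=0.5)))
```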
{"title":"On Fully Dynamic Graph Sparsifiers","authors":"Ittai Abraham, D. Durfee, I. Koutis, Sebastian Krinninger, Richard Peng","doi":"10.1109/FOCS.2016.44","DOIUrl":"https://doi.org/10.1109/FOCS.2016.44","url":null,"abstract":"We initiate the study of fast dynamic algorithms for graph sparsification problems and obtain fully dynamic algorithms, allowing both edge insertions and edge deletions, that take polylogarithmic time after each update in the graph. Our three main results are as follows. First, we give a fully dynamic algorithm for maintaining a (1 ± ϵ)-spectral sparsifier with amortized update time poly(log n, ϵ<sup>-1</sup>). Second, we give a fully dynamic algorithm for maintaining a (1 ± ϵ)-cut sparsifier with worst-case update time poly(log n, ϵ<sup>-1</sup>). Both sparsifiers have size n · poly(log n, ϵ<sup>-1</sup>). Third, we apply our dynamic sparsifier algorithm to obtain a fully dynamic algorithm for maintaining a (1 - ϵ)-approximation to the value of the maximum flow in an unweighted, undirected, bipartite graph with amortized update time poly(log n, ϵ<sup>-1</sup>).","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128529408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Computational Efficiency Requires Simple Taxation
Shahar Dobzinski (FOCS 2016, doi:10.1109/FOCS.2016.30)
We characterize the communication complexity of truthful mechanisms. Our departure point is the well-known taxation principle, which asserts that every truthful mechanism can be interpreted as follows: every player is presented with a menu consisting of a price for each bundle (the prices depend only on the valuations of the other players), and each player is allocated a bundle that maximizes his profit according to this menu. We define the taxation complexity of a truthful mechanism to be the logarithm of the maximum number of menus that may be presented to a player. Our main finding is that in general the taxation complexity essentially equals the communication complexity. The proof consists of two main steps. First, we prove that for rich enough domains the taxation complexity is at most the communication complexity. We then show that the taxation complexity is much smaller than the communication complexity only in "pathological" cases, and provide a formal description of these extreme cases. Next, we study mechanisms that access the valuations via value queries only. In this setting we establish that the menu complexity, a notion already studied in several different contexts, characterizes the number of value queries that the mechanism makes in exactly the same way that the taxation complexity characterizes the communication complexity. Our approach yields several applications, including strengthening the solution concept with low communication overhead, fast computation of prices, and hardness of approximation by computationally efficient truthful mechanisms.
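To make the taxation principle concrete, here is a hedged Python toy: a single-item second-price auction phrased as a menu mechanism, where the menu shown to each player depends only on the other players' bids and each player takes the profit-maximizing entry. All names are illustrative, and tie-breaking is ignored.

```python
def second_price_menu(bids, i):
    """Menu shown to player i: the item is priced at the maximum of the
    *other* players' bids, and the empty bundle is free. As the taxation
    principle requires, the menu does not depend on player i's own bid."""
    price = max(b for j, b in enumerate(bids) if j != i)
    return {"item": price, "nothing": 0.0}

def allocate(bids):
    """Each player picks the profit-maximizing entry from its own menu."""
    outcome = {}
    for i, v in enumerate(bids):
        menu = second_price_menu(bids, i)
        # Profit of taking the item is v - price; of taking nothing, 0.
        outcome[i] = "item" if v - menu["item"] > 0 else "nothing"
    return outcome

print(allocate([3.0, 7.0, 5.0]))  # player 1 wins at price 5
```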
{"title":"Computational Efficiency Requires Simple Taxation","authors":"Shahar Dobzinski","doi":"10.1109/FOCS.2016.30","DOIUrl":"https://doi.org/10.1109/FOCS.2016.30","url":null,"abstract":"We characterize the communication complexity of truthful mechanisms. Our departure point is the well known taxation principle. The taxation principle asserts that every truthful mechanism can be interpreted as follows: every player is presented with a menu that consists of a price for each bundle (the prices depend only on the valuations of the other players). Each player is allocated a bundle that maximizes his profit according to this menu. We define the taxation complexity of a truthful mechanism to be the logarithm of the maximum number of menus that may be presented to a player. Our main finding is that in general the taxation complexity essentially equals the communication complexity. The proof consists of two main steps. First, we prove that for rich enough domains the taxation complexity is at most the communication complexity. We then show that the taxation complexity is much smaller than the communication complexity only in \"pathological\" cases and provide a formal description of these extreme cases. Next, we study mechanisms that access the valuations via value queries only. In this setting we establish that the menu complexity - a notion that was already studied in several different contexts - characterizes the number of value queries that the mechanism makes in exactly the same way that the taxation complexity characterizes the communication complexity. Our approach yields several applications, including strengthening the solution concept with low communication overhead, fast computation of prices, and hardness of approximation by computationally efficient truthful mechanisms.","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":"758 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126942572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heavy Hitters via Cluster-Preserving Clustering
Kasper Green Larsen, Jelani Nelson, Huy L. Nguyen, M. Thorup (FOCS 2016, doi:10.1145/3339185)
In the turnstile ℓ_p heavy hitters problem with parameter ε, one must maintain a high-dimensional vector x ∈ ℝ^n subject to updates of the form update(i, Δ) causing the change x_i ← x_i + Δ, where i ∈ [n] and Δ ∈ ℝ. Upon receiving a query, the goal is to report every "heavy hitter" i ∈ [n] with |x_i| ≥ ε ∥x∥_p as part of a list L ⊆ [n] of size O(1/ε^p), i.e. proportional to the maximum possible number of heavy hitters. For any p ∈ (0,2], the COUNTSKETCH of [CCFC04] solves ℓ_p heavy hitters using O(ε^-p lg n) words of space with O(lg n) update time and O(n lg n) query time to output L, and its output after any query is correct with high probability (whp), i.e. with probability 1 - 1/poly(n) [JST11, Section 4.4]. This space bound is optimal even in the strict turnstile model [JST11], in which it is promised that x_i ≥ 0 for all i ∈ [n] at all points in the stream, but unfortunately the query time is very slow. To remedy this, the work [CM05] proposed the "dyadic trick" for the COUNTMIN sketch for p = 1 in the strict turnstile model, which, to maintain whp correctness, achieves suboptimal space O(ε^-1 lg^2 n) and worse update time O(lg^2 n), but much better query time O(ε^-1 poly(lg n)). An extension to all p ∈ (0,2] appears in [KNPW11, Theorem 1], and can be obtained from [Pag13]. We show that this tradeoff between space and update time versus query time is unnecessary. We provide a new algorithm, EXPANDERSKETCH, which in the most general turnstile model achieves optimal O(ε^-p log n) space, O(log n) update time, and fast O(ε^-p poly(log n)) query time, providing correctness whp. In fact, a simpler version of our algorithm for p = 1 in the strict turnstile model answers queries even faster than the "dyadic trick" by roughly a log n factor, dominating it in all regards. Our main innovation is an efficient reduction from heavy hitters to a clustering problem in which each heavy hitter is encoded as some form of noisy spectral cluster in a much bigger graph, and the goal is to identify every cluster. Since every heavy hitter must be found, correctness requires that every cluster be found. We thus need a "cluster-preserving clustering" algorithm that partitions the graph into clusters with the promise of not destroying any original cluster. To do this we first apply standard spectral graph partitioning, and then use novel combinatorial techniques to modify the cuts obtained so as to make sure that the original clusters are sufficiently preserved. Our cluster-preserving clustering may be of broader interest, well beyond heavy hitters.
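As background for the baselines discussed above, here is a hedged minimal CountMin sketch [CM05] for the strict turnstile model; it supports point queries, from which heavy hitters can be read off by scanning candidates (the "dyadic trick" and EXPANDERSKETCH speed up exactly that step). The hashing scheme and parameters are illustrative assumptions, not the paper's construction.

```python
import random

class CountMinSketch:
    """Minimal CountMin sketch: w = O(1/eps) columns and d = O(lg n) rows
    give point-query error eps * ||x||_1 whp in the strict turnstile
    model (all x_i stay nonnegative)."""
    def __init__(self, width, depth, seed=0):
        rnd = random.Random(seed)
        self.width = width
        self.table = [[0] * width for _ in range(depth)]
        # Salted tuple hashing stands in for pairwise-independent hashes.
        self.salts = [rnd.getrandbits(64) for _ in range(depth)]

    def update(self, i, delta):
        for row, salt in zip(self.table, self.salts):
            row[hash((salt, i)) % self.width] += delta

    def query(self, i):
        # The minimum over rows upper-bounds x_i in the strict model.
        return min(row[hash((salt, i)) % self.width]
                   for row, salt in zip(self.table, self.salts))

cms = CountMinSketch(width=64, depth=5)
for i, d in [(3, 10), (7, 2), (3, 5), (9, 1)]:
    cms.update(i, d)
assert cms.query(3) >= 15  # never underestimates; rarely overestimates
```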
{"title":"Heavy Hitters via Cluster-Preserving Clustering","authors":"Kasper Green Larsen, Jelani Nelson, Huy L. Nguyen, M. Thorup","doi":"10.1145/3339185","DOIUrl":"https://doi.org/10.1145/3339185","url":null,"abstract":"In the turnstile ℓ<sub>p</sub> heavy hitters problem with parameter ε, one must maintain a high-dimensional vector x ∈ ℝ<sup>n</sup> subject to updates of the form update (i,Δ) causing the change x<sub>i</sub> ← x<sub>i</sub> + Δ, where i ε[n], Δ ∈ ℝ. Upon receiving a query, the goal is to report every \"heavy hitter\" i ∈ [n] with |x<sub>i</sub>| ≥ ε ∥x∥<sub>p</sub> as part of a list L ⊆ [n] of size O(1/ε<sup>p</sup>), i.e. proportional to the maximum possible number of heavy hitters. For any pε(0,2] the COUNTSKETCH of [CCFC04] solves ℓ<sub>p</sub> heavy hitters using O(ε<sup>-p</sup> lg n) words of space with O(lg n) update time, O(n lg n) query time to output L, and whose output after any query is correct with high probability (whp) 1 - 1/poly(n) [JST11, Section 4.4]. This space bound is optimal even in the strict turnstile model [JST11] in which it is promised that x<sub>i</sub> ≥ 0 for all i ∈ [n] at all points in the stream, but unfortunately the query time is very slow. To remedy this, the work [CM05] proposed the \"dyadic trick\" for the COUNTMIN sketch for p = 1 in the strict turnstile model, which to maintain whp correctness achieves suboptimal space O(ε<sup>-1</sup>lg<sup>2</sup> n), worse update time O(lg<sup>2</sup> n), but much better query time O(ε<sup>-1</sup>poly(lg n)). An extension to all p ∈ (0,2] appears in [KNPW11, Theorem 1], and can be obtained from [Pag13]. We show that this tradeoff between space and update time versus query time is unnecessary. We provide a new algorithm, EXPANDERSKETCH, which in the most general turnstile model achieves optimal O(ε-plog n) space, O(log n) update time, and fast O(ε-ppoly(log n)) query time, providing correctness whp. In fact, a simpler version of our algorithm for p = 1 in the strict turnstile model answers queries even faster than the \"dyadic trick\" by roughly a log n factor, dominating it in all regards. Our main innovation is an efficient reduction from the heavy hitters to a clustering problem in which each heavy hitter is encoded as some form of noisy spectral cluster in a much bigger graph, and the goal is to identify every cluster. Since every heavy hitter must be found, correctness requires that every cluster be found. We thus need a \"cluster-preserving clustering\" algorithm, that partitions the graph into clusters with the promise of not destroying any original cluster. To do this we first apply standard spectral graph partitioning, and then we use some novel combinatorial techniques to modify the cuts obtained so as to make sure that the original clusters are sufficiently preserved. Our cluster-preserving clustering may be of broader interest much beyond heavy hitters.","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122877106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Convergence of MCMC and Loopy BP in the Tree Uniqueness Region for the Hard-Core Model
Charilaos Efthymiou, Thomas P. Hayes, Daniel Stefankovic, Eric Vigoda, Yitong Yin (FOCS 2016, doi:10.1109/FOCS.2016.80)
We study the hard-core (gas) model defined on independent sets of an input graph, where the independent sets are weighted by a parameter (aka fugacity) λ > 0. For constant Δ, previous work of Weitz (2006) established an FPTAS for the partition function for graphs of maximum degree Δ when λ < λ_c(Δ). Sly (2010) showed that there is no FPRAS, unless NP = RP, when λ > λ_c(Δ). The threshold λ_c(Δ) is the critical point for the statistical physics phase transition for uniqueness/non-uniqueness on the infinite Δ-regular tree. The running time of Weitz's algorithm is exponential in log Δ. Here we present an FPRAS for the partition function whose running time is O*(n^2). We analyze the simple single-site Markov chain known as the Glauber dynamics for sampling from the associated Gibbs distribution. We prove there exists a constant Δ_0 such that for all graphs with maximum degree Δ > Δ_0 and girth ≥ 7 (i.e., no cycles of length ≤ 6), the mixing time of the Glauber dynamics is O(n log n) when λ < λ_c(Δ). Our work complements that of Weitz, which applies for small constant Δ, whereas our work applies for all Δ at least a sufficiently large constant Δ_0 (this includes Δ depending on n = |V|). Our proof utilizes loopy BP (belief propagation), a widely used algorithm for inference in graphical models. A novel aspect of our work is using the principal eigenvector of the BP operator to design a distance function that contracts in expectation for pairs of states that behave like the BP fixed point. We also prove that the Glauber dynamics behaves locally like loopy BP. As a byproduct we obtain that the Glauber dynamics, after a short burn-in period, converges close to the BP fixed point, and this implies that the fixed point of loopy BP is a close approximation to the Gibbs distribution. Using these connections we establish that loopy BP quickly converges to the Gibbs distribution when the girth ≥ 6 and λ < λ_c(Δ).
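The Glauber dynamics itself fits in a few lines. Below is a hedged Python sketch of the single-site heat-bath chain for the hard-core model: pick a uniformly random vertex, and if none of its neighbors is occupied, occupy it with probability λ/(1+λ), otherwise leave it unoccupied. The example graph and step count are illustrative only; no mixing claim is made for this toy instance.

```python
import random

def glauber_hardcore(adj, lam, steps, rng=None):
    """Single-site Glauber dynamics whose stationary distribution weights
    each independent set I by lam^|I|. adj: dict vertex -> set of neighbors."""
    rng = rng or random.Random(0)
    occupied = set()              # start from the empty independent set
    vertices = list(adj)
    for _ in range(steps):
        v = rng.choice(vertices)
        if any(u in occupied for u in adj[v]):
            occupied.discard(v)   # a neighbor is occupied: v must be empty
        elif rng.random() < lam / (1 + lam):
            occupied.add(v)       # unblocked: occupy with prob lam/(1+lam)
        else:
            occupied.discard(v)
    return occupied

# A 5-cycle as a tiny example graph.
cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(glauber_hardcore(cycle, lam=1.0, steps=10_000))
```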
{"title":"Convergence of MCMC and Loopy BP in the Tree Uniqueness Region for the Hard-Core Model","authors":"Charilaos Efthymiou, Thomas P. Hayes, Daniel Stefankovic, Eric Vigoda, Yitong Yin","doi":"10.1109/FOCS.2016.80","DOIUrl":"https://doi.org/10.1109/FOCS.2016.80","url":null,"abstract":"We study the hard-core (gas) model defined on independent sets of an input graph where the independent sets are weighted by a parameter (aka fugacity) λ > 0. For constant Δ, previous work of Weitz (2006) established an FPTAS for the partition function for graphs of maximum degree Δ when λ <; λ<sub>c</sub>(Δ). Sly (2010) showed that there is no FPRAS, unless NP=RP, when λ > λ<sub>c</sub>(Δ). The threshold λ<sub>c</sub>(Δ) is the critical point for the statistical physics phase transition for uniqueness/non-uniqueness on the infinite Δ-regular tree. The running time of Weitz's algorithm is exponential in log Δ. Here we present an FPRAS for the partition function whose running time is O* (n<sup>2</sup>). We analyze the simple single-site Markov chain known as the Glauber dynamics for sampling from the associated Gibbs distribution. We prove there exists a constant Δ<sub>0</sub> such that for all graphs with maximum degree Δ > Δ<sub>0</sub> and girth > 7 (i.e., no cycles of length ≤ 6), the mixing time of the Glauber dynamics is O(nlog n) when λ <; λ<sub>c</sub>(Δ). Our work complements that of Weitz which applies for small constant Δ whereas our work applies for all Δ at least a sufficiently large constant Δ<sub>0</sub> (this includes Δ depending on n = IVI). Our proof utilizes loopy BP (belief propagation) which is a widely-used algorithm for inference in graphical models. A novel aspect of our work is using the principal eigenvector for the BP operator to design a distance function which contracts in expectation for pairs of states that behave like the BP fixed point. We also prove that the Glauber dynamics behaves locally like loopy BP. As a byproduct we obtain that the Glauber dynamics, after a short burn-in period, converges close to the BP fixed point, and this implies that the fixed point of loopy BP is a close approximation to the Gibbs distribution. Using these connections we establish that loopy BP quickly converges to the Gibbs distribution when the girth ≥ 6 and λ <; λ<sub>c</sub>(Δ).","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":" 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132094233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Local Search Yields Approximation Schemes for k-Means and k-Median in Euclidean and Minor-Free Metrics
Vincent Cohen-Addad, P. Klein, Claire Mathieu (FOCS 2016, doi:10.1109/FOCS.2016.46)
We give the first polynomial-time approximation schemes (PTASs) for the following problems: (1) uniform facility location in edge-weighted planar graphs, (2) k-median and k-means in edge-weighted planar graphs, (3) k-means in Euclidean space of bounded dimension. Our first and second results extend to minor-closed families of graphs. All our results extend to cost functions that are the p-th power of the shortest-path distance. The algorithm is local search, where the local neighborhood of a solution S consists of all solutions obtained from S by removing and adding 1/ε^O(1) centers.
{"title":"Local Search Yields Approximation Schemes for k-Means and k-Median in Euclidean and Minor-Free Metrics","authors":"Vincent Cohen-Addad, P. Klein, Claire Mathieu","doi":"10.1109/FOCS.2016.46","DOIUrl":"https://doi.org/10.1109/FOCS.2016.46","url":null,"abstract":"We give the first polynomial-time approximation schemes (PTASs) for the following problems: (1) uniform facility location in edge-weighted planar graphs, (2) k-median and k-means in edge-weighted planar graphs, (3) k-means in Euclidean space of bounded dimension. Our first and second results extend to minor-closed families of graphs. All our results extend to cost functions that are the pth power of the shortest-path distance. The algorithm is local search where the local neighborhood of a solution S consists of all solutions obtained from S by removing and adding 1/εO(1) centers.","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130459371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Local Search Yields a PTAS for k-Means in Doubling Metrics
Zachary Friggstad, M. Rezapour, M. Salavatipour (FOCS 2016, doi:10.1109/FOCS.2016.47)
The most well-known and ubiquitous clustering problem encountered in nearly every branch of science is undoubtedly k-MEANS: given a set of data points and a parameter k, select k centres and partition the data points into k clusters around these centres so that the sum of squares of distances of the points to their cluster centre is minimized. Typically these data points lie in Euclidean space ℝ^d for some d ≥ 2. k-MEANS and the first algorithms for it were introduced in the 1950s. Over the last six decades, hundreds of papers have studied this problem and different algorithms have been proposed for it. The most commonly used algorithm in practice is known as Lloyd-Forgy, which is also referred to as "the" k-MEANS algorithm, and various extensions of it often work very well in practice. However, they may produce solutions whose cost is arbitrarily large compared to the optimum solution. Kanungo et al. [2004] analyzed a very simple local search heuristic to get a polynomial-time algorithm with approximation ratio 9 + ε for any fixed ε > 0 for k-MEANS in Euclidean space. Finding an algorithm with a better worst-case approximation guarantee has remained one of the biggest open questions in this area, in particular whether one can get a true PTAS for fixed-dimension Euclidean space. We settle this problem by showing that a simple local search algorithm provides a PTAS for k-MEANS in ℝ^d for any fixed d. More precisely, for any error parameter ε > 0, the local search algorithm that considers swaps of up to ρ = d^O(d) · ε^(-O(d/ε)) centres at a time will produce a solution using exactly k centres whose cost is at most a (1+ε)-factor greater than the optimum solution. Our analysis extends very easily to the more general settings where we want to minimize the sum of q-th powers of the distances between data points and their cluster centres (instead of the sum of squares of distances as in k-MEANS) for any fixed q ≥ 1, and where the metric may not be Euclidean but still has fixed doubling dimension.
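Here is a hedged Python sketch of the swap-based local search analyzed in this line of work (and in the previous abstract): while swapping out up to s centres for s non-centres lowers the k-means cost, perform the improving swap. It is a toy for intuition, restricted to centres chosen among the input points in the plane; the PTAS above uses swaps of size ρ and a far more delicate analysis.

```python
import itertools

def kmeans_cost(points, centers):
    """Sum over points of the squared distance to the nearest centre."""
    return sum(min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centers)
               for px, py in points)

def local_search_kmeans(points, k, max_swap=1):
    """Toy swap-based local search for discrete k-means in the plane:
    repeat the first improving swap of up to max_swap centres until no
    swap improves the cost. Terminates since the cost strictly drops."""
    centers = set(points[:k])            # arbitrary initial solution
    improved = True
    while improved:
        improved = False
        outside = [p for p in points if p not in centers]
        for s in range(1, max_swap + 1):
            for out in itertools.combinations(sorted(centers), s):
                for into in itertools.combinations(outside, s):
                    trial = (centers - set(out)) | set(into)
                    if kmeans_cost(points, trial) < kmeans_cost(points, centers):
                        centers, improved = trial, True
                        break
                if improved: break
            if improved: break
    return centers

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(local_search_kmeans(pts, k=2))  # one centre per visible cluster
```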
{"title":"Local Search Yields a PTAS for k-Means in Doubling Metrics","authors":"Zachary Friggstad, M. Rezapour, M. Salavatipour","doi":"10.1109/FOCS.2016.47","DOIUrl":"https://doi.org/10.1109/FOCS.2016.47","url":null,"abstract":"The most well known and ubiquitous clustering problem encountered in nearly every branch of science is undoubtedly k-MEANS: given a set of data points and a parameter k, select k centres and partition the data points into k clusters around these centres so that the sum of squares of distances of the points to their cluster centre is minimized. Typically these data points lie in Euclidean space Rd for some d ≥ 2. k-MEANS and the first algorithms for it were introduced in the 1950's. Over the last six decades, hundreds of papers have studied this problem and different algorithms have been proposed for it. The most commonly used algorithm in practice is known as Lloyd-Forgy, which is also referred to as \"the\" k-MEANS algorithm, and various extensions of it often work very well in practice. However, they may produce solutions whose cost is arbitrarily large compared to the optimum solution. Kanungo et al. [2004] analyzed a very simple local search heuristic to get a polynomial-time algorithm with approximation ratio 9 + ε for any fixed ε > 0 for k-Umeans in Euclidean space. Finding an algorithm with a better worst-case approximation guarantee has remained one of the biggest open questions in this area, in particular whether one can get a true PTAS for fixed dimension Euclidean space. We settle this problem by showing that a simple local search algorithm provides a PTAS for k-MEANS for Rd for any fixed d. More precisely, for any error parameter ε > 0, the local search algorithm that considers swaps of up to ρ = dO(d) · ε-O(d/ε) centres at a time will produce a solution using exactly k centres whose cost is at most a (1+ε)-factor greater than the optimum solution. Our analysis extends very easily to the more general settings where we want to minimize the sum of q'th powers of the distances between data points and their cluster centres (instead of sum of squares of distances as in k-MEANS) for any fixed q ≥ 1 and where the metric may not be Euclidean but still has fixed doubling dimension.","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117043868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimal Quantile Approximation in Streams
Zohar S. Karnin, Kevin J. Lang, Edo Liberty (FOCS 2016, doi:10.1109/FOCS.2016.17)
This paper resolves one of the longest-standing basic problems in the streaming computational model, namely the optimal construction of quantile sketches. An ε-approximate quantile sketch receives a stream of items x_1, …, x_n and allows one to approximate the rank of any query item up to additive error εn with probability at least 1 - δ. The rank of a query x is the number of stream items x_i such that x_i ≤ x. The minimal sketch size required for this task is trivially at least 1/ε. Felber and Ostrovsky obtain an O((1/ε) log(1/ε)) space sketch for a fixed δ. Without restrictions on the nature of the stream or the ratio between ε and n, no better upper or lower bounds were known to date. This paper obtains an O((1/ε) log log(1/δ)) space sketch and a matching lower bound. This resolves the open problem and proves a qualitative gap between randomized and deterministic quantile sketching, for which an Ω((1/ε) log(1/ε)) lower bound is known. One of our contributions is a novel representation and modification of the widely used merge-and-reduce construction. This modification allows for an analysis which is both tight and extremely simple. The same technique was reported, in private communications, to be useful for improving other sketching objectives and geometric coreset constructions.
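For intuition, here is a hedged Python sketch of a basic compactor-based quantile sketch from the merge-and-reduce family this paper refines: each level buffers up to k items, and a full level sorts itself and promotes every other item, with doubled weight, to the next level. Fixed per-level capacities are a simplifying assumption; the paper's optimal construction chooses capacities far more carefully.

```python
import random

class CompactorSketch:
    """Toy compactor-based quantile sketch. An item stored at level h
    represents 2^h stream items, so ranks are estimated by a weighted
    count over all levels."""
    def __init__(self, k=64, seed=0):
        self.k, self.levels = k, [[]]
        self.rng = random.Random(seed)

    def update(self, x):
        self.levels[0].append(x)
        for h, buf in enumerate(self.levels):
            if len(buf) < self.k:
                break                      # nothing to compact further up
            buf.sort()
            offset = self.rng.randrange(2)
            promoted = buf[offset::2]      # keep every other item
            self.levels[h] = []
            if h + 1 == len(self.levels):
                self.levels.append([])
            self.levels[h + 1].extend(promoted)  # weight doubles per level

    def rank(self, x):
        """Approximate number of inserted items <= x."""
        return sum((2 ** h) * sum(1 for y in buf if y <= x)
                   for h, buf in enumerate(self.levels))

s = CompactorSketch(k=32)
for i in range(10_000):
    s.update(i)
print(s.rank(5000))  # close to the true rank 5001
```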
{"title":"Optimal Quantile Approximation in Streams","authors":"Zohar S. Karnin, Kevin J. Lang, Edo Liberty","doi":"10.1109/FOCS.2016.17","DOIUrl":"https://doi.org/10.1109/FOCS.2016.17","url":null,"abstract":"This paper resolves one of the longest standing basic problems in the streaming computational model. Namely, optimal construction of quantile sketches. An ε approximate quantile sketch receives a stream of items x1,⋯,xn and allows one to approximate the rank of any query item up to additive error ε n with probability at least 1-δ.The rank of a query x is the number of stream items such that xi ≤ x. The minimal sketch size required for this task is trivially at least 1/ε.Felber and Ostrovsky obtain a O((1/ε)log(1/ε)) space sketch for a fixed δ.Without restrictions on the nature of the stream or the ratio between ε and n, no better upper or lower bounds were known to date. This paper obtains an O((1/ε)log log (1/δ)) space sketch and a matching lower bound. This resolves the open problem and proves a qualitative gap between randomized and deterministic quantile sketching for which an Ω((1/ε)log(1/ε)) lower bound is known. One of our contributions is a novel representation and modification of the widely used merge-and-reduce construction. This modification allows for an analysis which is both tight and extremely simple. The same technique was reported, in private communications, to be useful for improving other sketching objectives and geometric coreset constructions.","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":"373 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134128409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Explicit Non-malleable Extractors, Multi-source Extractors, and Almost Optimal Privacy Amplification Protocols
Eshan Chattopadhyay, Xin Li (FOCS 2016, doi:10.1109/FOCS.2016.25)
We make progress on the following three problems: 1. constructing optimal seeded non-malleable extractors; 2. constructing optimal privacy amplification protocols with an active adversary, for any possible security parameter; 3. constructing extractors for independent weak random sources when the min-entropy is extremely small (i.e., near logarithmic). For the first two problems, the best known non-malleable extractors, by Chattopadhyay, Goyal and Li, and by Cohen, all require seed length and min-entropy with quadratic loss in parameters. As a result, the best known explicit privacy amplification protocols with an active adversary that achieve two rounds of communication and optimal entropy loss were suboptimal in the min-entropy of the source. In this paper we give an explicit non-malleable extractor that works for nearly optimal seed length and min-entropy, and yields a two-round privacy amplification protocol with optimal entropy loss for almost all ranges of the security parameter. For the third problem, we improve upon a very recent result by Cohen and Schulman and give an explicit extractor that uses an absolute constant number of sources, each with almost logarithmic min-entropy. The key ingredient in all our constructions is a generalized, and much more efficient, version of the independence preserving merger introduced by Cohen, which we call a non-malleable independence preserving merger. Our construction of the merger also simplifies that of Cohen and Schulman, and may be of independent interest.
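None of the paper's constructions fit in a few lines, but the underlying object is easy to make concrete. Below is a hedged Python toy of the classical two-source inner-product extractor (emphatically not the paper's construction): it outputs one nearly unbiased bit from two independent weak n-bit sources whenever each has min-entropy above n/2. The weak-source model in the demo is an illustrative assumption.

```python
import random

def inner_product_extractor(x, y):
    """Classical two-source extractor: output <x, y> mod 2. For
    independent n-bit sources with min-entropy > n/2 each, the output
    is close to uniform. The paper's multi-source extractors handle
    near-logarithmic min-entropy, which this simple object cannot."""
    return sum(a & b for a, b in zip(x, y)) % 2

# Two weak sources: each fixes a few bits (losing that much entropy)
# and draws the rest uniformly at random.
rng = random.Random(1)
n, trials, ones = 16, 100_000, 0
for _ in range(trials):
    x = [1] * 4 + [rng.randrange(2) for _ in range(n - 4)]
    y = [0] * 4 + [rng.randrange(2) for _ in range(n - 4)]
    ones += inner_product_extractor(x, y)
print(ones / trials)  # close to 0.5
```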
{"title":"Explicit Non-malleable Extractors, Multi-source Extractors, and Almost Optimal Privacy Amplification Protocols","authors":"Eshan Chattopadhyay, Xin Li","doi":"10.1109/FOCS.2016.25","DOIUrl":"https://doi.org/10.1109/FOCS.2016.25","url":null,"abstract":"We make progress in the following three problems: 1. Constructing optimal seeded non-malleable extractors, 2. Constructing optimal privacy amplification protocols with an active adversary, for any possible security parameter, 3. Constructing extractors for independent weak random sources, when the min-entropy is extremely small (i.e., near logarithmic). For the first two problems, the best known non-malleable extractors by Chattopadhyay, Goyal and Li, and by Cohen all require seed length and min-entropy with quadratic loss in parameters. As a result, the best known explicit privacy amplification protocols with an active adversary, which achieve two rounds of communication and optimal entropy loss was sub-optimal in the min-entropy of the source. In this paper we give an explicit non-malleable extractor that works for nearly optimal seed length and min-entropy, and yields a two-round privacy amplification protocol with optimal entropy loss for almost all ranges of the security parameter. For the third problem, we improve upon a very recent result by Cohen and Schulman and give an explicit extractor that uses an absolute constant number of sources, each with almost logarithmic min-entropy. The key ingredient in all our constructions is a generalized, and much more efficient version of the independence preserving merger introduced by Cohen, which we call non-malleable independence preserving merger. Our construction of the merger also simplifies that of Cohen and Schulman, and may be of independent interest.","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":"256 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123612196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Exponential Separation between Randomized and Deterministic Complexity in the LOCAL Model
Yi-Jun Chang, T. Kopelowitz, S. Pettie (FOCS 2016, doi:10.1109/FOCS.2016.72)
Over the past 30 years numerous algorithms have been designed for symmetry breaking problems in the LOCAL model, such as maximal matching, MIS, vertex coloring, and edge coloring. For most problems the best randomized algorithm is at least exponentially faster than the best deterministic algorithm. We prove that these exponential gaps are necessary and establish numerous connections between the deterministic and randomized complexities in the LOCAL model. Each of our results has a very compelling take-away message: 1) Building on the recent randomized lower bounds of Brandt et al. [1], we prove that the randomized complexity of Δ-coloring a tree with maximum degree Δ is O(log_Δ log n + log* n) for any Δ ≥ 55, whereas its deterministic complexity is Ω(log_Δ n) for any Δ ≥ 3. This also establishes a large separation between the deterministic complexity of Δ-coloring and (Δ+1)-coloring trees. 2) We prove that any deterministic algorithm for a natural class of problems that runs in O(1) + o(log_Δ n) rounds can be transformed to run in O(log* n - log* Δ + 1) rounds. If the transformed algorithm violates a lower bound (even allowing randomization), then one can conclude that the problem requires Ω(log_Δ n) time deterministically. This gives an alternate proof that deterministically Δ-coloring a tree with small Δ takes Ω(log_Δ n) rounds. 3) We prove that the randomized complexity of any natural problem on instances of size n is at least its deterministic complexity on instances of size √(log n). This shows that a deterministic Ω(log_Δ n) lower bound for any problem (Δ-coloring a tree, for example) implies a randomized Ω(log_Δ log n) lower bound. It also illustrates that the graph shattering technique employed in recent randomized symmetry breaking algorithms is absolutely essential to the LOCAL model. For example, it is provably impossible to improve the 2^O(√(log log n)) term in the complexities of the best MIS and (Δ+1)-coloring algorithms without also improving the 2^O(√(log n))-round Panconesi-Srinivasan algorithm.
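To ground the discussion, here is a hedged Python simulation of a textbook randomized (Δ+1)-coloring routine of the kind these gaps concern: in each synchronous round, every uncolored vertex proposes a random color from its remaining palette and keeps it if no neighbor proposed or already holds that color. This is a generic Luby-style toy, not an algorithm from the paper; it finishes in O(log n) rounds with high probability.

```python
import random

def randomized_coloring(adj, rng=None):
    """Toy synchronous LOCAL simulation of randomized (Delta+1)-coloring.
    adj: dict vertex -> set of neighbors. Returns (coloring, rounds)."""
    rng = rng or random.Random(0)
    delta = max(len(nbrs) for nbrs in adj.values())
    color = {v: None for v in adj}
    rounds = 0
    while any(c is None for c in color.values()):
        rounds += 1
        proposal = {}
        for v in adj:
            if color[v] is None:
                # Palette excludes colors already fixed at neighbors;
                # it is nonempty since v has at most delta neighbors.
                palette = set(range(delta + 1)) - {color[u] for u in adj[v]}
                proposal[v] = rng.choice(sorted(palette))
        for v, c in proposal.items():
            # Keep the proposal only if no neighbor proposed or holds c.
            if all(proposal.get(u) != c and color[u] != c for u in adj[v]):
                color[v] = c
    return color, rounds

cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(randomized_coloring(cycle))
```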
{"title":"An Exponential Separation between Randomized and Deterministic Complexity in the LOCAL Model","authors":"Yi-Jun Chang, T. Kopelowitz, S. Pettie","doi":"10.1109/FOCS.2016.72","DOIUrl":"https://doi.org/10.1109/FOCS.2016.72","url":null,"abstract":"Over the past 30 years numerous algorithms have been designed for symmetry breaking problems in the LOCAL model, such as maximal matching, MIS, vertex coloring, and edge coloring. For most problems the best randomized algorithm is at least exponentially faster than the best deterministic algorithm. We prove that these exponential gaps are necessary and establish numerous connections between the deterministic and randomized complexities in the LOCAL model. Each of our results has a very compelling take-away message: 1) Building on the recent randomized lower bounds of Brandt et al. [1], we prove that the randomized complexity of Δ-coloring a tree with maximum degree Δ is O(log Δ log n + log*n), for any Δ > = 55, whereas its deterministic complexity is Ω(log Δ n) for any Δ > = 3. This also establishes a large separation between the deterministic complexity of Δ-coloring and (Δ+1)-coloring trees. 2) We prove that any deterministic algorithm for a natural class of problems that runs in O(1) + o(log Δ n) rounds can be transformed to run in O(log*n - log*Δ + 1) rounds. If the transformed algorithm violates a lower bound (even allowing randomization), then one can conclude that the problem requires Ω(log Δ n) time deterministically. This gives an alternate proof that deterministically Δ-coloring a tree with small Δ takes Ω(log Δ n) rounds. 3) We prove that the randomized complexity of any natural problem on instances of size n is at least its deterministic complexity on instances of size √log n. This shows that a deterministic Ω(log Δ n) lower bound for any problem (Δ-coloring a tree, for example) implies a randomized Ω(log Δ log n) lower bound. It also illustrates that the graph shattering technique employed in recent randomized symmetry breaking algorithms is absolutely essential to the LOCAL model. For example, it is provably impossible to improve the 2O(√log log n) term in the complexities of the best MIS and (Δ+1)-coloring algorithms without also improving the 2O(√log n)-round Panconesi-Srinivasan algorithm.","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130956557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Noisy Population Recovery in Polynomial Time
Anindya De, M. Saks, Sijian Tang (FOCS 2016, doi:10.1109/FOCS.2016.77)
In the noisy population recovery problem of Dvir et al. [6], the goal is to learn an unknown distribution f on binary strings of length n from noisy samples. A noisy sample with parameter μ ∈ [0,1] is generated by selecting a sample from f and independently flipping each coordinate of the sample with probability (1-μ)/2. We assume an upper bound k on the size of the support of the distribution, and the goal is to estimate the probability of any string to within some given error ε. It is known that the algorithmic complexity and sample complexity of this problem are polynomially related to each other. We describe an algorithm that, for each μ > 0, provides the desired estimate of the distribution in time bounded by a polynomial in k, n and 1/ε, improving upon the previous best result of poly(k^(log log k), n, 1/ε) due to Lovett and Zhang [9]. Our proof combines ideas from [9] with a noise-attenuated version of Möbius inversion. The latter crucially uses the robust local inverse construction of Moitra and Saks [11].
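A hedged brute-force toy makes the noise structure visible: over the parity (Fourier) basis, the noise operator scales the coefficient of each character χ_S by exactly μ^|S|, so for tiny n one can estimate every coefficient from noisy samples, divide by μ^|S|, and invert the transform. This exponential-in-n demo only shows why recovery is information-theoretically possible; the paper's contribution is doing it in time poly(k, n, 1/ε).

```python
import itertools, random

def noisy_sample(x, mu, rng):
    """Flip each coordinate independently with probability (1 - mu)/2."""
    return tuple(b ^ (rng.random() < (1 - mu) / 2) for b in x)

def recover(samples, mu, n):
    """Brute-force recovery for tiny n: E[chi_S(noisy y)] = mu^|S| * f_hat(S),
    so divide each empirical coefficient by mu^|S| and invert."""
    m = len(samples)
    est = {}
    for S in itertools.product([0, 1], repeat=n):
        chi = sum((-1) ** sum(s & b for s, b in zip(S, y)) for y in samples)
        est[S] = (chi / m) / mu ** sum(S)
    # Inverse transform: f(x) = 2^-n * sum_S f_hat(S) * chi_S(x).
    return {x: sum(c * (-1) ** sum(s & b for s, b in zip(S, x))
                   for S, c in est.items()) / 2 ** n
            for x in itertools.product([0, 1], repeat=n)}

rng = random.Random(0)
n, mu = 3, 0.6
true = {(0, 0, 0): 0.7, (1, 1, 0): 0.3}   # support size k = 2
population = [x for x, p in true.items() for _ in range(int(p * 10))]
samples = [noisy_sample(rng.choice(population), mu, rng) for _ in range(100_000)]
f = recover(samples, mu, n)
print({x: round(p, 2) for x, p in f.items() if p > 0.05})  # recovers both strings
```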
{"title":"Noisy Population Recovery in Polynomial Time","authors":"Anindya De, M. Saks, Sijian Tang","doi":"10.1109/FOCS.2016.77","DOIUrl":"https://doi.org/10.1109/FOCS.2016.77","url":null,"abstract":"In the noisy population recovery problem of Dvir et al. [6], the goal is to learn an unknown distribution f on binary strings of length n from noisy samples. A noisy sample with parameter μ ∈ [0,1] is generated by selecting a sample from f, and independently flipping each coordinate of the sample with probability (1-μ)/2. We assume an upper bound k on the size of the support of the distribution, and the goal is to estimate the probability of any string to within some given error ε. It is known that the algorithmic complexity and sample complexity of this problem are polynomially related to each other. We describe an algorithm that for each μ > 0, provides the desired estimate of the distribution in time bounded by a polynomial in k, n and 1/ε improving upon the previous best result of poly(klog log k, n, 1/ε) due to Lovett and Zhang [9]. Our proof combines ideas from [9] with a noise attenuated version of Möbius inversion. The latter crucially uses the robust local inverse construction of Moitra and Saks [11].","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116000848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}