On Fully Dynamic Graph Sparsifiers
Ittai Abraham, D. Durfee, I. Koutis, Sebastian Krinninger, Richard Peng (FOCS 2016, doi:10.1109/FOCS.2016.44)
We initiate the study of fast dynamic algorithms for graph sparsification problems and obtain fully dynamic algorithms, allowing both edge insertions and edge deletions, that take polylogarithmic time after each update in the graph. Our three main results are as follows. First, we give a fully dynamic algorithm for maintaining a (1 ± ϵ)-spectral sparsifier with amortized update time poly(log n, ϵ^-1). Second, we give a fully dynamic algorithm for maintaining a (1 ± ϵ)-cut sparsifier with worst-case update time poly(log n, ϵ^-1). Both sparsifiers have size n · poly(log n, ϵ^-1). Third, we apply our dynamic sparsifier algorithm to obtain a fully dynamic algorithm for maintaining a (1 - ϵ)-approximation to the value of the maximum flow in an unweighted, undirected, bipartite graph with amortized update time poly(log n, ϵ^-1).
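The dynamic data structures in the paper are involved; as background, here is a hedged static sketch of spectral sparsification by effective-resistance sampling (the classical Spielman-Srivastava primitive that dynamic spectral sparsifiers maintain under updates). This is an illustrative toy, not the paper's algorithm; the sampling count and function names are assumptions.

```python
import numpy as np

def spectral_sparsify(n, edges, eps, seed=0):
    """Toy static (1 ± eps)-spectral sparsifier via effective-resistance
    sampling. edges is a list of (u, v, weight) with 0 <= u, v < n."""
    rng = np.random.default_rng(seed)
    # Graph Laplacian L = sum_e w_e (e_u - e_v)(e_u - e_v)^T.
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    Lpinv = np.linalg.pinv(L)  # pseudo-inverse is fine at toy scale
    # Effective resistance of e = (u, v): R_e = (e_u - e_v)^T L^+ (e_u - e_v).
    R = [Lpinv[u, u] + Lpinv[v, v] - 2.0 * Lpinv[u, v] for u, v, _ in edges]
    # Sample q = O(n log n / eps^2) edges with probability ~ w_e * R_e,
    # reweighting kept edges so the Laplacian is preserved in expectation.
    p = np.array([w * r for (_, _, w), r in zip(edges, R)])
    p /= p.sum()
    q = int(9 * n * np.log(n) / eps**2) + 1
    counts = rng.multinomial(q, p)
    return [(u, v, w * c / (q * p_e))
            for (u, v, w), c, p_e in zip(edges, counts, p) if c > 0]

# Example: sparsify a small complete graph (at this size the output is
# not actually sparse; the bound only bites for large n).
edges = [(u, v, 1.0) for u in range(8) for v in range(u + 1, 8)]
print(len(spectral_sparsify(8, edges, eps=0.5)))
```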
{"title":"On Fully Dynamic Graph Sparsifiers","authors":"Ittai Abraham, D. Durfee, I. Koutis, Sebastian Krinninger, Richard Peng","doi":"10.1109/FOCS.2016.44","DOIUrl":"https://doi.org/10.1109/FOCS.2016.44","url":null,"abstract":"We initiate the study of fast dynamic algorithms for graph sparsification problems and obtain fully dynamic algorithms, allowing both edge insertions and edge deletions, that take polylogarithmic time after each update in the graph. Our three main results are as follows. First, we give a fully dynamic algorithm for maintaining a (1 ± ϵ)-spectral sparsifier with amortized update time poly(log n, ϵ<sup>-1</sup>). Second, we give a fully dynamic algorithm for maintaining a (1 ± ϵ)-cut sparsifier with worst-case update time poly(log n, ϵ<sup>-1</sup>). Both sparsifiers have size n · poly(log n, ϵ<sup>-1</sup>). Third, we apply our dynamic sparsifier algorithm to obtain a fully dynamic algorithm for maintaining a (1 - ϵ)-approximation to the value of the maximum flow in an unweighted, undirected, bipartite graph with amortized update time poly(log n, ϵ<sup>-1</sup>).","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128529408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Computational Efficiency Requires Simple Taxation
Shahar Dobzinski (FOCS 2016, doi:10.1109/FOCS.2016.30)
We characterize the communication complexity of truthful mechanisms. Our departure point is the well-known taxation principle, which asserts that every truthful mechanism can be interpreted as follows: every player is presented with a menu consisting of a price for each bundle (the prices depend only on the valuations of the other players), and each player is allocated a bundle that maximizes his profit according to this menu. We define the taxation complexity of a truthful mechanism to be the logarithm of the maximum number of menus that may be presented to a player. Our main finding is that in general the taxation complexity essentially equals the communication complexity. The proof consists of two main steps. First, we prove that for rich enough domains the taxation complexity is at most the communication complexity. We then show that the taxation complexity is much smaller than the communication complexity only in "pathological" cases, and provide a formal description of these extreme cases. Next, we study mechanisms that access the valuations via value queries only. In this setting we establish that the menu complexity, a notion already studied in several different contexts, characterizes the number of value queries that the mechanism makes in exactly the same way that the taxation complexity characterizes the communication complexity. Our approach yields several applications, including strengthening the solution concept with low communication overhead, fast computation of prices, and hardness of approximation by computationally efficient truthful mechanisms.
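To make the taxation principle concrete, here is a hedged Python toy: a single-item second-price auction phrased as a menu mechanism, where the menu shown to each player depends only on the other players' bids and each player takes the profit-maximizing entry. All names are illustrative, and tie-breaking is ignored.

```python
def second_price_menu(bids, i):
    """Menu shown to player i: the item is priced at the maximum of the
    *other* players' bids, and the empty bundle is free. As the taxation
    principle requires, the menu does not depend on player i's own bid."""
    price = max(b for j, b in enumerate(bids) if j != i)
    return {"item": price, "nothing": 0.0}

def allocate(bids):
    """Each player picks the profit-maximizing entry from its own menu."""
    outcome = {}
    for i, v in enumerate(bids):
        menu = second_price_menu(bids, i)
        # Profit of taking the item is v - price; of taking nothing, 0.
        outcome[i] = "item" if v - menu["item"] > 0 else "nothing"
    return outcome

print(allocate([3.0, 7.0, 5.0]))  # player 1 wins at price 5
```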
{"title":"Computational Efficiency Requires Simple Taxation","authors":"Shahar Dobzinski","doi":"10.1109/FOCS.2016.30","DOIUrl":"https://doi.org/10.1109/FOCS.2016.30","url":null,"abstract":"We characterize the communication complexity of truthful mechanisms. Our departure point is the well known taxation principle. The taxation principle asserts that every truthful mechanism can be interpreted as follows: every player is presented with a menu that consists of a price for each bundle (the prices depend only on the valuations of the other players). Each player is allocated a bundle that maximizes his profit according to this menu. We define the taxation complexity of a truthful mechanism to be the logarithm of the maximum number of menus that may be presented to a player. Our main finding is that in general the taxation complexity essentially equals the communication complexity. The proof consists of two main steps. First, we prove that for rich enough domains the taxation complexity is at most the communication complexity. We then show that the taxation complexity is much smaller than the communication complexity only in \"pathological\" cases and provide a formal description of these extreme cases. Next, we study mechanisms that access the valuations via value queries only. In this setting we establish that the menu complexity - a notion that was already studied in several different contexts - characterizes the number of value queries that the mechanism makes in exactly the same way that the taxation complexity characterizes the communication complexity. Our approach yields several applications, including strengthening the solution concept with low communication overhead, fast computation of prices, and hardness of approximation by computationally efficient truthful mechanisms.","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":"758 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126942572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heavy Hitters via Cluster-Preserving Clustering
Kasper Green Larsen, Jelani Nelson, Huy L. Nguyen, M. Thorup (FOCS 2016, doi:10.1145/3339185)
In the turnstile ℓ_p heavy hitters problem with parameter ε, one must maintain a high-dimensional vector x ∈ ℝ^n subject to updates of the form update(i, Δ) causing the change x_i ← x_i + Δ, where i ∈ [n] and Δ ∈ ℝ. Upon receiving a query, the goal is to report every "heavy hitter" i ∈ [n] with |x_i| ≥ ε ∥x∥_p as part of a list L ⊆ [n] of size O(1/ε^p), i.e. proportional to the maximum possible number of heavy hitters. For any p ∈ (0,2], the COUNTSKETCH of [CCFC04] solves ℓ_p heavy hitters using O(ε^-p lg n) words of space with O(lg n) update time and O(n lg n) query time to output L, and its output after any query is correct with high probability (whp), i.e. with probability 1 - 1/poly(n) [JST11, Section 4.4]. This space bound is optimal even in the strict turnstile model [JST11], in which it is promised that x_i ≥ 0 for all i ∈ [n] at all points in the stream, but unfortunately the query time is very slow. To remedy this, the work [CM05] proposed the "dyadic trick" for the COUNTMIN sketch for p = 1 in the strict turnstile model, which, to maintain whp correctness, achieves suboptimal space O(ε^-1 lg^2 n) and worse update time O(lg^2 n), but much better query time O(ε^-1 poly(lg n)). An extension to all p ∈ (0,2] appears in [KNPW11, Theorem 1], and can be obtained from [Pag13]. We show that this tradeoff between space and update time versus query time is unnecessary. We provide a new algorithm, EXPANDERSKETCH, which in the most general turnstile model achieves optimal O(ε^-p log n) space, O(log n) update time, and fast O(ε^-p poly(log n)) query time, providing correctness whp. In fact, a simpler version of our algorithm for p = 1 in the strict turnstile model answers queries even faster than the "dyadic trick" by roughly a log n factor, dominating it in all regards. Our main innovation is an efficient reduction from heavy hitters to a clustering problem in which each heavy hitter is encoded as some form of noisy spectral cluster in a much bigger graph, and the goal is to identify every cluster. Since every heavy hitter must be found, correctness requires that every cluster be found. We thus need a "cluster-preserving clustering" algorithm that partitions the graph into clusters with the promise of not destroying any original cluster. To do this we first apply standard spectral graph partitioning, and then use novel combinatorial techniques to modify the cuts obtained so as to make sure that the original clusters are sufficiently preserved. Our cluster-preserving clustering may be of broader interest, well beyond heavy hitters.
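As background for the baselines discussed above, here is a hedged minimal CountMin sketch [CM05] for the strict turnstile model; it supports point queries, from which heavy hitters can be read off by scanning candidates (the "dyadic trick" and EXPANDERSKETCH speed up exactly that step). The hashing scheme and parameters are illustrative assumptions, not the paper's construction.

```python
import random

class CountMinSketch:
    """Minimal CountMin sketch: w = O(1/eps) columns and d = O(lg n) rows
    give point-query error eps * ||x||_1 whp in the strict turnstile
    model (all x_i stay nonnegative)."""
    def __init__(self, width, depth, seed=0):
        rnd = random.Random(seed)
        self.width = width
        self.table = [[0] * width for _ in range(depth)]
        # Salted tuple hashing stands in for pairwise-independent hashes.
        self.salts = [rnd.getrandbits(64) for _ in range(depth)]

    def update(self, i, delta):
        for row, salt in zip(self.table, self.salts):
            row[hash((salt, i)) % self.width] += delta

    def query(self, i):
        # The minimum over rows upper-bounds x_i in the strict model.
        return min(row[hash((salt, i)) % self.width]
                   for row, salt in zip(self.table, self.salts))

cms = CountMinSketch(width=64, depth=5)
for i, d in [(3, 10), (7, 2), (3, 5), (9, 1)]:
    cms.update(i, d)
assert cms.query(3) >= 15  # never underestimates; rarely overestimates
```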
{"title":"Heavy Hitters via Cluster-Preserving Clustering","authors":"Kasper Green Larsen, Jelani Nelson, Huy L. Nguyen, M. Thorup","doi":"10.1145/3339185","DOIUrl":"https://doi.org/10.1145/3339185","url":null,"abstract":"In the turnstile ℓ<sub>p</sub> heavy hitters problem with parameter ε, one must maintain a high-dimensional vector x ∈ ℝ<sup>n</sup> subject to updates of the form update (i,Δ) causing the change x<sub>i</sub> ← x<sub>i</sub> + Δ, where i ε[n], Δ ∈ ℝ. Upon receiving a query, the goal is to report every \"heavy hitter\" i ∈ [n] with |x<sub>i</sub>| ≥ ε ∥x∥<sub>p</sub> as part of a list L ⊆ [n] of size O(1/ε<sup>p</sup>), i.e. proportional to the maximum possible number of heavy hitters. For any pε(0,2] the COUNTSKETCH of [CCFC04] solves ℓ<sub>p</sub> heavy hitters using O(ε<sup>-p</sup> lg n) words of space with O(lg n) update time, O(n lg n) query time to output L, and whose output after any query is correct with high probability (whp) 1 - 1/poly(n) [JST11, Section 4.4]. This space bound is optimal even in the strict turnstile model [JST11] in which it is promised that x<sub>i</sub> ≥ 0 for all i ∈ [n] at all points in the stream, but unfortunately the query time is very slow. To remedy this, the work [CM05] proposed the \"dyadic trick\" for the COUNTMIN sketch for p = 1 in the strict turnstile model, which to maintain whp correctness achieves suboptimal space O(ε<sup>-1</sup>lg<sup>2</sup> n), worse update time O(lg<sup>2</sup> n), but much better query time O(ε<sup>-1</sup>poly(lg n)). An extension to all p ∈ (0,2] appears in [KNPW11, Theorem 1], and can be obtained from [Pag13]. We show that this tradeoff between space and update time versus query time is unnecessary. We provide a new algorithm, EXPANDERSKETCH, which in the most general turnstile model achieves optimal O(ε-plog n) space, O(log n) update time, and fast O(ε-ppoly(log n)) query time, providing correctness whp. In fact, a simpler version of our algorithm for p = 1 in the strict turnstile model answers queries even faster than the \"dyadic trick\" by roughly a log n factor, dominating it in all regards. Our main innovation is an efficient reduction from the heavy hitters to a clustering problem in which each heavy hitter is encoded as some form of noisy spectral cluster in a much bigger graph, and the goal is to identify every cluster. Since every heavy hitter must be found, correctness requires that every cluster be found. We thus need a \"cluster-preserving clustering\" algorithm, that partitions the graph into clusters with the promise of not destroying any original cluster. To do this we first apply standard spectral graph partitioning, and then we use some novel combinatorial techniques to modify the cuts obtained so as to make sure that the original clusters are sufficiently preserved. Our cluster-preserving clustering may be of broader interest much beyond heavy hitters.","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122877106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Convergence of MCMC and Loopy BP in the Tree Uniqueness Region for the Hard-Core Model
Charilaos Efthymiou, Thomas P. Hayes, Daniel Stefankovic, Eric Vigoda, Yitong Yin (FOCS 2016, doi:10.1109/FOCS.2016.80)
We study the hard-core (gas) model defined on independent sets of an input graph, where the independent sets are weighted by a parameter (aka fugacity) λ > 0. For constant Δ, previous work of Weitz (2006) established an FPTAS for the partition function for graphs of maximum degree Δ when λ < λ_c(Δ). Sly (2010) showed that there is no FPRAS, unless NP = RP, when λ > λ_c(Δ). The threshold λ_c(Δ) is the critical point for the statistical physics phase transition for uniqueness/non-uniqueness on the infinite Δ-regular tree. The running time of Weitz's algorithm is exponential in log Δ. Here we present an FPRAS for the partition function whose running time is O*(n^2). We analyze the simple single-site Markov chain known as the Glauber dynamics for sampling from the associated Gibbs distribution. We prove there exists a constant Δ_0 such that for all graphs with maximum degree Δ > Δ_0 and girth ≥ 7 (i.e., no cycles of length ≤ 6), the mixing time of the Glauber dynamics is O(n log n) when λ < λ_c(Δ). Our work complements that of Weitz, which applies for small constant Δ, whereas our work applies for all Δ at least a sufficiently large constant Δ_0 (this includes Δ depending on n = |V|). Our proof utilizes loopy BP (belief propagation), a widely used algorithm for inference in graphical models. A novel aspect of our work is using the principal eigenvector of the BP operator to design a distance function that contracts in expectation for pairs of states that behave like the BP fixed point. We also prove that the Glauber dynamics behaves locally like loopy BP. As a byproduct we obtain that the Glauber dynamics, after a short burn-in period, converges close to the BP fixed point, and this implies that the fixed point of loopy BP is a close approximation to the Gibbs distribution. Using these connections we establish that loopy BP quickly converges to the Gibbs distribution when the girth ≥ 6 and λ < λ_c(Δ).
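The Glauber dynamics itself fits in a few lines. Below is a hedged Python sketch of the single-site heat-bath chain for the hard-core model: pick a uniformly random vertex, and if none of its neighbors is occupied, occupy it with probability λ/(1+λ), otherwise leave it unoccupied. The example graph and step count are illustrative only; no mixing claim is made for this toy instance.

```python
import random

def glauber_hardcore(adj, lam, steps, rng=None):
    """Single-site Glauber dynamics whose stationary distribution weights
    each independent set I by lam^|I|. adj: dict vertex -> set of neighbors."""
    rng = rng or random.Random(0)
    occupied = set()              # start from the empty independent set
    vertices = list(adj)
    for _ in range(steps):
        v = rng.choice(vertices)
        if any(u in occupied for u in adj[v]):
            occupied.discard(v)   # a neighbor is occupied: v must be empty
        elif rng.random() < lam / (1 + lam):
            occupied.add(v)       # unblocked: occupy with prob lam/(1+lam)
        else:
            occupied.discard(v)
    return occupied

# A 5-cycle as a tiny example graph.
cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(glauber_hardcore(cycle, lam=1.0, steps=10_000))
```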
{"title":"Convergence of MCMC and Loopy BP in the Tree Uniqueness Region for the Hard-Core Model","authors":"Charilaos Efthymiou, Thomas P. Hayes, Daniel Stefankovic, Eric Vigoda, Yitong Yin","doi":"10.1109/FOCS.2016.80","DOIUrl":"https://doi.org/10.1109/FOCS.2016.80","url":null,"abstract":"We study the hard-core (gas) model defined on independent sets of an input graph where the independent sets are weighted by a parameter (aka fugacity) λ > 0. For constant Δ, previous work of Weitz (2006) established an FPTAS for the partition function for graphs of maximum degree Δ when λ <; λ<sub>c</sub>(Δ). Sly (2010) showed that there is no FPRAS, unless NP=RP, when λ > λ<sub>c</sub>(Δ). The threshold λ<sub>c</sub>(Δ) is the critical point for the statistical physics phase transition for uniqueness/non-uniqueness on the infinite Δ-regular tree. The running time of Weitz's algorithm is exponential in log Δ. Here we present an FPRAS for the partition function whose running time is O* (n<sup>2</sup>). We analyze the simple single-site Markov chain known as the Glauber dynamics for sampling from the associated Gibbs distribution. We prove there exists a constant Δ<sub>0</sub> such that for all graphs with maximum degree Δ > Δ<sub>0</sub> and girth > 7 (i.e., no cycles of length ≤ 6), the mixing time of the Glauber dynamics is O(nlog n) when λ <; λ<sub>c</sub>(Δ). Our work complements that of Weitz which applies for small constant Δ whereas our work applies for all Δ at least a sufficiently large constant Δ<sub>0</sub> (this includes Δ depending on n = IVI). Our proof utilizes loopy BP (belief propagation) which is a widely-used algorithm for inference in graphical models. A novel aspect of our work is using the principal eigenvector for the BP operator to design a distance function which contracts in expectation for pairs of states that behave like the BP fixed point. We also prove that the Glauber dynamics behaves locally like loopy BP. As a byproduct we obtain that the Glauber dynamics, after a short burn-in period, converges close to the BP fixed point, and this implies that the fixed point of loopy BP is a close approximation to the Gibbs distribution. Using these connections we establish that loopy BP quickly converges to the Gibbs distribution when the girth ≥ 6 and λ <; λ<sub>c</sub>(Δ).","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":" 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132094233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Local Search Yields Approximation Schemes for k-Means and k-Median in Euclidean and Minor-Free Metrics
Vincent Cohen-Addad, P. Klein, Claire Mathieu (FOCS 2016, doi:10.1109/FOCS.2016.46)
We give the first polynomial-time approximation schemes (PTASs) for the following problems: (1) uniform facility location in edge-weighted planar graphs, (2) k-median and k-means in edge-weighted planar graphs, (3) k-means in Euclidean space of bounded dimension. Our first and second results extend to minor-closed families of graphs. All our results extend to cost functions that are the p-th power of the shortest-path distance. The algorithm is local search, where the local neighborhood of a solution S consists of all solutions obtained from S by removing and adding 1/ε^O(1) centers.
{"title":"Local Search Yields Approximation Schemes for k-Means and k-Median in Euclidean and Minor-Free Metrics","authors":"Vincent Cohen-Addad, P. Klein, Claire Mathieu","doi":"10.1109/FOCS.2016.46","DOIUrl":"https://doi.org/10.1109/FOCS.2016.46","url":null,"abstract":"We give the first polynomial-time approximation schemes (PTASs) for the following problems: (1) uniform facility location in edge-weighted planar graphs, (2) k-median and k-means in edge-weighted planar graphs, (3) k-means in Euclidean space of bounded dimension. Our first and second results extend to minor-closed families of graphs. All our results extend to cost functions that are the pth power of the shortest-path distance. The algorithm is local search where the local neighborhood of a solution S consists of all solutions obtained from S by removing and adding 1/εO(1) centers.","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130459371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Local Search Yields a PTAS for k-Means in Doubling Metrics
Zachary Friggstad, M. Rezapour, M. Salavatipour (FOCS 2016, doi:10.1109/FOCS.2016.47)
The most well-known and ubiquitous clustering problem encountered in nearly every branch of science is undoubtedly k-MEANS: given a set of data points and a parameter k, select k centres and partition the data points into k clusters around these centres so that the sum of squares of distances of the points to their cluster centre is minimized. Typically these data points lie in Euclidean space ℝ^d for some d ≥ 2. k-MEANS and the first algorithms for it were introduced in the 1950s. Over the last six decades, hundreds of papers have studied this problem and different algorithms have been proposed for it. The most commonly used algorithm in practice is known as Lloyd-Forgy, which is also referred to as "the" k-MEANS algorithm, and various extensions of it often work very well in practice. However, they may produce solutions whose cost is arbitrarily large compared to the optimum solution. Kanungo et al. [2004] analyzed a very simple local search heuristic to get a polynomial-time algorithm with approximation ratio 9 + ε for any fixed ε > 0 for k-MEANS in Euclidean space. Finding an algorithm with a better worst-case approximation guarantee has remained one of the biggest open questions in this area, in particular whether one can get a true PTAS for fixed-dimension Euclidean space. We settle this problem by showing that a simple local search algorithm provides a PTAS for k-MEANS in ℝ^d for any fixed d. More precisely, for any error parameter ε > 0, the local search algorithm that considers swaps of up to ρ = d^O(d) · ε^(-O(d/ε)) centres at a time will produce a solution using exactly k centres whose cost is at most a (1+ε)-factor greater than the optimum solution. Our analysis extends very easily to the more general settings where we want to minimize the sum of q-th powers of the distances between data points and their cluster centres (instead of the sum of squares of distances as in k-MEANS) for any fixed q ≥ 1, and where the metric may not be Euclidean but still has fixed doubling dimension.
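Here is a hedged Python sketch of the swap-based local search analyzed in this line of work (and in the previous abstract): while swapping out up to s centres for s non-centres lowers the k-means cost, perform the improving swap. It is a toy for intuition, restricted to centres chosen among the input points in the plane; the PTAS above uses swaps of size ρ and a far more delicate analysis.

```python
import itertools

def kmeans_cost(points, centers):
    """Sum over points of the squared distance to the nearest centre."""
    return sum(min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centers)
               for px, py in points)

def local_search_kmeans(points, k, max_swap=1):
    """Toy swap-based local search for discrete k-means in the plane:
    repeat the first improving swap of up to max_swap centres until no
    swap improves the cost. Terminates since the cost strictly drops."""
    centers = set(points[:k])            # arbitrary initial solution
    improved = True
    while improved:
        improved = False
        outside = [p for p in points if p not in centers]
        for s in range(1, max_swap + 1):
            for out in itertools.combinations(sorted(centers), s):
                for into in itertools.combinations(outside, s):
                    trial = (centers - set(out)) | set(into)
                    if kmeans_cost(points, trial) < kmeans_cost(points, centers):
                        centers, improved = trial, True
                        break
                if improved: break
            if improved: break
    return centers

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(local_search_kmeans(pts, k=2))  # one centre per visible cluster
```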
{"title":"Local Search Yields a PTAS for k-Means in Doubling Metrics","authors":"Zachary Friggstad, M. Rezapour, M. Salavatipour","doi":"10.1109/FOCS.2016.47","DOIUrl":"https://doi.org/10.1109/FOCS.2016.47","url":null,"abstract":"The most well known and ubiquitous clustering problem encountered in nearly every branch of science is undoubtedly k-MEANS: given a set of data points and a parameter k, select k centres and partition the data points into k clusters around these centres so that the sum of squares of distances of the points to their cluster centre is minimized. Typically these data points lie in Euclidean space Rd for some d ≥ 2. k-MEANS and the first algorithms for it were introduced in the 1950's. Over the last six decades, hundreds of papers have studied this problem and different algorithms have been proposed for it. The most commonly used algorithm in practice is known as Lloyd-Forgy, which is also referred to as \"the\" k-MEANS algorithm, and various extensions of it often work very well in practice. However, they may produce solutions whose cost is arbitrarily large compared to the optimum solution. Kanungo et al. [2004] analyzed a very simple local search heuristic to get a polynomial-time algorithm with approximation ratio 9 + ε for any fixed ε > 0 for k-Umeans in Euclidean space. Finding an algorithm with a better worst-case approximation guarantee has remained one of the biggest open questions in this area, in particular whether one can get a true PTAS for fixed dimension Euclidean space. We settle this problem by showing that a simple local search algorithm provides a PTAS for k-MEANS for Rd for any fixed d. More precisely, for any error parameter ε > 0, the local search algorithm that considers swaps of up to ρ = dO(d) · ε-O(d/ε) centres at a time will produce a solution using exactly k centres whose cost is at most a (1+ε)-factor greater than the optimum solution. Our analysis extends very easily to the more general settings where we want to minimize the sum of q'th powers of the distances between data points and their cluster centres (instead of sum of squares of distances as in k-MEANS) for any fixed q ≥ 1 and where the metric may not be Euclidean but still has fixed doubling dimension.","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117043868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimal Quantile Approximation in Streams
Zohar S. Karnin, Kevin J. Lang, Edo Liberty (FOCS 2016, doi:10.1109/FOCS.2016.17)
This paper resolves one of the longest-standing basic problems in the streaming computational model, namely the optimal construction of quantile sketches. An ε-approximate quantile sketch receives a stream of items x_1, …, x_n and allows one to approximate the rank of any query item up to additive error εn with probability at least 1 - δ. The rank of a query x is the number of stream items x_i such that x_i ≤ x. The minimal sketch size required for this task is trivially at least 1/ε. Felber and Ostrovsky obtain an O((1/ε) log(1/ε)) space sketch for a fixed δ. Without restrictions on the nature of the stream or the ratio between ε and n, no better upper or lower bounds were known to date. This paper obtains an O((1/ε) log log(1/δ)) space sketch and a matching lower bound. This resolves the open problem and proves a qualitative gap between randomized and deterministic quantile sketching, for which an Ω((1/ε) log(1/ε)) lower bound is known. One of our contributions is a novel representation and modification of the widely used merge-and-reduce construction. This modification allows for an analysis which is both tight and extremely simple. The same technique was reported, in private communications, to be useful for improving other sketching objectives and geometric coreset constructions.
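For intuition, here is a hedged Python sketch of a basic compactor-based quantile sketch from the merge-and-reduce family this paper refines: each level buffers up to k items, and a full level sorts itself and promotes every other item, with doubled weight, to the next level. Fixed per-level capacities are a simplifying assumption; the paper's optimal construction chooses capacities far more carefully.

```python
import random

class CompactorSketch:
    """Toy compactor-based quantile sketch. An item stored at level h
    represents 2^h stream items, so ranks are estimated by a weighted
    count over all levels."""
    def __init__(self, k=64, seed=0):
        self.k, self.levels = k, [[]]
        self.rng = random.Random(seed)

    def update(self, x):
        self.levels[0].append(x)
        for h, buf in enumerate(self.levels):
            if len(buf) < self.k:
                break                      # nothing to compact further up
            buf.sort()
            offset = self.rng.randrange(2)
            promoted = buf[offset::2]      # keep every other item
            self.levels[h] = []
            if h + 1 == len(self.levels):
                self.levels.append([])
            self.levels[h + 1].extend(promoted)  # weight doubles per level

    def rank(self, x):
        """Approximate number of inserted items <= x."""
        return sum((2 ** h) * sum(1 for y in buf if y <= x)
                   for h, buf in enumerate(self.levels))

s = CompactorSketch(k=32)
for i in range(10_000):
    s.update(i)
print(s.rank(5000))  # close to the true rank 5001
```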
{"title":"Optimal Quantile Approximation in Streams","authors":"Zohar S. Karnin, Kevin J. Lang, Edo Liberty","doi":"10.1109/FOCS.2016.17","DOIUrl":"https://doi.org/10.1109/FOCS.2016.17","url":null,"abstract":"This paper resolves one of the longest standing basic problems in the streaming computational model. Namely, optimal construction of quantile sketches. An ε approximate quantile sketch receives a stream of items x1,⋯,xn and allows one to approximate the rank of any query item up to additive error ε n with probability at least 1-δ.The rank of a query x is the number of stream items such that xi ≤ x. The minimal sketch size required for this task is trivially at least 1/ε.Felber and Ostrovsky obtain a O((1/ε)log(1/ε)) space sketch for a fixed δ.Without restrictions on the nature of the stream or the ratio between ε and n, no better upper or lower bounds were known to date. This paper obtains an O((1/ε)log log (1/δ)) space sketch and a matching lower bound. This resolves the open problem and proves a qualitative gap between randomized and deterministic quantile sketching for which an Ω((1/ε)log(1/ε)) lower bound is known. One of our contributions is a novel representation and modification of the widely used merge-and-reduce construction. This modification allows for an analysis which is both tight and extremely simple. The same technique was reported, in private communications, to be useful for improving other sketching objectives and geometric coreset constructions.","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":"373 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134128409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Explicit Non-malleable Extractors, Multi-source Extractors, and Almost Optimal Privacy Amplification Protocols
Eshan Chattopadhyay, Xin Li (FOCS 2016, doi:10.1109/FOCS.2016.25)
We make progress on the following three problems: 1. constructing optimal seeded non-malleable extractors; 2. constructing optimal privacy amplification protocols with an active adversary, for any possible security parameter; 3. constructing extractors for independent weak random sources when the min-entropy is extremely small (i.e., near logarithmic). For the first two problems, the best known non-malleable extractors, by Chattopadhyay, Goyal and Li, and by Cohen, all require seed length and min-entropy with quadratic loss in parameters. As a result, the best known explicit privacy amplification protocols with an active adversary that achieve two rounds of communication and optimal entropy loss were suboptimal in the min-entropy of the source. In this paper we give an explicit non-malleable extractor that works for nearly optimal seed length and min-entropy, and yields a two-round privacy amplification protocol with optimal entropy loss for almost all ranges of the security parameter. For the third problem, we improve upon a very recent result by Cohen and Schulman and give an explicit extractor that uses an absolute constant number of sources, each with almost logarithmic min-entropy. The key ingredient in all our constructions is a generalized, and much more efficient, version of the independence preserving merger introduced by Cohen, which we call a non-malleable independence preserving merger. Our construction of the merger also simplifies that of Cohen and Schulman, and may be of independent interest.
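None of the paper's constructions fit in a few lines, but the underlying object is easy to make concrete. Below is a hedged Python toy of the classical two-source inner-product extractor (emphatically not the paper's construction): it outputs one nearly unbiased bit from two independent weak n-bit sources whenever each has min-entropy above n/2. The weak-source model in the demo is an illustrative assumption.

```python
import random

def inner_product_extractor(x, y):
    """Classical two-source extractor: output <x, y> mod 2. For
    independent n-bit sources with min-entropy > n/2 each, the output
    is close to uniform. The paper's multi-source extractors handle
    near-logarithmic min-entropy, which this simple object cannot."""
    return sum(a & b for a, b in zip(x, y)) % 2

# Two weak sources: each fixes a few bits (losing that much entropy)
# and draws the rest uniformly at random.
rng = random.Random(1)
n, trials, ones = 16, 100_000, 0
for _ in range(trials):
    x = [1] * 4 + [rng.randrange(2) for _ in range(n - 4)]
    y = [0] * 4 + [rng.randrange(2) for _ in range(n - 4)]
    ones += inner_product_extractor(x, y)
print(ones / trials)  # close to 0.5
```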
{"title":"Explicit Non-malleable Extractors, Multi-source Extractors, and Almost Optimal Privacy Amplification Protocols","authors":"Eshan Chattopadhyay, Xin Li","doi":"10.1109/FOCS.2016.25","DOIUrl":"https://doi.org/10.1109/FOCS.2016.25","url":null,"abstract":"We make progress in the following three problems: 1. Constructing optimal seeded non-malleable extractors, 2. Constructing optimal privacy amplification protocols with an active adversary, for any possible security parameter, 3. Constructing extractors for independent weak random sources, when the min-entropy is extremely small (i.e., near logarithmic). For the first two problems, the best known non-malleable extractors by Chattopadhyay, Goyal and Li, and by Cohen all require seed length and min-entropy with quadratic loss in parameters. As a result, the best known explicit privacy amplification protocols with an active adversary, which achieve two rounds of communication and optimal entropy loss was sub-optimal in the min-entropy of the source. In this paper we give an explicit non-malleable extractor that works for nearly optimal seed length and min-entropy, and yields a two-round privacy amplification protocol with optimal entropy loss for almost all ranges of the security parameter. For the third problem, we improve upon a very recent result by Cohen and Schulman and give an explicit extractor that uses an absolute constant number of sources, each with almost logarithmic min-entropy. The key ingredient in all our constructions is a generalized, and much more efficient version of the independence preserving merger introduced by Cohen, which we call non-malleable independence preserving merger. Our construction of the merger also simplifies that of Cohen and Schulman, and may be of independent interest.","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":"256 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123612196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Exponential Separation between Randomized and Deterministic Complexity in the LOCAL Model
Yi-Jun Chang, T. Kopelowitz, S. Pettie (FOCS 2016, doi:10.1109/FOCS.2016.72)
Over the past 30 years numerous algorithms have been designed for symmetry breaking problems in the LOCAL model, such as maximal matching, MIS, vertex coloring, and edge coloring. For most problems the best randomized algorithm is at least exponentially faster than the best deterministic algorithm. We prove that these exponential gaps are necessary and establish numerous connections between the deterministic and randomized complexities in the LOCAL model. Each of our results has a very compelling take-away message: 1) Building on the recent randomized lower bounds of Brandt et al. [1], we prove that the randomized complexity of Δ-coloring a tree with maximum degree Δ is O(log_Δ log n + log* n) for any Δ ≥ 55, whereas its deterministic complexity is Ω(log_Δ n) for any Δ ≥ 3. This also establishes a large separation between the deterministic complexity of Δ-coloring and (Δ+1)-coloring trees. 2) We prove that any deterministic algorithm for a natural class of problems that runs in O(1) + o(log_Δ n) rounds can be transformed to run in O(log* n - log* Δ + 1) rounds. If the transformed algorithm violates a lower bound (even allowing randomization), then one can conclude that the problem requires Ω(log_Δ n) time deterministically. This gives an alternate proof that deterministically Δ-coloring a tree with small Δ takes Ω(log_Δ n) rounds. 3) We prove that the randomized complexity of any natural problem on instances of size n is at least its deterministic complexity on instances of size √(log n). This shows that a deterministic Ω(log_Δ n) lower bound for any problem (Δ-coloring a tree, for example) implies a randomized Ω(log_Δ log n) lower bound. It also illustrates that the graph shattering technique employed in recent randomized symmetry breaking algorithms is absolutely essential to the LOCAL model. For example, it is provably impossible to improve the 2^O(√(log log n)) term in the complexities of the best MIS and (Δ+1)-coloring algorithms without also improving the 2^O(√(log n))-round Panconesi-Srinivasan algorithm.
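To ground the discussion, here is a hedged Python simulation of a textbook randomized (Δ+1)-coloring routine of the kind these gaps concern: in each synchronous round, every uncolored vertex proposes a random color from its remaining palette and keeps it if no neighbor proposed or already holds that color. This is a generic Luby-style toy, not an algorithm from the paper; it finishes in O(log n) rounds with high probability.

```python
import random

def randomized_coloring(adj, rng=None):
    """Toy synchronous LOCAL simulation of randomized (Delta+1)-coloring.
    adj: dict vertex -> set of neighbors. Returns (coloring, rounds)."""
    rng = rng or random.Random(0)
    delta = max(len(nbrs) for nbrs in adj.values())
    color = {v: None for v in adj}
    rounds = 0
    while any(c is None for c in color.values()):
        rounds += 1
        proposal = {}
        for v in adj:
            if color[v] is None:
                # Palette excludes colors already fixed at neighbors;
                # it is nonempty since v has at most delta neighbors.
                palette = set(range(delta + 1)) - {color[u] for u in adj[v]}
                proposal[v] = rng.choice(sorted(palette))
        for v, c in proposal.items():
            # Keep the proposal only if no neighbor proposed or holds c.
            if all(proposal.get(u) != c and color[u] != c for u in adj[v]):
                color[v] = c
    return color, rounds

cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(randomized_coloring(cycle))
```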
{"title":"An Exponential Separation between Randomized and Deterministic Complexity in the LOCAL Model","authors":"Yi-Jun Chang, T. Kopelowitz, S. Pettie","doi":"10.1109/FOCS.2016.72","DOIUrl":"https://doi.org/10.1109/FOCS.2016.72","url":null,"abstract":"Over the past 30 years numerous algorithms have been designed for symmetry breaking problems in the LOCAL model, such as maximal matching, MIS, vertex coloring, and edge coloring. For most problems the best randomized algorithm is at least exponentially faster than the best deterministic algorithm. We prove that these exponential gaps are necessary and establish numerous connections between the deterministic and randomized complexities in the LOCAL model. Each of our results has a very compelling take-away message: 1) Building on the recent randomized lower bounds of Brandt et al. [1], we prove that the randomized complexity of Δ-coloring a tree with maximum degree Δ is O(log Δ log n + log*n), for any Δ > = 55, whereas its deterministic complexity is Ω(log Δ n) for any Δ > = 3. This also establishes a large separation between the deterministic complexity of Δ-coloring and (Δ+1)-coloring trees. 2) We prove that any deterministic algorithm for a natural class of problems that runs in O(1) + o(log Δ n) rounds can be transformed to run in O(log*n - log*Δ + 1) rounds. If the transformed algorithm violates a lower bound (even allowing randomization), then one can conclude that the problem requires Ω(log Δ n) time deterministically. This gives an alternate proof that deterministically Δ-coloring a tree with small Δ takes Ω(log Δ n) rounds. 3) We prove that the randomized complexity of any natural problem on instances of size n is at least its deterministic complexity on instances of size √log n. This shows that a deterministic Ω(log Δ n) lower bound for any problem (Δ-coloring a tree, for example) implies a randomized Ω(log Δ log n) lower bound. It also illustrates that the graph shattering technique employed in recent randomized symmetry breaking algorithms is absolutely essential to the LOCAL model. For example, it is provably impossible to improve the 2O(√log log n) term in the complexities of the best MIS and (Δ+1)-coloring algorithms without also improving the 2O(√log n)-round Panconesi-Srinivasan algorithm.","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130956557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Noisy Population Recovery in Polynomial Time
Anindya De, M. Saks, Sijian Tang (FOCS 2016, doi:10.1109/FOCS.2016.77)
In the noisy population recovery problem of Dvir et al. [6], the goal is to learn an unknown distribution f on binary strings of length n from noisy samples. A noisy sample with parameter μ ∈ [0,1] is generated by selecting a sample from f and independently flipping each coordinate of the sample with probability (1-μ)/2. We assume an upper bound k on the size of the support of the distribution, and the goal is to estimate the probability of any string to within some given error ε. It is known that the algorithmic complexity and sample complexity of this problem are polynomially related to each other. We describe an algorithm that, for each μ > 0, provides the desired estimate of the distribution in time bounded by a polynomial in k, n and 1/ε, improving upon the previous best result of poly(k^(log log k), n, 1/ε) due to Lovett and Zhang [9]. Our proof combines ideas from [9] with a noise-attenuated version of Möbius inversion. The latter crucially uses the robust local inverse construction of Moitra and Saks [11].
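A hedged brute-force toy makes the noise structure visible: over the parity (Fourier) basis, the noise operator scales the coefficient of each character χ_S by exactly μ^|S|, so for tiny n one can estimate every coefficient from noisy samples, divide by μ^|S|, and invert the transform. This exponential-in-n demo only shows why recovery is information-theoretically possible; the paper's contribution is doing it in time poly(k, n, 1/ε).

```python
import itertools, random

def noisy_sample(x, mu, rng):
    """Flip each coordinate independently with probability (1 - mu)/2."""
    return tuple(b ^ (rng.random() < (1 - mu) / 2) for b in x)

def recover(samples, mu, n):
    """Brute-force recovery for tiny n: E[chi_S(noisy y)] = mu^|S| * f_hat(S),
    so divide each empirical coefficient by mu^|S| and invert."""
    m = len(samples)
    est = {}
    for S in itertools.product([0, 1], repeat=n):
        chi = sum((-1) ** sum(s & b for s, b in zip(S, y)) for y in samples)
        est[S] = (chi / m) / mu ** sum(S)
    # Inverse transform: f(x) = 2^-n * sum_S f_hat(S) * chi_S(x).
    return {x: sum(c * (-1) ** sum(s & b for s, b in zip(S, x))
                   for S, c in est.items()) / 2 ** n
            for x in itertools.product([0, 1], repeat=n)}

rng = random.Random(0)
n, mu = 3, 0.6
true = {(0, 0, 0): 0.7, (1, 1, 0): 0.3}   # support size k = 2
population = [x for x, p in true.items() for _ in range(int(p * 10))]
samples = [noisy_sample(rng.choice(population), mu, rng) for _ in range(100_000)]
f = recover(samples, mu, n)
print({x: round(p, 2) for x, p in f.items() if p > 0.05})  # recovers both strings
```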
{"title":"Noisy Population Recovery in Polynomial Time","authors":"Anindya De, M. Saks, Sijian Tang","doi":"10.1109/FOCS.2016.77","DOIUrl":"https://doi.org/10.1109/FOCS.2016.77","url":null,"abstract":"In the noisy population recovery problem of Dvir et al. [6], the goal is to learn an unknown distribution f on binary strings of length n from noisy samples. A noisy sample with parameter μ ∈ [0,1] is generated by selecting a sample from f, and independently flipping each coordinate of the sample with probability (1-μ)/2. We assume an upper bound k on the size of the support of the distribution, and the goal is to estimate the probability of any string to within some given error ε. It is known that the algorithmic complexity and sample complexity of this problem are polynomially related to each other. We describe an algorithm that for each μ > 0, provides the desired estimate of the distribution in time bounded by a polynomial in k, n and 1/ε improving upon the previous best result of poly(klog log k, n, 1/ε) due to Lovett and Zhang [9]. Our proof combines ideas from [9] with a noise attenuated version of Möbius inversion. The latter crucially uses the robust local inverse construction of Moitra and Saks [11].","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116000848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}