Stability Yields a PTAS for k-Median and k-Means Clustering

Pranjal Awasthi, Avrim Blum, Or Sheffet
{"title":"稳定性产生k-Median和k-Means聚类的PTAS","authors":"Pranjal Awasthi, Avrim Blum, Or Sheffet","doi":"10.1109/FOCS.2010.36","DOIUrl":null,"url":null,"abstract":"We consider $k$-median clustering in finite metric spaces and $k$-means clustering in Euclidean spaces, in the setting where $k$ is part of the input (not a constant). For the $k$-means problem, Ostrovsky et al. show that if the optimal $(k-1)$-means clustering of the input is more expensive than the optimal $k$-means clustering by a factor of $1/\\epsilon^2$, then one can achieve a $(1+f(\\epsilon))$-approximation to the $k$-means optimal in time polynomial in $n$ and $k$ by using a variant of Lloyd's algorithm. In this work we substantially improve this approximation guarantee. We show that given only the condition that the $(k-1)$-means optimal is more expensive than the $k$-means optimal by a factor $1+\\alpha$ for {\\em some} constant $\\alpha>0$, we can obtain a PTAS. In particular, under this assumption, for any $\\eps>0$ we achieve a $(1+\\eps)$-approximation to the $k$-means optimal in time polynomial in $n$ and $k$, and exponential in $1/\\eps$ and $1/\\alpha$. We thus decouple the strength of the assumption from the quality of the approximation ratio. We also give a PTAS for the $k$-median problem in finite metrics under the analogous assumption as well. For $k$-means, we in addition give a randomized algorithm with improved running time of $n^{O(1)}(k \\log n)^{\\poly(1/\\epsilon,1/\\alpha)}$. Our technique also obtains a PTAS under the assumption of Balcan et al. that all $(1+\\alpha)$ approximations are $\\delta$-close to a desired target clustering, in the case that all target clusters have size greater than $\\delta n$ and $\\alpha>0$ is constant. Note that the motivation of Balcan et al. is that for many clustering problems, the objective function is only a proxy for the true goal of getting close to the target. From this perspective, our improvement is that for $k$-means in Euclidean spaces we reduce the distance of the clustering found to the target from $O(\\delta)$ to $\\delta$ when all target clusters are large, and for $k$-median we improve the ``largeness'' condition needed in the work of Balcan et al. to get exactly $\\delta$-close from $O(\\delta n)$ to $\\delta n$. Our results are based on a new notion of clustering stability.","PeriodicalId":228365,"journal":{"name":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"93","resultStr":"{\"title\":\"Stability Yields a PTAS for k-Median and k-Means Clustering\",\"authors\":\"Pranjal Awasthi, Avrim Blum, Or Sheffet\",\"doi\":\"10.1109/FOCS.2010.36\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider $k$-median clustering in finite metric spaces and $k$-means clustering in Euclidean spaces, in the setting where $k$ is part of the input (not a constant). For the $k$-means problem, Ostrovsky et al. show that if the optimal $(k-1)$-means clustering of the input is more expensive than the optimal $k$-means clustering by a factor of $1/\\\\epsilon^2$, then one can achieve a $(1+f(\\\\epsilon))$-approximation to the $k$-means optimal in time polynomial in $n$ and $k$ by using a variant of Lloyd's algorithm. In this work we substantially improve this approximation guarantee. 
We show that given only the condition that the $(k-1)$-means optimal is more expensive than the $k$-means optimal by a factor $1+\\\\alpha$ for {\\\\em some} constant $\\\\alpha>0$, we can obtain a PTAS. In particular, under this assumption, for any $\\\\eps>0$ we achieve a $(1+\\\\eps)$-approximation to the $k$-means optimal in time polynomial in $n$ and $k$, and exponential in $1/\\\\eps$ and $1/\\\\alpha$. We thus decouple the strength of the assumption from the quality of the approximation ratio. We also give a PTAS for the $k$-median problem in finite metrics under the analogous assumption as well. For $k$-means, we in addition give a randomized algorithm with improved running time of $n^{O(1)}(k \\\\log n)^{\\\\poly(1/\\\\epsilon,1/\\\\alpha)}$. Our technique also obtains a PTAS under the assumption of Balcan et al. that all $(1+\\\\alpha)$ approximations are $\\\\delta$-close to a desired target clustering, in the case that all target clusters have size greater than $\\\\delta n$ and $\\\\alpha>0$ is constant. Note that the motivation of Balcan et al. is that for many clustering problems, the objective function is only a proxy for the true goal of getting close to the target. From this perspective, our improvement is that for $k$-means in Euclidean spaces we reduce the distance of the clustering found to the target from $O(\\\\delta)$ to $\\\\delta$ when all target clusters are large, and for $k$-median we improve the ``largeness'' condition needed in the work of Balcan et al. to get exactly $\\\\delta$-close from $O(\\\\delta n)$ to $\\\\delta n$. Our results are based on a new notion of clustering stability.\",\"PeriodicalId\":228365,\"journal\":{\"name\":\"2010 IEEE 51st Annual Symposium on Foundations of Computer Science\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-10-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"93\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 IEEE 51st Annual Symposium on Foundations of Computer Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FOCS.2010.36\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FOCS.2010.36","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 93

Abstract

We consider $k$-median clustering in finite metric spaces and $k$-means clustering in Euclidean spaces, in the setting where $k$ is part of the input (not a constant). For the $k$-means problem, Ostrovsky et al. show that if the optimal $(k-1)$-means clustering of the input is more expensive than the optimal $k$-means clustering by a factor of $1/\epsilon^2$, then one can achieve a $(1+f(\epsilon))$-approximation to the $k$-means optimum in time polynomial in $n$ and $k$ by using a variant of Lloyd's algorithm. In this work we substantially improve this approximation guarantee. We show that given only the condition that the $(k-1)$-means optimum is more expensive than the $k$-means optimum by a factor of $1+\alpha$ for some constant $\alpha>0$, we can obtain a PTAS. In particular, under this assumption, for any $\epsilon>0$ we achieve a $(1+\epsilon)$-approximation to the $k$-means optimum in time polynomial in $n$ and $k$, and exponential in $1/\epsilon$ and $1/\alpha$. We thus decouple the strength of the assumption from the quality of the approximation ratio. We also give a PTAS for the $k$-median problem in finite metrics under the analogous assumption. For $k$-means, we additionally give a randomized algorithm with an improved running time of $n^{O(1)}(k \log n)^{\mathrm{poly}(1/\epsilon,1/\alpha)}$. Our technique also yields a PTAS under the assumption of Balcan et al. that all $(1+\alpha)$-approximations are $\delta$-close to a desired target clustering, in the case that all target clusters have size greater than $\delta n$ and $\alpha>0$ is constant. Note that the motivation of Balcan et al. is that for many clustering problems, the objective function is only a proxy for the true goal of getting close to the target. From this perspective, our improvement is that for $k$-means in Euclidean spaces, when all target clusters are large, we reduce the distance between the clustering found and the target from $O(\delta)$ to $\delta$; and for $k$-median we improve the "largeness" condition needed in the work of Balcan et al. to get exactly $\delta$-close, from $O(\delta n)$ to $\delta n$. Our results are based on a new notion of clustering stability.
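
To make the two assumptions concrete, here is a minimal LaTeX rendering of the stability conditions as stated in the abstract. The shorthand $\mathrm{OPT}_k$ for the optimal $k$-clustering cost and $\mathcal{C}_T$ for the target clustering is ours, not the paper's notation.

    % Weak stability (this paper): merging down to k-1 clusters must be costly.
    \[
      \mathrm{OPT}_{k-1} \;\ge\; (1+\alpha)\,\mathrm{OPT}_k
      \qquad \text{for some constant } \alpha > 0.
    \]
    % Approximation stability (Balcan et al.): every near-optimal clustering
    % disagrees with the target on at most a delta fraction of the points.
    \[
      \mathrm{cost}(\mathcal{C}) \;\le\; (1+\alpha)\,\mathrm{OPT}_k
      \;\Longrightarrow\;
      \mathrm{dist}(\mathcal{C}, \mathcal{C}_T) \;\le\; \delta.
    \]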
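The Lloyd's-algorithm baseline that Ostrovsky et al. build on is easy to sketch. Below is a minimal NumPy illustration (our sketch, not the paper's PTAS) that runs plain Lloyd's iterations and uses the resulting costs to heuristically probe the weak-stability ratio $\mathrm{OPT}_{k-1}/\mathrm{OPT}_k$. Since Lloyd's only reaches a local optimum, the reported ratio is only an estimate; all function names here are our own.

    import numpy as np

    def kmeans_cost(X, centers):
        # Sum of squared distances from each point to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        return d2.min(axis=1).sum()

    def lloyd_kmeans(X, k, n_iter=100, seed=0):
        # Plain Lloyd's algorithm; it converges only to a local optimum,
        # so the returned cost is an upper bound on the true k-means optimum.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)].copy()
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
            labels = d2.argmin(axis=1)
            # Move each center to the mean of its cluster; keep empty
            # clusters' centers where they are.
            new_centers = np.array([
                X[labels == j].mean(axis=0) if (labels == j).any() else centers[j]
                for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return centers, kmeans_cost(X, centers)

    def stability_ratio(X, k, restarts=10):
        # Estimates OPT_{k-1} / OPT_k, taking the best of several random
        # restarts for each number of clusters. A ratio well above 1
        # suggests the instance satisfies the (1+alpha) weak-stability
        # condition; a ratio near 1 suggests it does not.
        best_k = min(lloyd_kmeans(X, k, seed=s)[1] for s in range(restarts))
        best_k1 = min(lloyd_kmeans(X, k - 1, seed=s)[1] for s in range(restarts))
        return best_k1 / best_k

On well-separated data (e.g. $k$ distant Gaussian blobs) this ratio comes out far above 1, since forcing two true clusters to share a center is expensive, while on unstructured data it hovers near 1.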