{"title":"稳定性产生k-Median和k-Means聚类的PTAS","authors":"Pranjal Awasthi, Avrim Blum, Or Sheffet","doi":"10.1109/FOCS.2010.36","DOIUrl":null,"url":null,"abstract":"We consider $k$-median clustering in finite metric spaces and $k$-means clustering in Euclidean spaces, in the setting where $k$ is part of the input (not a constant). For the $k$-means problem, Ostrovsky et al. show that if the optimal $(k-1)$-means clustering of the input is more expensive than the optimal $k$-means clustering by a factor of $1/\\epsilon^2$, then one can achieve a $(1+f(\\epsilon))$-approximation to the $k$-means optimal in time polynomial in $n$ and $k$ by using a variant of Lloyd's algorithm. In this work we substantially improve this approximation guarantee. We show that given only the condition that the $(k-1)$-means optimal is more expensive than the $k$-means optimal by a factor $1+\\alpha$ for {\\em some} constant $\\alpha>0$, we can obtain a PTAS. In particular, under this assumption, for any $\\eps>0$ we achieve a $(1+\\eps)$-approximation to the $k$-means optimal in time polynomial in $n$ and $k$, and exponential in $1/\\eps$ and $1/\\alpha$. We thus decouple the strength of the assumption from the quality of the approximation ratio. We also give a PTAS for the $k$-median problem in finite metrics under the analogous assumption as well. For $k$-means, we in addition give a randomized algorithm with improved running time of $n^{O(1)}(k \\log n)^{\\poly(1/\\epsilon,1/\\alpha)}$. Our technique also obtains a PTAS under the assumption of Balcan et al. that all $(1+\\alpha)$ approximations are $\\delta$-close to a desired target clustering, in the case that all target clusters have size greater than $\\delta n$ and $\\alpha>0$ is constant. Note that the motivation of Balcan et al. is that for many clustering problems, the objective function is only a proxy for the true goal of getting close to the target. From this perspective, our improvement is that for $k$-means in Euclidean spaces we reduce the distance of the clustering found to the target from $O(\\delta)$ to $\\delta$ when all target clusters are large, and for $k$-median we improve the ``largeness'' condition needed in the work of Balcan et al. to get exactly $\\delta$-close from $O(\\delta n)$ to $\\delta n$. Our results are based on a new notion of clustering stability.","PeriodicalId":228365,"journal":{"name":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"93","resultStr":"{\"title\":\"Stability Yields a PTAS for k-Median and k-Means Clustering\",\"authors\":\"Pranjal Awasthi, Avrim Blum, Or Sheffet\",\"doi\":\"10.1109/FOCS.2010.36\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider $k$-median clustering in finite metric spaces and $k$-means clustering in Euclidean spaces, in the setting where $k$ is part of the input (not a constant). For the $k$-means problem, Ostrovsky et al. show that if the optimal $(k-1)$-means clustering of the input is more expensive than the optimal $k$-means clustering by a factor of $1/\\\\epsilon^2$, then one can achieve a $(1+f(\\\\epsilon))$-approximation to the $k$-means optimal in time polynomial in $n$ and $k$ by using a variant of Lloyd's algorithm. In this work we substantially improve this approximation guarantee. 
We show that given only the condition that the $(k-1)$-means optimal is more expensive than the $k$-means optimal by a factor $1+\\\\alpha$ for {\\\\em some} constant $\\\\alpha>0$, we can obtain a PTAS. In particular, under this assumption, for any $\\\\eps>0$ we achieve a $(1+\\\\eps)$-approximation to the $k$-means optimal in time polynomial in $n$ and $k$, and exponential in $1/\\\\eps$ and $1/\\\\alpha$. We thus decouple the strength of the assumption from the quality of the approximation ratio. We also give a PTAS for the $k$-median problem in finite metrics under the analogous assumption as well. For $k$-means, we in addition give a randomized algorithm with improved running time of $n^{O(1)}(k \\\\log n)^{\\\\poly(1/\\\\epsilon,1/\\\\alpha)}$. Our technique also obtains a PTAS under the assumption of Balcan et al. that all $(1+\\\\alpha)$ approximations are $\\\\delta$-close to a desired target clustering, in the case that all target clusters have size greater than $\\\\delta n$ and $\\\\alpha>0$ is constant. Note that the motivation of Balcan et al. is that for many clustering problems, the objective function is only a proxy for the true goal of getting close to the target. From this perspective, our improvement is that for $k$-means in Euclidean spaces we reduce the distance of the clustering found to the target from $O(\\\\delta)$ to $\\\\delta$ when all target clusters are large, and for $k$-median we improve the ``largeness'' condition needed in the work of Balcan et al. to get exactly $\\\\delta$-close from $O(\\\\delta n)$ to $\\\\delta n$. Our results are based on a new notion of clustering stability.\",\"PeriodicalId\":228365,\"journal\":{\"name\":\"2010 IEEE 51st Annual Symposium on Foundations of Computer Science\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-10-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"93\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 IEEE 51st Annual Symposium on Foundations of Computer Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FOCS.2010.36\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FOCS.2010.36","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Stability Yields a PTAS for k-Median and k-Means Clustering
We consider $k$-median clustering in finite metric spaces and $k$-means clustering in Euclidean spaces, in the setting where $k$ is part of the input (not a constant). For the $k$-means problem, Ostrovsky et al. show that if the optimal $(k-1)$-means clustering of the input is more expensive than the optimal $k$-means clustering by a factor of $1/\epsilon^2$, then one can achieve a $(1+f(\epsilon))$-approximation to the $k$-means optimum in time polynomial in $n$ and $k$ by using a variant of Lloyd's algorithm. In this work we substantially improve this approximation guarantee. We show that given only the condition that the optimal $(k-1)$-means cost is more expensive than the optimal $k$-means cost by a factor of $1+\alpha$ for {\em some} constant $\alpha>0$, we can obtain a PTAS. In particular, under this assumption, for any $\epsilon>0$ we achieve a $(1+\epsilon)$-approximation to the $k$-means optimum in time polynomial in $n$ and $k$, and exponential in $1/\epsilon$ and $1/\alpha$. We thus decouple the strength of the assumption from the quality of the approximation ratio. We also give a PTAS for the $k$-median problem in finite metrics under the analogous assumption. For $k$-means, we additionally give a randomized algorithm with an improved running time of $n^{O(1)}(k \log n)^{\mathrm{poly}(1/\epsilon,1/\alpha)}$. Our technique also yields a PTAS under the assumption of Balcan et al. that all $(1+\alpha)$-approximations are $\delta$-close to a desired target clustering, in the case that all target clusters have size greater than $\delta n$ and $\alpha>0$ is constant. Note that the motivation of Balcan et al. is that for many clustering problems, the objective function is only a proxy for the true goal of getting close to the target. From this perspective, our improvement is that for $k$-means in Euclidean spaces we reduce the distance between the clustering found and the target from $O(\delta)$ to $\delta$ when all target clusters are large, and for $k$-median we improve the "largeness" condition needed in the work of Balcan et al. to get exactly $\delta$-close from $O(\delta n)$ to $\delta n$. Our results are based on a new notion of clustering stability.
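To make the separation condition concrete, the following is a minimal sketch (not the paper's algorithm) that empirically estimates the ratio between the optimal $(k-1)$-means and $k$-means costs on a dataset. It assumes scikit-learn's KMeans, whose Lloyd-style iterations find only local optima, so the computed ratio is a heuristic estimate of the stability factor $1+\alpha$, not an exact test of the condition.

import numpy as np
from sklearn.cluster import KMeans

def separation_ratio(X, k, n_init=50, seed=0):
    # Heuristically estimate OPT_{k-1} / OPT_k for the k-means objective on X.
    # inertia_ is the sum of squared distances of points to their nearest center;
    # multiple restarts (n_init) reduce, but do not eliminate, local-optimum error.
    cost_k = KMeans(n_clusters=k, n_init=n_init, random_state=seed).fit(X).inertia_
    cost_km1 = KMeans(n_clusters=k - 1, n_init=n_init, random_state=seed).fit(X).inertia_
    return cost_km1 / cost_k

# Example: three well-separated Gaussian blobs in the plane. The estimated
# OPT_2 / OPT_3 ratio should be large, i.e. the instance is (1+alpha)-stable
# for a sizable alpha, which is the regime where the PTAS applies.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([rng.normal(loc=c, scale=0.2, size=(100, 2)) for c in centers])
ratio = separation_ratio(X, k=3)
print(f"estimated OPT_2/OPT_3 ratio: {ratio:.2f} (alpha ~ {ratio - 1:.2f})")

On data with no natural $k$-cluster structure the ratio approaches 1, i.e. $\alpha$ is close to 0 and the stability assumption fails.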