基于随机投影的快速无参数密度聚类

Proceedings of the 22nd ACM international conference on Information & Knowledge Management Pub Date : 2013-10-27 DOI:10.1145/2505515.2505590

Johannes Schneider, M. Vlachos

{"title":"基于随机投影的快速无参数密度聚类","authors":"Johannes Schneider, M. Vlachos","doi":"10.1145/2505515.2505590","DOIUrl":null,"url":null,"abstract":"Clustering offers significant insights in data analysis. Density based algorithms have emerged as flexible and efficient techniques, able to discover high-quality and potentially irregularly shaped- clusters. We present two fast density-based clustering algorithms based on random projections. Both algorithms demonstrate one to two orders of magnitude speedup compared to equivalent state-of-art density based techniques, even for modest-size datasets. We give a comprehensive analysis of both our algorithms and show runtime of O(dNlog2 N), for a d-dimensional dataset. Our first algorithm can be viewed as a fast variant of the OPTICS density-based algorithm, but using a softer definition of density combined with sampling. The second algorithm is parameter-less, and identifies areas separating clusters.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"44 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"35","resultStr":"{\"title\":\"Fast parameterless density-based clustering via random projections\",\"authors\":\"Johannes Schneider, M. Vlachos\",\"doi\":\"10.1145/2505515.2505590\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Clustering offers significant insights in data analysis. Density based algorithms have emerged as flexible and efficient techniques, able to discover high-quality and potentially irregularly shaped- clusters. We present two fast density-based clustering algorithms based on random projections. Both algorithms demonstrate one to two orders of magnitude speedup compared to equivalent state-of-art density based techniques, even for modest-size datasets. We give a comprehensive analysis of both our algorithms and show runtime of O(dNlog2 N), for a d-dimensional dataset. Our first algorithm can be viewed as a fast variant of the OPTICS density-based algorithm, but using a softer definition of density combined with sampling. The second algorithm is parameter-less, and identifies areas separating clusters.\",\"PeriodicalId\":20528,\"journal\":{\"name\":\"Proceedings of the 22nd ACM international conference on Information & Knowledge Management\",\"volume\":\"44 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-10-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"35\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 22nd ACM international conference on Information & Knowledge Management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2505515.2505590\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2505515.2505590","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 35

摘要

聚类为数据分析提供了重要的见解。基于密度的算法已经成为一种灵活而高效的技术，能够发现高质量和潜在不规则形状的集群。提出了两种基于随机投影的快速密度聚类算法。与同等的基于密度的技术相比，这两种算法都显示出一到两个数量级的加速，即使对于中等规模的数据集也是如此。我们对我们的算法进行了全面的分析，并显示了d维数据集的运行时间为O(dnlog2n)。我们的第一个算法可以看作是OPTICS基于密度的算法的快速变体，但是使用了更柔和的密度定义和采样相结合。第二种算法是无参数的，它识别分离聚类的区域。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Fast parameterless density-based clustering via random projections

Clustering offers significant insights in data analysis. Density based algorithms have emerged as flexible and efficient techniques, able to discover high-quality and potentially irregularly shaped- clusters. We present two fast density-based clustering algorithms based on random projections. Both algorithms demonstrate one to two orders of magnitude speedup compared to equivalent state-of-art density based techniques, even for modest-size datasets. We give a comprehensive analysis of both our algorithms and show runtime of O(dNlog2 N), for a d-dimensional dataset. Our first algorithm can be viewed as a fast variant of the OPTICS density-based algorithm, but using a softer definition of density combined with sampling. The second algorithm is parameter-less, and identifies areas separating clusters.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 22nd ACM international conference on Information & Knowledge Management

自引率

0.00%

发文量

期刊最新文献

Exploring XML data is as easy as using maps Mining-based compression approach of propositional formulae Flexible and dynamic compromises for effective recommendations Efficient parsing-based search over structured data Recommendation via user's personality and social contextual