Fast parameterless density-based clustering via random projections

Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. Pub Date: 2013-10-27. DOI: 10.1145/2505515.2505590

Johannes Schneider, M. Vlachos

Citations: 35

Abstract

Clustering offers significant insights in data analysis. Density-based algorithms have emerged as flexible and efficient techniques, able to discover high-quality and potentially irregularly shaped clusters. We present two fast density-based clustering algorithms based on random projections. Both algorithms demonstrate a speedup of one to two orders of magnitude over equivalent state-of-the-art density-based techniques, even for modest-size datasets. We give a comprehensive analysis of both algorithms and show a runtime of O(dN log^2 N) for a d-dimensional dataset of N points. Our first algorithm can be viewed as a fast variant of the OPTICS density-based algorithm, but uses a softer definition of density combined with sampling. The second algorithm is parameterless and identifies the areas that separate clusters.
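
The abstract does not spell out the mechanics, so the following Python sketch is only a rough illustration of how random projections can partition a dataset cheaply: points are projected onto a random line, sorted by projected value, cut at a random position, and each side is split again until the groups are small. The function name `random_projection_split`, the `min_size` parameter, and the random cut rule are assumptions made for illustration and are not taken from the paper.

```python
import random


def random_projection_split(points, min_size, rng):
    """Recursively split points using projections onto random lines.

    Hypothetical helper for illustration only. Returns a list of index
    groups, each holding between 1 and min_size point indices.
    """
    dim = len(points[0])

    def split(indices):
        # Stop once a group is small enough.
        if len(indices) <= min_size:
            return [indices]
        # Random direction: independent Gaussian components.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Project every point in the group onto that direction (O(d) each).
        proj = {
            i: sum(x * w for x, w in zip(points[i], direction))
            for i in indices
        }
        ordered = sorted(indices, key=proj.__getitem__)
        # Cut at a random position so that both sides are non-empty.
        cut = rng.randint(1, len(ordered) - 1)
        return split(ordered[:cut]) + split(ordered[cut:])

    return split(list(range(len(points))))


if __name__ == "__main__":
    rng = random.Random(0)
    # Two well-separated 2-D blobs as toy data.
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
    data += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]
    groups = random_projection_split(data, min_size=8, rng=random.Random(1))
    print(len(groups), "groups; sizes:", [len(g) for g in groups])
```

Points that end up in the same small group tend to be close along several random directions, which is the intuition for using such groups as cheap neighborhood candidates. How the paper turns such partitions into density estimates is not stated in the abstract and is not reproduced here.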



