A Fast and Flexible Clustering Algorithm Using Binary Discretization

M. Sugiyama, Akihiro Yamamoto
{"title":"A Fast and Flexible Clustering Algorithm Using Binary Discretization","authors":"M. Sugiyama, Akihiro Yamamoto","doi":"10.1109/ICDM.2011.9","DOIUrl":null,"url":null,"abstract":"We present in this paper a new clustering algorithm for multivariate data. This algorithm, called BOOL (Binary coding Oriented clustering), can detect arbitrarily shaped clusters and is noise tolerant. BOOL handles data using a two-step procedure: data points are first discretized and represented as binary words, clusters are then iteratively constructed by agglomerating smaller clusters using this representation. This latter step is carried out with linear complexity by sorting such binary representations, which results in dramatic speedups when compared with other techniques. Experiments show that BOOL is faster than K-means, and about two to three orders of magnitude faster than two state-of-the-art algorithms that can detect non-convex clusters of arbitrary shapes. We also show that BOOL's results are robust to changes in parameters, whereas most algorithms for arbitrarily shaped clusters are known to be overly sensitive to such changes. The key to the robustness of BOOL is the hierarchical structure of clusters that is introduced automatically by increasing the accuracy of the discretization.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE 11th International Conference on Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2011.9","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9

Abstract

We present in this paper a new clustering algorithm for multivariate data. This algorithm, called BOOL (Binary coding Oriented clustering), can detect arbitrarily shaped clusters and is noise tolerant. BOOL handles data using a two-step procedure: data points are first discretized and represented as binary words, clusters are then iteratively constructed by agglomerating smaller clusters using this representation. This latter step is carried out with linear complexity by sorting such binary representations, which results in dramatic speedups when compared with other techniques. Experiments show that BOOL is faster than K-means, and about two to three orders of magnitude faster than two state-of-the-art algorithms that can detect non-convex clusters of arbitrary shapes. We also show that BOOL's results are robust to changes in parameters, whereas most algorithms for arbitrarily shaped clusters are known to be overly sensitive to such changes. The key to the robustness of BOOL is the hierarchical structure of clusters that is introduced automatically by increasing the accuracy of the discretization.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
一种快速灵活的二值离散聚类算法
本文提出了一种新的多元数据聚类算法。这种算法被称为BOOL(二进制编码导向聚类),可以检测任意形状的聚类,并且具有耐噪性。BOOL使用两步过程处理数据:首先将数据点离散化并表示为二进制词,然后通过使用这种表示聚合较小的集群来迭代地构建集群。后一步是通过对这种二进制表示进行排序来实现线性复杂性的,与其他技术相比,这导致了戏剧性的加速。实验表明BOOL比K-means快,比两种最先进的算法快两到三个数量级,可以检测任意形状的非凸簇。我们还表明BOOL的结果对参数的变化具有鲁棒性,而已知大多数用于任意形状群集的算法对此类变化过于敏感。BOOL鲁棒性的关键在于通过提高离散化的精度而自动引入的聚类的层次结构。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Nonnegative Matrix Tri-factorization Based High-Order Co-clustering and Its Fast Implementation Helix: Unsupervised Grammar Induction for Structured Activity Recognition Partitionable Kernels for Mapping Kernels Multi-task Learning for Bayesian Matrix Factorization Discovering the Intrinsic Cardinality and Dimensionality of Time Series Using MDL
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1