Learning the Threshold in Hierarchical Agglomerative Clustering

2006 5th International Conference on Machine Learning and Applications (ICMLA'06). Pub Date: 2006-12-14. DOI: 10.1109/ICMLA.2006.33

K. Daniels, C. Giraud-Carrier

Citations: 23

Abstract

Most partitional clustering algorithms require the number of desired clusters to be set a priori. Not only is this somewhat counter-intuitive, it is also difficult except in the simplest of situations. By contrast, hierarchical clustering may create partitions with varying numbers of clusters. The actual final partition depends on a threshold placed on the similarity measure used. Given a cluster quality metric, one can efficiently discover an appropriate threshold through a form of semi-supervised learning. This paper shows one such solution for complete-link hierarchical agglomerative clustering using the F-measure and a small subset of labeled examples. Empirical evaluation demonstrates promise.
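The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Euclidean distance, a naive complete-link merge loop, a pairwise F-measure computed only over the labeled examples, and candidate thresholds taken from the merge distances. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist


def complete_link_merges(points):
    """Naive complete-link HAC. Returns [(merge_distance, cluster_a, cluster_b), ...]."""
    clusters = {i: frozenset([i]) for i in range(len(points))}
    merges = []
    while len(clusters) > 1:
        # Complete link: cluster distance is the largest pairwise point distance.
        d, a, b = min(
            (max(dist(points[i], points[j]) for i in clusters[a] for j in clusters[b]), a, b)
            for a, b in combinations(clusters, 2)
        )
        merges.append((d, clusters[a], clusters[b]))
        clusters[a] = clusters[a] | clusters.pop(b)
    return merges


def cut(n, merges, threshold):
    """Replay merges whose distance is at most `threshold`; returns a cluster id per point."""
    label = list(range(n))
    for d, a, b in merges:
        if d > threshold:
            break  # complete-link merge distances never decrease
        target = label[next(iter(a))]
        for i in b:
            label[i] = target
    return label


def pairwise_f(assign, labeled):
    """Pairwise F-measure over the labeled subset (an assumed variant of the metric)."""
    tp = same_cluster = same_class = 0
    for i, j in combinations(labeled, 2):
        in_cluster = assign[i] == assign[j]
        in_class = labeled[i] == labeled[j]
        tp += in_cluster and in_class
        same_cluster += in_cluster
        same_class += in_class
    if tp == 0:
        return 0.0
    precision, recall = tp / same_cluster, tp / same_class
    return 2 * precision * recall / (precision + recall)


def learn_threshold(points, labeled):
    """Pick the smallest candidate threshold that maximizes F on the labeled examples."""
    merges = complete_link_merges(points)
    candidates = [0.0] + [d for d, _, _ in merges]
    best = max(candidates, key=lambda t: pairwise_f(cut(len(points), merges, t), labeled))
    return best, merges


# Toy data: two well-separated groups, with four of six points labeled.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labeled = {0: "a", 2: "a", 3: "b", 5: "b"}
threshold, merges = learn_threshold(points, labeled)
labels = cut(len(points), merges, threshold)
```

Restricting the candidates to the observed merge distances keeps the search small, because the partition can only change at those values. A different F-measure variant or similarity measure would slot into `pairwise_f` and `dist` without changing the rest of the sketch.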


