On the combination of relative clustering validity criteria

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2013-07-29 DOI:10.1145/2484838.2484844

L. Vendramin, P. Jaskowiak, R. Campello

{"title":"On the combination of relative clustering validity criteria","authors":"L. Vendramin, P. Jaskowiak, R. Campello","doi":"10.1145/2484838.2484844","DOIUrl":null,"url":null,"abstract":"Many different relative clustering validity criteria exist that are very useful as quantitative measures for assessing the quality of data partitions. These criteria are endowed with particular features that may make each of them more suitable for specific classes of problems. Nevertheless, the performance of each criterion is usually unknown a priori by the user. Hence, choosing a specific criterion is not a trivial task. A possible approach to circumvent this drawback consists of combining different relative criteria in order to obtain more robust evaluations. However, this approach has so far been applied in an ad-hoc fashion only; its real potential is actually not well-understood. In this paper, we present an extensive study on the combination of relative criteria considering both synthetic and real datasets. The experiments involved 28 criteria and 4 different combination strategies applied to a varied collection of data partitions produced by 5 clustering algorithms. In total, 427,680 partitions of 972 synthetic datasets and 14,000 partitions of a collection of 400 image datasets were considered. Based on the results, we discuss the shortcomings and possible benefits of combining different relative criteria into a committee.","PeriodicalId":74773,"journal":{"name":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","volume":"58 1","pages":"4:1-4:12"},"PeriodicalIF":0.0000,"publicationDate":"2013-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"28","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2484838.2484844","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 28

Abstract

Many different relative clustering validity criteria exist that are very useful as quantitative measures for assessing the quality of data partitions. These criteria are endowed with particular features that may make each of them more suitable for specific classes of problems. Nevertheless, the performance of each criterion is usually unknown a priori by the user. Hence, choosing a specific criterion is not a trivial task. A possible approach to circumvent this drawback consists of combining different relative criteria in order to obtain more robust evaluations. However, this approach has so far been applied in an ad-hoc fashion only; its real potential is actually not well-understood. In this paper, we present an extensive study on the combination of relative criteria considering both synthetic and real datasets. The experiments involved 28 criteria and 4 different combination strategies applied to a varied collection of data partitions produced by 5 clustering algorithms. In total, 427,680 partitions of 972 synthetic datasets and 14,000 partitions of a collection of 400 image datasets were considered. Based on the results, we discuss the shortcomings and possible benefits of combining different relative criteria into a committee.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

相对聚类效度准则的组合研究

存在许多不同的相对聚类有效性标准，它们作为评估数据分区质量的定量度量非常有用。这些标准被赋予了特定的特征，使它们中的每一个都更适合于特定类别的问题。然而，每个标准的性能通常是未知的先验用户。因此，选择一个特定的标准并不是一项微不足道的任务。规避这一缺点的一种可能的方法是将不同的相对标准结合起来，以获得更可靠的评估。然而，到目前为止，这种方法只以一种特别的方式应用;它的真正潜力实际上还没有得到很好的理解。在本文中，我们对考虑合成和真实数据集的相关标准的组合进行了广泛的研究。实验涉及28个标准和4种不同的组合策略，这些策略应用于5种聚类算法产生的不同数据分区集合。总共考虑了972个合成数据集的427,680个分区和400个图像数据集的14,000个分区。根据结果，我们讨论了将不同的相关标准合并成一个委员会的缺点和可能的好处。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management

自引率

0.00%

发文量

期刊最新文献

Towards Co-Evolution of Data-Centric Ecosystems. Data perturbation for outlier detection ensembles SLACID - sparse linear algebra in a column-oriented in-memory database system SensorBench: benchmarking approaches to processing wireless sensor network data Efficient data management and statistics with zero-copy integration