{"title":"On the combination of relative clustering validity criteria","authors":"L. Vendramin, P. Jaskowiak, R. Campello","doi":"10.1145/2484838.2484844","DOIUrl":null,"url":null,"abstract":"Many different relative clustering validity criteria exist that are very useful as quantitative measures for assessing the quality of data partitions. These criteria are endowed with particular features that may make each of them more suitable for specific classes of problems. Nevertheless, the performance of each criterion is usually unknown a priori by the user. Hence, choosing a specific criterion is not a trivial task. A possible approach to circumvent this drawback consists of combining different relative criteria in order to obtain more robust evaluations. However, this approach has so far been applied in an ad-hoc fashion only; its real potential is actually not well-understood. In this paper, we present an extensive study on the combination of relative criteria considering both synthetic and real datasets. The experiments involved 28 criteria and 4 different combination strategies applied to a varied collection of data partitions produced by 5 clustering algorithms. In total, 427,680 partitions of 972 synthetic datasets and 14,000 partitions of a collection of 400 image datasets were considered. Based on the results, we discuss the shortcomings and possible benefits of combining different relative criteria into a committee.","PeriodicalId":74773,"journal":{"name":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","volume":"58 1","pages":"4:1-4:12"},"PeriodicalIF":0.0000,"publicationDate":"2013-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"28","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2484838.2484844","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 28
Abstract
Many different relative clustering validity criteria exist that are very useful as quantitative measures for assessing the quality of data partitions. These criteria are endowed with particular features that may make each of them more suitable for specific classes of problems. Nevertheless, the performance of each criterion is usually unknown a priori by the user. Hence, choosing a specific criterion is not a trivial task. A possible approach to circumvent this drawback consists of combining different relative criteria in order to obtain more robust evaluations. However, this approach has so far been applied in an ad-hoc fashion only; its real potential is actually not well-understood. In this paper, we present an extensive study on the combination of relative criteria considering both synthetic and real datasets. The experiments involved 28 criteria and 4 different combination strategies applied to a varied collection of data partitions produced by 5 clustering algorithms. In total, 427,680 partitions of 972 synthetic datasets and 14,000 partitions of a collection of 400 image datasets were considered. Based on the results, we discuss the shortcomings and possible benefits of combining different relative criteria into a committee.