{"title":"Voting-based Approach in Consensus Clustering through q-fold cross-validation","authors":"Norin Rahayu Shamsuddin, N. Mahat","doi":"10.1285/I20705948V12N3P657","DOIUrl":null,"url":null,"abstract":"Over the past 50 years, extensive research has been carried out to understand how clustering works in classifying data into meaningful groups. Various clustering algorithms and cluster validity indexes have been proposed and refined to obtain the best clustering result. However, no clustering method is able to give consistent results on datasets of similar structure. Consensus clustering is an alternative mechanism for controlling the variation in results and improving the quality of traditional clustering. In this paper, we generate the multiple partitions for consensus clustering through a resampling method that employs a q-fold cross-validation approach. The q-fold cross-validation approach speeds up the consensus partition procedure, which completes in q iterations. To deal with the differing numbers of cluster labels that occur across the partitions, we employ a voting-based method in the second stage of consensus clustering to obtain the optimal consensus partition. The performance of the optimal consensus partitions is evaluated using the Silhouette plot.","PeriodicalId":44770,"journal":{"name":"Electronic Journal of Applied Statistical Analysis","volume":"12 1","pages":""},"PeriodicalIF":0.6000,"publicationDate":"2019-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Electronic Journal of Applied Statistical Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1285/I20705948V12N3P657","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
Citation count: 0
Abstract
Over the past 50 years, extensive research has been carried out to understand how clustering works in classifying data into meaningful groups. Various clustering algorithms and cluster validity indexes have been proposed and refined to obtain the best clustering result. However, no clustering method is able to give consistent results on datasets of similar structure. Consensus clustering is an alternative mechanism for controlling the variation in results and improving the quality of traditional clustering. In this paper, we generate the multiple partitions for consensus clustering through a resampling method that employs a q-fold cross-validation approach. The q-fold cross-validation approach speeds up the consensus partition procedure, which completes in q iterations. To deal with the differing numbers of cluster labels that occur across the partitions, we employ a voting-based method in the second stage of consensus clustering to obtain the optimal consensus partition. The performance of the optimal consensus partitions is evaluated using the Silhouette plot.
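
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several simplifying assumptions are made here and are not taken from the paper: the base clusterer is a toy one-dimensional k-means, every partition uses the same number of clusters k (the paper also handles differing numbers of labels), labels are aligned to the first partition by brute-force search over permutations, and the final label of each point is a plain majority vote. All function names (`kmeans_1d`, `qfold_partitions`, `align`, `vote`) are hypothetical.

```python
import random
from collections import Counter
from itertools import permutations


def kmeans_1d(points, k, iters=20):
    """Toy 1-D k-means returning k centroids (illustrative base clusterer)."""
    srt = sorted(points)
    # Deterministic init: spread the starting centroids across the sorted data.
    centroids = [srt[(2 * j + 1) * len(srt) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: abs(p - centroids[c]))
            groups[nearest].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [sum(g) / len(g) if g else centroids[j]
                     for j, g in enumerate(groups)]
    return centroids


def assign(points, centroids):
    """Label every point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            for p in points]


def qfold_partitions(points, k, q, seed=0):
    """Stage 1: one partition per fold, i.e. q clustering runs in total."""
    rng = random.Random(seed)
    idx = list(range(len(points)))
    rng.shuffle(idx)
    folds = [idx[i::q] for i in range(q)]
    partitions = []
    for held_out in folds:
        held = set(held_out)
        # Cluster the q-1 remaining folds, then label all points.
        train = [points[i] for i in idx if i not in held]
        centroids = kmeans_1d(train, k)
        rng.shuffle(centroids)  # mimic arbitrary label numbering between runs
        partitions.append(assign(points, centroids))
    return partitions


def align(reference, labels, k):
    """Relabel `labels` with the permutation that best matches `reference`."""
    best = max(permutations(range(k)),
               key=lambda perm: sum(perm[l] == r
                                    for l, r in zip(labels, reference)))
    return [best[l] for l in labels]


def vote(partitions, k):
    """Stage 2: align all partitions to the first, then majority-vote per point."""
    reference = partitions[0]
    aligned = [reference] + [align(reference, p, k) for p in partitions[1:]]
    return [Counter(column).most_common(1)[0][0] for column in zip(*aligned)]


# Two well-separated groups of five points each (made-up data).
data = [0.1, 0.3, 0.2, 0.4, 0.25, 5.0, 5.2, 4.9, 5.1, 5.3]
consensus = vote(qfold_partitions(data, k=2, q=5), k=2)
print(consensus)
```

The brute-force alignment is only practical for small k, since it tries all k! permutations; an assignment solver such as the Hungarian algorithm would be the usual replacement for larger k. The consensus labels could then be scored with a silhouette measure, as the abstract describes.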