A Clustering Ensemble Algorithm of Computing Stability of Sample Points Based on Neighborhood
TongLing Lou
2021 2nd International Symposium on Computer Engineering and Intelligent Communications (ISCEIC), August 2021
DOI: 10.1109/ISCEIC53685.2021.00015
Citations: 0
Abstract
In a clustering ensemble, different sample points play different roles in the ensemble result, and each sample point is assigned to its cluster with a different degree of certainty. To reduce the impact of this uncertainty on the clustering result, some researchers have proposed the concept of sample stability. In this paper, we compute the stability of a sample point from the probability that the point and the points in its neighborhood fall into the same cluster across the different base clusterings, and we propose an algorithmic framework built on this computation. First, the original data are clustered, and the Mahalanobis distance between sample points is calculated. Next, the co-occurrence probability of each target sample point with its K nearest sample points is calculated, and the stability of each sample point is derived from these co-occurrence probabilities. Finally, the stable sample points are hard clustered, and the unstable sample points are assigned to the nearest cluster. The effectiveness of the proposed clustering ensemble algorithm is verified on benchmark datasets.
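The pipeline described in the abstract can be sketched in NumPy as follows. This is an illustrative reconstruction under stated assumptions, not the author's implementation. Several choices are hypothetical: stability is taken as the mean co-occurrence probability between a point and its K Mahalanobis nearest neighbours; the stability threshold `tau` and the linking threshold `link` are made-up parameters; stable points are hard clustered by taking connected components of the co-association graph, which is a swapped-in technique; and "nearest cluster" is interpreted as the cluster of the nearest stable point. The function names are hypothetical, and the base clusterings are passed in as a label matrix rather than generated here.

```python
import numpy as np

def mahalanobis_matrix(X):
    # Pairwise Mahalanobis distances using the pseudo-inverse of the sample covariance.
    X = np.asarray(X, dtype=float)
    VI = np.linalg.pinv(np.atleast_2d(np.cov(X, rowvar=False)))
    diff = X[:, None, :] - X[None, :, :]               # shape (n, n, d)
    d2 = np.einsum("ijk,kl,ijl->ij", diff, VI, diff)
    return np.sqrt(np.maximum(d2, 0.0))

def co_association(base_labels):
    # base_labels: (m, n) array, one row per base clustering.
    # Entry (i, j) = fraction of base clusterings placing i and j in the same cluster.
    L = np.asarray(base_labels)
    return (L[:, :, None] == L[:, None, :]).mean(axis=0)

def stability_scores(X, base_labels, k):
    # Assumed definition: mean co-occurrence of a point with its k nearest neighbours.
    D = mahalanobis_matrix(X)
    np.fill_diagonal(D, np.inf)                        # a point is not its own neighbour
    nbrs = np.argsort(D, axis=1)[:, :k]
    C = co_association(base_labels)
    rows = np.arange(D.shape[0])[:, None]
    return C[rows, nbrs].mean(axis=1), D, C

def neighborhood_ensemble(X, base_labels, k=3, tau=0.8, link=0.5):
    s, D, C = stability_scores(X, base_labels, k)
    stable = np.flatnonzero(s >= tau)
    if stable.size == 0:
        raise ValueError("no stable points; lower tau")
    out = -np.ones(D.shape[0], dtype=int)
    cid = 0
    # Hard-cluster stable points: connected components of the graph C >= link.
    for i in stable:
        if out[i] >= 0:
            continue
        out[i] = cid
        stack = [i]
        while stack:
            j = stack.pop()
            for t in stable:
                if out[t] < 0 and C[j, t] >= link:
                    out[t] = cid
                    stack.append(t)
        cid += 1
    # Assign each unstable point to the cluster of its nearest stable point.
    for i in np.flatnonzero(out < 0):
        out[i] = out[stable[np.argmin(D[i, stable])]]
    return out, s

# Toy usage: 1-D data, three base clusterings that disagree about point 3.
X = np.array([[0.], [1.], [2.], [3.], [10.], [11.], [12.], [13.]])
base = [[0, 0, 0, 0, 1, 1, 1, 1],
        [0, 0, 0, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 0, 0, 0, 0]]
labels, s = neighborhood_ensemble(X, base, k=2, tau=0.8)
print(labels, np.round(s, 3))
```

In the toy run, point 3 co-occurs with its neighbours in only two of the three base clusterings. Its stability therefore falls below `tau`, and it is excluded from the hard-clustering step. It is then attached to the group of its nearest stable neighbour. This mirrors the abstract's "stable first, unstable later" ordering.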