Mustafa R. Kadhim, WEN-HONG Tian, Guangyao Zhou, Tahseen Khan
Title: A Novel Side-Information for Unsupervised Cluster Ensemble
Venue: 2021 18th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP)
Publication date: 2021-12-17
DOI: 10.1109/ICCWAMTIP53232.2021.9674113
URL: https://doi.org/10.1109/ICCWAMTIP53232.2021.9674113
Citation count: 0
Abstract
Many clustering and cluster ensemble models have been proposed recently, but two concerns remain unaddressed. First, when a single model is executed multiple times on a dataset, it predicts varying labels for each data object, and these labels have a low correctness ratio because of the randomness in the values generated in each run. Second, determining which of these diverse answers is the correct label is difficult, particularly when the unsupervised model is deployed in a real-world application and must deliver a single correct label to the user. This work addresses both issues by proposing novel unsupervised constraints, termed Inherited Constraints (IC), that behave like semi-supervised constraint generation. Because executing IC requires a clustering model, we propose an unsupervised cluster ensemble model that integrates the Density Peaks cluster ensemble framework (DPE) with IC to improve performance; this model is termed DPEIC. We further propose a model termed Answer Settlements (AS), which detects a single correct label for each data object from the diverse answers obtained by running DPEIC multiple times, taking the most frequently repeated label as the correct one. We compare DPEIC-AS with several state-of-the-art models to validate the strengths of this work. The experimental results indicate that DPEIC-AS outperforms the compared models by margins ranging from 3% to 93%. In addition, AS helped two state-of-the-art methods detect the most likely correct labels from their diverse answers.
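
The abstract describes Answer Settlements only at a high level: the most frequently repeated label across several runs is taken as the correct one. A minimal sketch of that idea follows, under assumptions not stated in the abstract. The function name `settle_answers` is hypothetical, every run is assumed to label the same objects in the same order, and label IDs are assumed to be already comparable across runs (in practice, cluster labels from independent runs may need to be aligned first). It is an illustration of per-object majority voting, not the authors' implementation.

```python
from collections import Counter
from typing import List, Sequence


def settle_answers(runs: Sequence[Sequence[int]]) -> List[int]:
    """Pick, for each data object, the label that occurs most often across runs.

    Assumption (not from the paper): label IDs mean the same thing in every
    run, so no label alignment step is performed here.
    """
    if not runs:
        return []
    n_objects = len(runs[0])
    if any(len(run) != n_objects for run in runs):
        raise ValueError("every run must label the same number of objects")

    settled = []
    # zip(*runs) yields, per object, the tuple of labels it received in each run.
    for labels_for_object in zip(*runs):
        # most_common(1) returns the highest-count label; on a tie, the label
        # encountered first (i.e. from the earliest run) is kept.
        label, _count = Counter(labels_for_object).most_common(1)[0]
        settled.append(label)
    return settled


# Three hypothetical runs of a clustering model over five data objects.
runs = [
    [0, 0, 1, 1, 2],
    [0, 1, 1, 1, 2],
    [0, 0, 1, 2, 2],
]
print(settle_answers(runs))  # -> [0, 0, 1, 1, 2]
```

Voting per object keeps the sketch independent of the clustering model that produced the runs, which matches the abstract's report that AS was also applied to two other methods.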