Fair Clustering Ensemble With Equal Cluster Capacity

IEEE transactions on pattern analysis and machine intelligence Pub Date : 2024-11-28 DOI:10.1109/TPAMI.2024.3507857

Peng Zhou;Rongwen Li;Zhaolong Ling;Liang Du;Xinwang Liu

{"title":"Fair Clustering Ensemble With Equal Cluster Capacity","authors":"Peng Zhou;Rongwen Li;Zhaolong Ling;Liang Du;Xinwang Liu","doi":"10.1109/TPAMI.2024.3507857","DOIUrl":null,"url":null,"abstract":"Clustering ensemble has been widely studied in data mining and machine learning. However, the existing clustering ensemble methods do not pay attention to fairness, which is important in real-world applications, especially in applications involving humans. To address this issue, this paper proposes a novel fair clustering ensemble method, which takes multiple base clustering results as inputs and learns a fair consensus clustering result. When designing the algorithm, we observe that one of the widely used definitions of fairness may cause a cluster imbalance problem. To tackle this problem, we give a new definition of fairness that can simultaneously characterize fairness and cluster capacity equality. Based on this new definition, we design an extremely simple yet effective regularized term to achieve fairness and cluster capacity equality. We plug this regularized term into our clustering ensemble framework, finally leading to our new fair clustering ensemble method. The extensive experiments show that, compared with the state-of-the-art clustering ensemble methods, our method can not only achieve a comparable or even better clustering performance, but also obtain a much fairer and better capacity equality result, which well demonstrates the effectiveness and superiority of our method.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 3","pages":"1729-1746"},"PeriodicalIF":0.0000,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10770826/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

Clustering ensemble has been widely studied in data mining and machine learning. However, the existing clustering ensemble methods do not pay attention to fairness, which is important in real-world applications, especially in applications involving humans. To address this issue, this paper proposes a novel fair clustering ensemble method, which takes multiple base clustering results as inputs and learns a fair consensus clustering result. When designing the algorithm, we observe that one of the widely used definitions of fairness may cause a cluster imbalance problem. To tackle this problem, we give a new definition of fairness that can simultaneously characterize fairness and cluster capacity equality. Based on this new definition, we design an extremely simple yet effective regularized term to achieve fairness and cluster capacity equality. We plug this regularized term into our clustering ensemble framework, finally leading to our new fair clustering ensemble method. The extensive experiments show that, compared with the state-of-the-art clustering ensemble methods, our method can not only achieve a comparable or even better clustering performance, but also obtain a much fairer and better capacity equality result, which well demonstrates the effectiveness and superiority of our method.

查看原文