Instability of clustering metrics in overlapping community detection algorithms
Diego Kiedanski, P. Rodríguez-Bocca
2021 XLVII Latin American Computing Conference (CLEI), pp. 1-11, 25 October 2021. DOI: 10.1109/CLEI53233.2021.9640094
In this paper, we study how data complexity and data quality affect the overlapping community detection problem. We show that community detection algorithms are highly unstable when given incomplete or erroneous data, and that this result holds for every performance metric we evaluated. We verify this with three quality metrics (F1, NMI, and Omega), computed against a known ground-truth community structure, for four popular and representative detection algorithms: Order Statistics Local Optimization Method (OSLOM), Greedy Clique Expansion (GCE), Speaker-listener Label Propagation Algorithm (SLPA), and Cluster Affiliation Model for Big Networks (BIG-CLAM). We evaluate them on a set of real instances, derived from identifying the courses that belong to each degree program ("career") of an engineering university, and on large synthetic benchmark sets frequently used in the literature.
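The abstract names F1 and Omega but does not define them. As a minimal sketch, assuming commonly used definitions rather than necessarily the exact variants the authors used, both can be computed from two covers represented as lists of (possibly overlapping) node sets: F1 as the best-match F1 averaged in both directions (ground truth to detected and detected to ground truth), and the Omega index of Collins and Dent as the chance-corrected agreement on how many communities each node pair shares. The function names (`f1_pair`, `average_f1`, `shared_counts`, `omega_index`) and the toy covers are hypothetical. Overlapping NMI (for example the Lancichinetti, Fortunato and Kertész variant) is omitted for brevity; note that partition-based NMI implementations do not apply directly to overlapping covers.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set

# A cover is a list of node sets; communities may overlap.
Cover = List[Set[int]]


def f1_pair(a: Set[int], b: Set[int]) -> float:
    """F1 between two node sets; equals 2|a & b| / (|a| + |b|)."""
    denom = len(a) + len(b)
    return 2 * len(a & b) / denom if denom else 0.0


def average_f1(truth: Cover, found: Cover) -> float:
    """Best-match F1 averaged over both directions (assumed definition)."""
    if not truth or not found:
        return 0.0
    # Each ground-truth community is matched to its best detected community...
    t_to_f = sum(max(f1_pair(t, f) for f in found) for t in truth) / len(truth)
    # ...and each detected community to its best ground-truth community.
    f_to_t = sum(max(f1_pair(t, f) for t in truth) for f in found) / len(found)
    return (t_to_f + f_to_t) / 2


def shared_counts(cover: Cover, nodes: List[int]) -> List[int]:
    """For every unordered node pair, count the communities containing both."""
    membership: Dict[int, Set[int]] = {v: set() for v in nodes}
    for idx, community in enumerate(cover):
        for v in community:
            membership.setdefault(v, set()).add(idx)
    return [len(membership[u] & membership[v]) for u, v in combinations(nodes, 2)]


def omega_index(truth: Cover, found: Cover, nodes: List[int]) -> float:
    """Omega index: observed pair agreement corrected for chance agreement."""
    a = shared_counts(truth, nodes)
    b = shared_counts(found, nodes)
    m = len(a)  # number of unordered node pairs
    if m == 0:
        return 1.0
    # Fraction of pairs that share the same number of communities in both covers.
    observed = sum(x == y for x, y in zip(a, b)) / m
    # Expected agreement if the two count distributions were independent.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[j] * cb[j] for j in ca) / (m * m)
    if expected == 1.0:
        return 1.0  # degenerate case (all pairs identical); convention chosen here
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    truth = [{0, 1, 2, 3}, {3, 4, 5, 6}]
    found = [{0, 1, 2}, {3, 4, 5, 6}]  # node 3 missing from the first community
    nodes = list(range(7))
    print(f"average F1 = {average_f1(truth, found):.3f}")  # average F1 = 0.929
    print(f"Omega      = {omega_index(truth, found, nodes):.3f}")  # Omega      = 0.720
```

On this toy example, a single missing membership lowers average F1 to 13/14 (about 0.93) and Omega to 18/25 = 0.72, while identical covers score 1.0 on both.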