Instability of clustering metrics in overlapping community detection algorithms

Diego Kiedanski, P. Rodríguez-Bocca
Published in: 2021 XLVII Latin American Computing Conference (CLEI), pp. 1-11, 2021-10-25. DOI: 10.1109/CLEI53233.2021.9640094 (https://doi.org/10.1109/CLEI53233.2021.9640094)
Citations: 1

Abstract

In this paper, we study the impact of data complexity and data quality on the overlapping community detection problem. We show that community detection algorithms are very unstable when the data is incomplete or erroneous, and that this result holds across all the evaluated performance metrics. We verify it with three quality metrics (F1, NMI, and Omega), which apply when the ground-truth community structure is known, on four popular and representative detection algorithms: Order Statistics Local Optimization Method (OSLOM), Greedy Clique Expansion (GCE), Speaker-listener Label Propagation Algorithm (SLPA), and Cluster Affiliation Model for Big Networks (BIG-CLAM). We evaluate the algorithms on a set of real instances, which arise from detecting the courses that belong to the different degrees (careers) of an engineering university, and on large benchmark sets of synthetic instances frequently used in the literature.
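
As an illustration of the kind of measurement the abstract describes, the sketch below scores a detected cover against a ground-truth cover with one common definition of F1 for overlapping communities (the symmetric average of best-match F1 scores), and shows one simple way to simulate incomplete data by removing a random fraction of edges. The abstract does not state which F1 variant or which perturbation procedure the authors use, so both are assumptions, and the names `f1`, `average_f1`, and `drop_edges` are hypothetical.

```python
import random


def f1(a, b):
    """F1 score between two node sets, treating `a` as the reference."""
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(b)
    recall = overlap / len(a)
    return 2 * precision * recall / (precision + recall)


def average_f1(truth, detected):
    """Symmetric average of best-match F1 between two covers.

    Each cover is a list of node sets; a node may appear in several
    sets, which is what makes the communities overlapping.
    (Assumed definition, not necessarily the one used in the paper.)
    """
    if not truth or not detected:
        return 0.0
    truth_to_detected = sum(
        max(f1(t, d) for d in detected) for t in truth
    ) / len(truth)
    detected_to_truth = sum(
        max(f1(d, t) for t in truth) for d in detected
    ) / len(detected)
    return 0.5 * (truth_to_detected + detected_to_truth)


def drop_edges(edges, fraction, seed=0):
    """Return a random subset of `edges` with `fraction` of them removed.

    A hypothetical way to model incomplete data; the seed makes the
    perturbation reproducible.
    """
    rng = random.Random(seed)
    edges = list(edges)
    keep = len(edges) - int(round(fraction * len(edges)))
    return rng.sample(edges, keep)


# Two overlapping ground-truth communities sharing node 4.
truth = [{1, 2, 3, 4}, {4, 5, 6, 7}]
# A detection result that misses node 4 in the first community.
detected = [{1, 2, 3}, {4, 5, 6, 7}]
score = average_f1(truth, detected)
```

In a stability experiment along these lines, one would run a detection algorithm on the original edge list and on several outputs of `drop_edges` with increasing `fraction`, then compare how `average_f1` against the ground truth changes.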