On Breaking Truss-Based and Core-Based Communities

ACM Transactions on Knowledge Discovery from Data | Pub Date: 2024-02-14 | DOI: 10.1145/3644077 | IF 4.0 | CAS Zone 3 (Computer Science) | JCR Q1 (Computer Science, Information Systems)

Huiping Chen, Alessio Conte, Roberto Grossi, Grigorios Loukides, Solon P. Pissis, Michelle Sweering


Citations: 0

Abstract

We introduce the general problem of identifying a smallest edge subset of a given graph whose deletion makes the graph community-free. We consider this problem under two community notions which have attracted significant attention: k-truss and k-core. We also introduce a problem variant where the identified subset contains edges incident to a given set of nodes and ensures that these nodes are not contained in any community (a k-truss or k-core, in our case). These problems are directly applicable in social networks, where the identified edges can be hidden by users or sanitized from the output graph, and in communication networks, where the identified edges correspond to vital network connections. We present a series of theoretical and practical results. On the theoretical side, we show through non-trivial reductions that the problems we introduce are NP-hard and, in fact, hard to approximate. For the k-truss-based problems, we also show exact exponential-time algorithms, as well as a non-trivial lower bound on the size of an optimal solution. On the practical side, we develop a series of heuristics which are sped up by efficient data structures that we propose for updating the truss or core decomposition under edge deletions. In addition, we develop an algorithm to compute the lower bound. Extensive experiments on 11 real-world and synthetic graphs show that our heuristics are effective, outperforming natural baselines, and also efficient (up to two orders of magnitude faster than a natural baseline) thanks to our data structures. Furthermore, we present a case study on a co-authorship network and experiments showing that the removal of edges identified by our heuristics does not substantially affect the clustering structure of the input graph.

This work extends a KDD 2021 paper, providing new theoretical results as well as introducing core-based problems and algorithms.
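
To make the problem statement concrete, the following is a minimal Python sketch, not the authors' algorithms or data structures. It assumes the standard definitions (a k-core is a maximal subgraph in which every node has degree at least k; a k-truss is a maximal subgraph in which every edge lies in at least k-2 triangles of that subgraph), which may differ in detail from the paper's. The function names `k_core_edges`, `k_truss_edges`, and `smallest_breaking_set` are hypothetical, and the exhaustive search is only feasible for tiny graphs, consistent with the problems being NP-hard.

```python
from itertools import combinations


def _adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs (simple graph assumed)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_core_edges(edges, k):
    """Edges of the k-core, found by repeatedly peeling nodes of degree < k."""
    adj = _adjacency(edges)
    stack = [u for u in adj if len(adj[u]) < k]
    while stack:
        u = stack.pop()
        if u not in adj:  # already peeled
            continue
        for v in adj.pop(u):
            adj[v].discard(u)
            if len(adj[v]) < k:
                stack.append(v)
    return {frozenset((u, v)) for u in adj for v in adj[u]}


def k_truss_edges(edges, k):
    """Edges of the k-truss, found by repeatedly peeling edges in < k-2 triangles."""
    adj = _adjacency(edges)
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                # Common neighbours of u and v = triangles containing edge (u, v).
                if len(adj[u] & adj[v]) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True
    return {frozenset((u, v)) for u in adj for v in adj[u]}


def smallest_breaking_set(edges, k, community_edges):
    """Brute force: a smallest edge set whose deletion leaves no k-community.

    `community_edges` is k_core_edges or k_truss_edges. Runs in exponential
    time; illustration only.
    """
    edges = [tuple(e) for e in edges]
    for size in range(len(edges) + 1):
        for removed in combinations(edges, size):
            kept = [e for e in edges if e not in removed]
            if not community_edges(kept, k):
                return set(removed)


# The complete graph on 4 nodes is both a 3-core and a 4-truss.
k4 = list(combinations("abcd", 2))
print(len(smallest_breaking_set(k4, 3, k_core_edges)))   # 1
print(len(smallest_breaking_set(k4, 3, k_truss_edges)))  # 2
```

On this toy input, deleting any single edge leaves no 3-core, whereas leaving no 3-truss requires deleting two disjoint edges so that every triangle is destroyed.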




Source journal

ACM Transactions on Knowledge Discovery from Data (Computer Science, Information Systems; Computer Science, Software Engineering)

CiteScore: 6.70

Self-citation rate: 5.60%

Articles published: 172

Review time: 3 months

About the journal: TKDD welcomes papers on a full range of research in the knowledge discovery and analysis of diverse forms of data. Such subjects include, but are not limited to: scalable and effective algorithms for data mining and big data analysis, mining brain networks, mining data streams, mining multi-media data, mining high-dimensional data, mining text, Web, and semi-structured data, mining spatial and temporal data, data mining for community generation, social network analysis, and graph structured data, security and privacy issues in data mining, visual, interactive and online data mining, pre-processing and post-processing for data mining, robust and scalable statistical methods, data mining languages, foundations of data mining, KDD framework and process, and novel applications and infrastructures exploiting data mining technology including massively parallel processing and cloud computing platforms. TKDD encourages papers that explore the above subjects in the context of large distributed networks of computers, parallel or multiprocessing computers, or new data devices. TKDD also encourages papers that describe emerging data mining applications that cannot be satisfied by the current data mining technology.