IMpart: A Partitioning-based Parallel Approach to Accelerate Influence Maximization

2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC) | Pub Date: 2022-12-01 | DOI: 10.1109/HiPC56025.2022.00028

Reet Barik, Marco Minutoli, M. Halappanavar, A. Kalyanaraman



Abstract

Influence maximization (IM) is a fundamental operation among graph problems that involve simulating a stochastic diffusion process on real-world networks. Given a graph G(V, E), the objective is to identify a small set of key influential "seeds", i.e., a fixed-size set of k nodes that, when influenced, is likely to lead to the maximum number of nodes in the network getting influenced. The problem has numerous applications including (but not limited to) viral marketing in social networks, epidemic control in contact networks, and finding influential proteins in molecular networks. Despite its importance, applying influence maximization at scale continues to pose significant challenges. While the problem is NP-hard, efficient approximation algorithms based on greedy hill climbing are used in practice. However, those algorithms consume hours of multithreaded execution time even on modest-sized inputs with hundreds of thousands of nodes. In this paper, we present IMpart, a partitioning-based approach to accelerate greedy hill-climbing-based IM approaches on both shared and distributed memory computers. In particular, we present two parallel algorithms, one that uses graph partitioning (IMpart-metis) and another that uses community-aware partitioning (IMpart-gratis), with provable guarantees on the quality of approximation. Experimental results show that our approaches deliver two to three orders of magnitude speedup over a state-of-the-art multithreaded hill climbing implementation with negligible loss in quality. For instance, on one of the modest-sized inputs (Slashdot: 73K nodes; 905K edges), our partitioning-based shared memory implementation yields a 4610× speedup, reducing the runtime from 9h 36m to 7 seconds on 128 threads. Furthermore, our distributed memory implementation extends the reachable problem size to graph inputs on the order of 10^6 nodes and 10^8 edges and enables sub-minute computation of IM solutions.
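
To make the greedy hill climbing idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes an independent cascade diffusion model with a uniform edge probability p and a Monte Carlo spread estimate; the abstract only says "stochastic diffusion process", so both are assumptions. The function names (simulate_ic, estimate_spread, greedy_seeds, partitioned_greedy) are hypothetical. The partitioned variant only illustrates the general idea of selecting candidates per partition and then merging them. It takes the partitions as given and does not reproduce IMpart-metis or IMpart-gratis, nor their approximation guarantees.

```python
import random

def simulate_ic(graph, seeds, p, rng):
    """One independent-cascade run (assumed model); returns the activated-node count."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Each newly active node gets one chance to activate each neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)

def estimate_spread(graph, seeds, p, trials, rng):
    """Monte Carlo estimate of the expected number of influenced nodes."""
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(trials)) / trials

def greedy_seeds(graph, candidates, k, p=0.1, trials=200, seed=0):
    """Greedy hill climbing: repeatedly add the candidate with the best estimated spread."""
    rng = random.Random(seed)
    chosen = []
    remaining = set(candidates)
    for _ in range(min(k, len(remaining))):
        # sorted() makes tie-breaking deterministic (smallest node id wins).
        best = max(sorted(remaining),
                   key=lambda v: estimate_spread(graph, chosen + [v], p, trials, rng))
        chosen.append(best)
        remaining.remove(best)
    return chosen

def partitioned_greedy(graph, parts, k, **kw):
    """Illustrative partition-then-merge scheme (an assumption, not IMpart itself)."""
    candidates = []
    for part in parts:
        # Restrict the graph to edges inside this part; the parts are independent,
        # so this loop could run in parallel.
        sub = {u: [v for v in graph.get(u, ()) if v in part] for u in part}
        candidates.extend(greedy_seeds(sub, part, k, **kw))
    # Final greedy pass on the full graph, restricted to the per-part candidates.
    return greedy_seeds(graph, candidates, k, **kw)
```

For example, on a toy graph g = {0: [1, 2, 3], 4: [5, 6]} with parts [{0, 1, 2, 3}, {4, 5, 6}], calling partitioned_greedy(g, parts, 2, p=1.0, trials=1) evaluates only four candidates in the final pass instead of all seven nodes. Shrinking the candidate set in this way is the intuition behind why partitioning can cut the cost of the expensive spread estimates.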



