Scalable stochastic block partition
Ahsen J. Uppal, Guy Swope, H. H. Huang (The George Washington University)
2017 IEEE High Performance Extreme Computing Conference (HPEC), September 2017
DOI: 10.1109/HPEC.2017.8091050
Citations: 12
Abstract
Processing graph data at large scale, though important and useful for real-world applications, remains challenging, particularly for problems such as graph partitioning. Optimal graph partitioning is NP-hard, but several methods provide approximate solutions in reasonable time; scaling these approximate algorithms is itself challenging. In this paper, we describe our efforts toward improving the scalability of one such technique, stochastic block partition, the baseline algorithm for the IEEE HPEC Graph Challenge [1]. Our key contributions are: improvements to the parallelization of the baseline bottom-up algorithm, especially the Markov chain Monte Carlo (MCMC) nodal updates for Bayesian inference; a new top-down divide-and-conquer algorithm that reduces the algorithmic complexity of static partitioning and is also suitable for streaming partitioning; and two parallel implementations, one for a single multi-CPU node and one for multiple nodes via MPI. Although our focus is on algorithmic scalability, our Python implementation obtains a 1.65× speedup over the fastest parallel run of the baseline C++ implementation on a graph of 100k vertices divided into 8 subgraphs on a single-node multi-CPU machine. It also achieves a 61× self-speedup on a cluster of 4 machines with 256 CPUs for a 20k-node graph divided into 4 subgraphs, and a 441× self-speedup on a 50k-node graph divided into 8 subgraphs on a single-node multi-CPU machine.
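To make the MCMC nodal-update step concrete, the sketch below shows a minimal, illustrative version of the idea in Python: each node proposes moving to the block of a randomly chosen neighbor and accepts the move with a Metropolis rule. This is an assumption-laden toy, not the paper's implementation; the real baseline scores moves against a degree-corrected stochastic block model posterior, whereas here the score is simply the count of within-block edges incident to the node.

```python
import math
import random
from collections import Counter

def propose_and_accept(adj, blocks, node, beta=1.0, rng=random):
    """One toy MCMC nodal update (illustrative sketch only).

    Proposes moving `node` into the block of a random neighbor and
    accepts via a Metropolis rule on a simple score: the number of
    the node's edges that stay within its block. The paper's actual
    algorithm evaluates a degree-corrected SBM posterior instead.
    """
    neighbors = adj[node]
    if not neighbors:
        return blocks[node]
    proposal = blocks[rng.choice(list(neighbors))]
    current = blocks[node]
    if proposal == current:
        return current
    # Change in within-block edge count if the node moves.
    counts = Counter(blocks[v] for v in neighbors)
    delta = counts[proposal] - counts[current]
    # Accept improvements always; accept worsenings with prob e^(beta*delta).
    if delta >= 0 or rng.random() < math.exp(beta * delta):
        blocks[node] = proposal
    return blocks[node]

def mcmc_sweep(adj, blocks, beta=1.0, rng=random):
    """One full sweep of nodal updates over all vertices."""
    for node in adj:
        propose_and_accept(adj, blocks, node, beta, rng)
    return blocks
```

A sweep like this is the unit the paper parallelizes: nodal updates are computed for batches of vertices concurrently, which trades strict sequential MCMC semantics for throughput.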