FT-topo: Architecture-Driven Folded-Triangle Partitioning for Communication-efficient Graph Processing

Proceedings of the 37th International Conference on Supercomputing. Pub Date: 2023-06-21. DOI: 10.1145/3577193.3593729

X. Gan, Guang Wu, Ruigeng Zeng, Jiaqi Si, Ji Liu, Daxiang Dong, Chunye Gong, Cong Liu, Tiejun Li

Citations: 1

Abstract

As graph sizes (numbers of vertices and edges) grow from billions to trillions, efficient graph processing requires exascale computing clusters, which consist of hundreds of thousands of nodes connected via hierarchical networks with multiple levels of communication domains, e.g., multilevel triangle communication domains. While the computation in traversal-centric graph algorithms is relatively simple (e.g., status checks), communication is the bottleneck because numerous small messages must be transferred among the hierarchical triangle communication domains. In this paper, we propose FT-topo, a communication-efficient graph partitioning policy for processing exascale graphs. The key idea of FT-topo is to directly map the big graph onto the hierarchical topology of exascale clusters. We carry out extensive experiments, running various graph algorithms on synthetic and real-world graphs on both the Tianhe supercomputer and commercial clusters, to show the advantages of FT-topo. FT-topo substantially mitigates communication overhead and is thus orders of magnitude faster than state-of-the-art methods. In particular, the FT-topo-based Tianhe supercomputer outperforms the fastest BFS and SSSP systems in the latest Graph500 lists. Furthermore, we deployed FT-topo on other large-scale commercial clusters, where it greatly improves graph processing performance. FT-topo-based graph operators outperform state-of-the-art graph partitioning methods and graph systems by orders of magnitude on real-world graphs.
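The abstract does not spell out the partitioning algorithm, so the following Python sketch is not FT-topo itself. It is only a toy illustration, under assumed cost weights, of why the placement of vertices on a hierarchical network changes communication cost: an edge is free when both endpoints share a compute node, cheap when the two nodes share a communication domain, and expensive otherwise. The function name `hierarchical_cost`, the two-level hierarchy, and the weights `intra_cost` and `inter_cost` are all hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def hierarchical_cost(
    edges: List[Edge],
    node_of: Dict[int, int],    # vertex -> compute node (assumed mapping)
    group_of: Dict[int, int],   # compute node -> communication domain (assumed)
    intra_cost: int = 1,        # assumed cost of a message inside one domain
    inter_cost: int = 10,       # assumed cost of a message across domains
) -> int:
    """Sum a per-edge message cost under a two-level network hierarchy."""
    total = 0
    for u, v in edges:
        nu, nv = node_of[u], node_of[v]
        if nu == nv:
            continue  # both endpoints on the same node: no network message
        total += intra_cost if group_of[nu] == group_of[nv] else inter_cost
    return total


# Toy graph: a ring of 8 vertices.
ring = [(i, (i + 1) % 8) for i in range(8)]

# 4 compute nodes grouped into 2 communication domains.
group_of = {0: 0, 1: 0, 2: 1, 3: 1}

# Topology-oblivious placement: round-robin by vertex id.
round_robin = {v: v % 4 for v in range(8)}
# Topology-aware placement: neighbouring vertices share a node.
contiguous = {v: v // 2 for v in range(8)}

cost_round_robin = hierarchical_cost(ring, round_robin, group_of)
cost_contiguous = hierarchical_cost(ring, contiguous, group_of)
print(cost_round_robin, cost_contiguous)
```

In this toy setting the contiguous placement halves the modelled cost, because fewer edges cross node and domain boundaries. Real systems would also need to account for message aggregation and load balance, which this sketch ignores.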
