FT-topo: Architecture-Driven Folded-Triangle Partitioning for Communication-efficient Graph Processing

X. Gan, Guang Wu, Ruigeng Zeng, Jiaqi Si, Ji Liu, Daxiang Dong, Chunye Gong, Cong Liu, Tiejun Li
{"title":"FT-topo: Architecture-Driven Folded-Triangle Partitioning for Communication-efficient Graph Processing","authors":"X. Gan, Guang Wu, Ruigeng Zeng, Jiaqi Si, Ji Liu, Daxiang Dong, Chunye Gong, Cong Liu, Tiejun Li","doi":"10.1145/3577193.3593729","DOIUrl":null,"url":null,"abstract":"As graph size (numbers of vertices and edges) is increasing from billions to trillions, efficient graph processing requires exascale computing clusters, which consist of hundreds of thousands of nodes connected via hierarchical networks with multiple levels of communication domains, e.g., multilevel triangle communication domains. While the computation of traversal-centric graph algorithms is relatively simple (e.g., status check), communication is the bottleneck due to the transfer of numerous small messages among hierarchical triangle communication domains. in this paper, we propose FT-topo, a communication-efficient graph partitioning policy for processing exascale graphs. The key idea of FT-topo is to directly map the big graph onto the hierarchical topology of exascale clusters. We carry out extensive experimentation by running various graph algorithms with synthetic graphs and real-world graphs on both Tianhe supercomputer and commercial clusters to show the advantages of FT-topo. FT-topo substantially mitigates communication overhead and thus is orders of magnitude faster than that of the state-of-the-art methods. In particular, FT-topo-based Tianhe supercomputer is superior to the fastest BFS and SSSP systems in the latest Graph500 lists. Furthermore, we deployed FT-topo on other large-scale clusters and it greatly improves graph processing performance on other commercial clusters. FT-topo-based graph operators outperforms the state-of-the-art graph partitioning and graph system by orders of magnitude on real-world graphs.","PeriodicalId":424155,"journal":{"name":"Proceedings of the 37th International Conference on Supercomputing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 37th International Conference on Supercomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3577193.3593729","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

As graph size (numbers of vertices and edges) is increasing from billions to trillions, efficient graph processing requires exascale computing clusters, which consist of hundreds of thousands of nodes connected via hierarchical networks with multiple levels of communication domains, e.g., multilevel triangle communication domains. While the computation of traversal-centric graph algorithms is relatively simple (e.g., status check), communication is the bottleneck due to the transfer of numerous small messages among hierarchical triangle communication domains. in this paper, we propose FT-topo, a communication-efficient graph partitioning policy for processing exascale graphs. The key idea of FT-topo is to directly map the big graph onto the hierarchical topology of exascale clusters. We carry out extensive experimentation by running various graph algorithms with synthetic graphs and real-world graphs on both Tianhe supercomputer and commercial clusters to show the advantages of FT-topo. FT-topo substantially mitigates communication overhead and thus is orders of magnitude faster than that of the state-of-the-art methods. In particular, FT-topo-based Tianhe supercomputer is superior to the fastest BFS and SSSP systems in the latest Graph500 lists. Furthermore, we deployed FT-topo on other large-scale clusters and it greatly improves graph processing performance on other commercial clusters. FT-topo-based graph operators outperforms the state-of-the-art graph partitioning and graph system by orders of magnitude on real-world graphs.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
FT-topo:架构驱动的折叠三角形分区,用于通信高效的图形处理
随着图的大小(顶点和边的数量)从数十亿增加到数万亿,高效的图处理需要百亿亿级的计算集群,这些集群由数十万个节点组成,这些节点通过具有多层通信域的分层网络连接,例如多层三角形通信域。虽然以遍历为中心的图算法的计算相对简单(例如,状态检查),但由于在分层三角形通信域之间传输大量小消息,通信是瓶颈。在本文中,我们提出了一种用于处理百亿亿级图的通信高效图分区策略FT-topo。FT-topo的关键思想是直接将大图映射到百亿亿级集群的分层拓扑结构上。我们通过在天河超级计算机和商业集群上运行各种图形算法(包括合成图和真实图)进行了广泛的实验,以显示FT-topo的优势。FT-topo大大降低了通信开销,因此比最先进的方法快了几个数量级。特别是,在最新的Graph500榜单中,基于ft拓扑的天河超级计算机优于最快的BFS和SSSP系统。此外,我们在其他大规模集群上部署了FT-topo,它大大提高了其他商业集群的图形处理性能。在现实世界的图上,基于ft拓扑的图算子在数量级上优于最先进的图划分和图系统。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
FLORIA: A Fast and Featherlight Approach for Predicting Cache Performance FT-topo: Architecture-Driven Folded-Triangle Partitioning for Communication-efficient Graph Processing Using Additive Modifications in LU Factorization Instead of Pivoting GRAP: Group-level Resource Allocation Policy for Reconfigurable Dragonfly Network in HPC Enabling Reconfigurable HPC through MPI-based Inter-FPGA Communication
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1