Communication-avoiding parallel minimum cuts and connected components

Lukas Gianinazzi, Pavel Kalvoda, A. Palma, Maciej Besta, T. Hoefler
{"title":"Communication-avoiding parallel minimum cuts and connected components","authors":"Lukas Gianinazzi, Pavel Kalvoda, A. Palma, Maciej Besta, T. Hoefler","doi":"10.1145/3178487.3178504","DOIUrl":null,"url":null,"abstract":"We present novel scalable parallel algorithms for finding global minimum cuts and connected components, which are important and fundamental problems in graph processing. To take advantage of future massively parallel architectures, our algorithms are communication-avoiding: they reduce the costs of communication across the network and the cache hierarchy. The fundamental technique underlying our work is the randomized sparsification of a graph: removing a fraction of graph edges, deriving a solution for such a sparsified graph, and using the result to obtain a solution for the original input. We design and implement sparsification with O(1) synchronization steps. Our global minimum cut algorithm decreases communication costs and computation compared to the state-of-the-art, while our connected components algorithm incurs few cache misses and synchronization steps. We validate our approach by evaluating MPI implementations of the algorithms on a petascale supercomputer. We also provide an approximate variant of the minimum cut algorithm and show that it approximates the exact solutions well while using a fraction of cores in a fraction of time.","PeriodicalId":193776,"journal":{"name":"Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","volume":"2 4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"33","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3178487.3178504","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 33

Abstract

We present novel scalable parallel algorithms for finding global minimum cuts and connected components, which are important and fundamental problems in graph processing. To take advantage of future massively parallel architectures, our algorithms are communication-avoiding: they reduce the costs of communication across the network and the cache hierarchy. The fundamental technique underlying our work is the randomized sparsification of a graph: removing a fraction of graph edges, deriving a solution for such a sparsified graph, and using the result to obtain a solution for the original input. We design and implement sparsification with O(1) synchronization steps. Our global minimum cut algorithm decreases communication costs and computation compared to the state-of-the-art, while our connected components algorithm incurs few cache misses and synchronization steps. We validate our approach by evaluating MPI implementations of the algorithms on a petascale supercomputer. We also provide an approximate variant of the minimum cut algorithm and show that it approximates the exact solutions well while using a fraction of cores in a fraction of time.
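To make the sparsification idea concrete, below is a minimal sequential sketch of how randomized edge sampling can be combined with connected-components labeling: keep each edge with some probability, solve on the sparsified graph, and then only the skipped edges that still cross two different components need further processing. This is an illustrative assumption-based sketch, not the paper's communication-avoiding MPI algorithm; the names `sparsified_components` and `sample_prob` are hypothetical and the implementation is a plain union-find rather than the authors' parallel data structures.

```cpp
// Sketch: randomized edge sparsification for connected components.
// Phase 1 solves the problem on a random subset of the edges; phase 2 uses
// that partial labeling so only component-crossing skipped edges matter.
#include <cstdint>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

struct UnionFind {
    std::vector<int> parent;
    explicit UnionFind(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0);
    }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    void unite(int a, int b) { parent[find(a)] = find(b); }
};

// Returns a component label for each of the n vertices.
std::vector<int> sparsified_components(int n,
                                       const std::vector<std::pair<int, int>>& edges,
                                       double sample_prob, std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    std::bernoulli_distribution keep(sample_prob);

    UnionFind uf(n);
    std::vector<std::pair<int, int>> skipped;

    // Phase 1: solve on the sparsified graph (each edge kept with prob. sample_prob).
    for (const auto& e : edges) {
        if (keep(rng)) uf.unite(e.first, e.second);
        else skipped.push_back(e);
    }

    // Phase 2: patch up with the edges that were sampled out; only those that
    // still connect two different components change the labeling.
    for (const auto& e : skipped) {
        if (uf.find(e.first) != uf.find(e.second)) uf.unite(e.first, e.second);
    }

    std::vector<int> label(n);
    for (int v = 0; v < n; ++v) label[v] = uf.find(v);
    return label;
}
```

In a distributed setting the appeal of this structure is that the expensive, communication-heavy work is done on the smaller sampled graph, while the second phase touches far fewer edges; the paper's algorithms realize this with O(1) synchronization steps, which the sketch above does not attempt to model.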