LACC: A Linear-Algebraic Algorithm for Finding Connected Components in Distributed Memory

2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Pub Date: 2019-05-20. DOI: 10.1109/IPDPS.2019.00012

A. Azad, A. Buluç

Citations: 17

Abstract

Finding connected components is one of the most widely used operations on a graph. Optimal serial algorithms for the problem have been known for half a century, and many competing parallel algorithms have been proposed over the last several decades under various different models of parallel computation. This paper presents a parallel connected-components algorithm that can run on distributed-memory computers. Our algorithm uses linear algebraic primitives and is based on a PRAM algorithm by Awerbuch and Shiloach. We show that the resulting algorithm, named LACC for Linear Algebraic Connected Components, outperforms competitors by a factor of up to 12x for small to medium scale graphs. For large graphs with more than 50B edges, LACC scales to 4K nodes (262K cores) of a Cray XC40 supercomputer and outperforms previous algorithms by a significant margin. This remarkable performance is accomplished by (1) exploiting sparsity that was not present in the original PRAM algorithm formulation, (2) using high-performance primitives of Combinatorial BLAS, and (3) identifying hot spots and optimizing them away by exploiting algorithmic insights.
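
To make the hook-and-shortcut idea behind Awerbuch-Shiloach-style algorithms concrete, here is a minimal serial sketch in plain Python. It is not the LACC implementation and does not use Combinatorial BLAS; the function name `connected_components` and the edge-list input are assumptions made for illustration. The hooking loop plays the role that a sparse matrix-vector product over a min-based semiring would play in a linear-algebraic formulation, and the shortcutting loop is ordinary pointer jumping.

```python
def connected_components(n, edges):
    """Label each of n vertices with the smallest vertex id in its component.

    Illustrative sketch only: a simplified hook-and-shortcut scheme,
    not the LACC algorithm from the paper.
    """
    parent = list(range(n))  # every vertex starts as its own root (a star of size 1)
    while True:
        new = parent[:]
        # Hooking: each root adopts the smallest root seen across any incident edge.
        # Roots only ever hook onto smaller ids, so no cycles can form.
        for u, v in edges:
            pu, pv = parent[u], parent[v]
            if pv < new[pu]:
                new[pu] = pv
            if pu < new[pv]:
                new[pv] = pu
        # Shortcutting: pointer jumping until every tree is a star again.
        while True:
            jumped = [new[p] for p in new]
            if jumped == new:
                break
            new = jumped
        # No root changed, so every edge already joins two vertices with one label.
        if new == parent:
            return parent
        parent = new


if __name__ == "__main__":
    print(connected_components(6, [(0, 1), (1, 2), (3, 4)]))  # [0, 0, 0, 3, 3, 5]
```

In this sketch every vertex takes part in every round. The abstract's point (1), exploiting sparsity, would correspond to skipping vertices whose components have already converged, which this version does not attempt.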
