Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis. Pub Date: 2014-11-16. DOI: 10.1109/SC.2014.69

Arash Ashari, N. Sedaghati, John Eisenlohr, S. Parthasarathy, P. Sadayappan


Citations: 135

Abstract

Sparse matrix-vector multiplication (SpMV) is a widely used computational kernel. The most commonly used format for a sparse matrix is CSR (Compressed Sparse Row), but a number of other representations have recently been developed that achieve higher SpMV performance. However, the alternative representations typically impose a significant preprocessing overhead. While a high preprocessing overhead can be amortized for applications requiring many iterative invocations of SpMV that use the same matrix, this is not always feasible, for instance when analyzing large dynamically evolving graphs. This paper presents ACSR, an adaptive SpMV algorithm that uses the standard CSR format but reduces thread divergence by combining rows into groups (bins) that have a similar number of non-zero elements. Further, for rows in bins that span a wide range of non-zero counts, dynamic parallelism is leveraged. A significant benefit of ACSR over other proposed SpMV approaches is that it works directly with the standard CSR format, and thus avoids significant preprocessing overheads. A CUDA implementation of ACSR is shown to outperform SpMV implementations in the NVIDIA CUSP and cuSPARSE libraries on a set of sparse matrices representing power-law graphs. We also demonstrate the use of ACSR for the analysis of dynamic graphs, where the improvement over extant approaches is even higher.
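
The binning idea in the abstract can be illustrated with a minimal CPU-side sketch in Python. It is not the authors' CUDA implementation. The function names (`csr_spmv`, `bin_rows`, `binned_spmv`) are hypothetical, and the power-of-two bin boundaries are an assumption made for illustration, since the abstract only says that rows in a bin have a similar number of non-zeros. The sketch leaves the CSR arrays untouched and only builds lists of row indices per bin. Dynamic parallelism is not modeled.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Reference y = A @ x for a matrix stored in CSR, one row at a time."""
    y = [0.0] * (len(row_ptr) - 1)
    for r in range(len(y)):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += vals[k] * x[col_idx[k]]
    return y


def bin_rows(row_ptr):
    """Group row indices by non-zero count without modifying the CSR arrays.

    Assumed rule (for illustration only): bin b holds rows whose non-zero
    count lies in (2**(b-1), 2**b]; rows with 0 or 1 non-zeros go to bin 0.
    """
    bins = {}
    for r in range(len(row_ptr) - 1):
        nnz = row_ptr[r + 1] - row_ptr[r]
        b = (nnz - 1).bit_length() if nnz > 1 else 0
        bins.setdefault(b, []).append(r)
    return bins


def binned_spmv(row_ptr, col_idx, vals, x):
    """SpMV that visits rows bin by bin.

    On a GPU, each bin could be handled by its own kernel launch, with the
    number of threads per row chosen from the bin index so that threads in
    the same warp do similar amounts of work. Here the bins are processed
    sequentially to show the data flow only.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for _, rows in sorted(bin_rows(row_ptr).items()):
        for r in rows:
            acc = 0.0
            for k in range(row_ptr[r], row_ptr[r + 1]):
                acc += vals[k] * x[col_idx[k]]
            y[r] = acc
    return y


# A 4x4 example with row lengths 1, 4, 0 and 2.
row_ptr = [0, 1, 5, 5, 7]
col_idx = [0, 0, 1, 2, 3, 1, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x = [1.0, 1.0, 1.0, 1.0]

print(bin_rows(row_ptr))
print(binned_spmv(row_ptr, col_idx, vals, x))
```

Because binning only reads `row_ptr`, its cost is a single pass over the row pointers, which is consistent with the abstract's point that working directly on CSR avoids heavy preprocessing when the matrix changes.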
