Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications

Arash Ashari, N. Sedaghati, John Eisenlohr, S. Parthasarathy, P. Sadayappan
Published in: SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, 2014-11-16
DOI: 10.1109/SC.2014.69 (https://doi.org/10.1109/SC.2014.69)
Citations: 135

Abstract

Sparse matrix-vector multiplication (SpMV) is a widely used computational kernel. The most commonly used format for a sparse matrix is CSR (Compressed Sparse Row), but a number of other representations have recently been developed that achieve higher SpMV performance. However, these alternative representations typically impose a significant preprocessing overhead. A high preprocessing overhead can be amortized in applications that invoke SpMV many times on the same matrix, but such amortization is not always feasible, for instance when analyzing large, dynamically evolving graphs. This paper presents ACSR, an adaptive SpMV algorithm that uses the standard CSR format but reduces thread divergence by combining rows into groups (bins) that have a similar number of non-zero elements. Further, for rows in bins that span a wide range of non-zero counts, dynamic parallelism is leveraged. A significant benefit of ACSR over other proposed SpMV approaches is that it works directly with the standard CSR format and thus avoids significant preprocessing overheads. A CUDA implementation of ACSR is shown to outperform the SpMV implementations in the NVIDIA CUSP and cuSPARSE libraries on a set of sparse matrices representing power-law graphs. We also demonstrate the use of ACSR for the analysis of dynamic graphs, where the improvement over extant approaches is even higher.
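The binning idea in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' CUDA implementation: the function names `bin_rows_by_nnz` and `spmv_binned` are hypothetical, and the power-of-two bin boundaries are an assumption made for illustration, since the abstract only says that rows in a bin have a similar number of non-zero elements. The sketch reads the standard CSR arrays directly (row pointers, column indices, values) and builds only a list of row indices per bin, which is the sense in which no format conversion is needed.

```python
from collections import defaultdict


def bin_rows_by_nnz(row_ptr):
    """Group row indices into bins by non-zero count.

    Assumed boundaries (not taken from the abstract): bin 0 holds rows
    with 0 or 1 non-zeros, and bin k (k >= 1) holds rows whose non-zero
    count lies in (2**(k-1), 2**k].
    """
    bins = defaultdict(list)
    for row in range(len(row_ptr) - 1):
        nnz = row_ptr[row + 1] - row_ptr[row]
        key = (nnz - 1).bit_length() if nnz > 1 else 0
        bins[key].append(row)
    return dict(bins)


def spmv_binned(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a CSR matrix, visiting rows bin by bin."""
    y = [0.0] * (len(row_ptr) - 1)
    for _, rows in sorted(bin_rows_by_nnz(row_ptr).items()):
        # On a GPU, each bin would be handled by its own launch with a
        # thread count suited to that bin's row length, so threads that
        # run together do similar amounts of work. Here the rows are
        # simply processed sequentially.
        for row in rows:
            acc = 0.0
            for k in range(row_ptr[row], row_ptr[row + 1]):
                acc += values[k] * x[col_idx[k]]
            y[row] = acc
    return y


# 4x4 example matrix with rows of 1, 2, 4 and 0 non-zeros:
#   [1 0 0 0]
#   [0 2 3 0]
#   [4 5 6 7]
#   [0 0 0 0]
row_ptr = [0, 1, 3, 7, 7]
col_idx = [0, 1, 2, 0, 1, 2, 3]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x = [1.0, 1.0, 1.0, 1.0]

print(bin_rows_by_nnz(row_ptr))
print(spmv_binned(row_ptr, col_idx, values, x))
```

Because the bins are only index lists over the unchanged CSR arrays, they are cheap to rebuild when the matrix changes, which matches the abstract's motivation of dynamically evolving graphs. The dynamic-parallelism path for bins with widely varying row lengths is not modeled in this sketch.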