LightSpMV: Faster CSR-based sparse matrix-vector multiplication on CUDA-enabled GPUs

Yongchao Liu, B. Schmidt
{"title":"LightSpMV:在支持cuda的gpu上更快的基于csr的稀疏矩阵向量乘法","authors":"Yongchao Liu, B. Schmidt","doi":"10.1109/ASAP.2015.7245713","DOIUrl":null,"url":null,"abstract":"Compressed sparse row (CSR) is a frequently used format for sparse matrix storage. However, the state-of-the-art CSR-based sparse matrix-vector multiplication (SpMV) implementations on CUDA-enabled GPUs do not exhibit very high efficiency. This has motivated the development of some alternative storage formats for GPU computing. Unfortunately, these alternatives are incompatible with most CPU-centric programs and require dynamic conversion from CSR at runtime, thus incurring significant computational and storage overheads. We present LightSpMV, a novel CUDA-compatible SpMV algorithm using the standard CSR format, which achieves high speed by benefiting from the fine-grained dynamic distribution of matrix rows over warps/vectors. In LightSpMV, two dynamic row distribution approaches have been investigated at the vector and warp levels with atomic operations and warp shuffle functions as the fundamental building blocks. We have evaluated LightSpMV using various sparse matrices and further compared it to the CSR-based SpMV subprograms in the state-of-the-art CUSP and cuSPARSE libraries. Performance evaluation reveals that on the same Tesla K40c GPU, LightSpMV is superior to both CUSP and cuSPARSE, with a speedup of up to 2.60 and 2.63 over CUSP, and up to 1.93 and 1.79 over cuSPARSE for single and double precision, respectively. LightSpMV is available at http://lightspmv.sourceforge.net.","PeriodicalId":6642,"journal":{"name":"2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"12 1","pages":"82-89"},"PeriodicalIF":0.0000,"publicationDate":"2015-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"41","resultStr":"{\"title\":\"LightSpMV: Faster CSR-based sparse matrix-vector multiplication on CUDA-enabled GPUs\",\"authors\":\"Yongchao Liu, B. Schmidt\",\"doi\":\"10.1109/ASAP.2015.7245713\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Compressed sparse row (CSR) is a frequently used format for sparse matrix storage. However, the state-of-the-art CSR-based sparse matrix-vector multiplication (SpMV) implementations on CUDA-enabled GPUs do not exhibit very high efficiency. This has motivated the development of some alternative storage formats for GPU computing. Unfortunately, these alternatives are incompatible with most CPU-centric programs and require dynamic conversion from CSR at runtime, thus incurring significant computational and storage overheads. We present LightSpMV, a novel CUDA-compatible SpMV algorithm using the standard CSR format, which achieves high speed by benefiting from the fine-grained dynamic distribution of matrix rows over warps/vectors. In LightSpMV, two dynamic row distribution approaches have been investigated at the vector and warp levels with atomic operations and warp shuffle functions as the fundamental building blocks. We have evaluated LightSpMV using various sparse matrices and further compared it to the CSR-based SpMV subprograms in the state-of-the-art CUSP and cuSPARSE libraries. Performance evaluation reveals that on the same Tesla K40c GPU, LightSpMV is superior to both CUSP and cuSPARSE, with a speedup of up to 2.60 and 2.63 over CUSP, and up to 1.93 and 1.79 over cuSPARSE for single and double precision, respectively. 
LightSpMV is available at http://lightspmv.sourceforge.net.\",\"PeriodicalId\":6642,\"journal\":{\"name\":\"2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)\",\"volume\":\"12 1\",\"pages\":\"82-89\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-07-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"41\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASAP.2015.7245713\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASAP.2015.7245713","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 41

Abstract

Compressed sparse row (CSR) is a frequently used format for sparse matrix storage. However, the state-of-the-art CSR-based sparse matrix-vector multiplication (SpMV) implementations on CUDA-enabled GPUs do not exhibit very high efficiency. This has motivated the development of some alternative storage formats for GPU computing. Unfortunately, these alternatives are incompatible with most CPU-centric programs and require dynamic conversion from CSR at runtime, thus incurring significant computational and storage overheads. We present LightSpMV, a novel CUDA-compatible SpMV algorithm using the standard CSR format, which achieves high speed by benefiting from the fine-grained dynamic distribution of matrix rows over warps/vectors. In LightSpMV, two dynamic row distribution approaches have been investigated at the vector and warp levels with atomic operations and warp shuffle functions as the fundamental building blocks. We have evaluated LightSpMV using various sparse matrices and further compared it to the CSR-based SpMV subprograms in the state-of-the-art CUSP and cuSPARSE libraries. Performance evaluation reveals that on the same Tesla K40c GPU, LightSpMV is superior to both CUSP and cuSPARSE, with a speedup of up to 2.60 and 2.63 over CUSP, and up to 1.93 and 1.79 over cuSPARSE for single and double precision, respectively. LightSpMV is available at http://lightspmv.sourceforge.net.
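To make the idea in the abstract concrete, the following CUDA kernel is a minimal sketch (not the authors' implementation, which is available at the URL above) of CSR-based SpMV with vector-level dynamic row distribution: a global row counter advanced with atomicAdd lets each sub-warp "vector" of threads fetch rows on demand, and warp shuffle intrinsics broadcast the fetched row index and reduce the partial sums. The kernel name, the VECTOR_SIZE value, and the single-precision types are illustrative assumptions.

```cuda
// csr_spmv_sketch.cu -- illustrative only, not the LightSpMV source.
#include <cuda_runtime.h>

// Example: the 3x4 matrix
//   [ 1 0 2 0 ]
//   [ 0 3 0 0 ]
//   [ 4 0 5 6 ]
// is stored in CSR as
//   row_ptr = {0, 2, 3, 6}   (offsets into col_idx/values, length rows + 1)
//   col_idx = {0, 2, 1, 0, 2, 3}
//   values  = {1, 2, 3, 4, 5, 6}

// Number of threads (a sub-warp "vector") cooperating on one matrix row.
// 4 is an arbitrary illustrative choice; in practice it would be tuned to
// the matrix's average number of non-zeros per row. Power of two <= 32.
constexpr int VECTOR_SIZE = 4;

__global__ void csr_spmv_vector_dynamic(int num_rows,
                                        const int*   __restrict__ row_ptr,  // CSR row offsets
                                        const int*   __restrict__ col_idx,  // CSR column indices
                                        const float* __restrict__ values,   // CSR non-zero values
                                        const float* __restrict__ x,        // dense input vector
                                        float*       __restrict__ y,        // output, y = A * x
                                        int* row_counter)                   // global counter, zeroed before launch
{
    const int lane = threadIdx.x % VECTOR_SIZE;   // lane within this vector
    // Mask covering only this vector's lanes inside its warp.
    const unsigned seg_mask = (0xffffffffu >> (32 - VECTOR_SIZE))
                              << (((threadIdx.x & 31) / VECTOR_SIZE) * VECTOR_SIZE);

    int row = 0;
    for (;;) {
        // Dynamic scheduling: lane 0 of the vector grabs the next unprocessed row.
        if (lane == 0)
            row = atomicAdd(row_counter, 1);
        // Broadcast the fetched row index to the rest of the vector.
        row = __shfl_sync(seg_mask, row, 0, VECTOR_SIZE);
        if (row >= num_rows)
            break;

        // Each lane accumulates a strided slice of the row's non-zeros.
        float sum = 0.0f;
        for (int j = row_ptr[row] + lane; j < row_ptr[row + 1]; j += VECTOR_SIZE)
            sum += values[j] * x[col_idx[j]];

        // Intra-vector reduction with warp shuffles; lane 0 ends up with the row sum.
        for (int offset = VECTOR_SIZE / 2; offset > 0; offset >>= 1)
            sum += __shfl_down_sync(seg_mask, sum, offset, VECTOR_SIZE);

        if (lane == 0)
            y[row] = sum;
    }
}
```

Before each launch the host would zero row_counter (e.g. with cudaMemset) and can size the grid independently of the number of rows, since rows are consumed dynamically rather than being statically assigned to blocks; this is what allows the fine-grained load balancing across irregular row lengths described above. LightSpMV additionally provides a warp-level variant in which a full warp, rather than a smaller vector, processes each row.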