Fermi GPU上基于缓存块方法的稀疏矩阵矢量乘法优化

2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing Pub Date : 2012-08-08 DOI:10.1109/SNPD.2012.20

Weizhi Xu, Hao Zhang, Shuai Jiao, Da Wang, Fenglong Song, Zhiyong Liu

{"title":"Fermi GPU上基于缓存块方法的稀疏矩阵矢量乘法优化","authors":"Weizhi Xu, Hao Zhang, Shuai Jiao, Da Wang, Fenglong Song, Zhiyong Liu","doi":"10.1109/SNPD.2012.20","DOIUrl":null,"url":null,"abstract":"It is an important task to tune performance for sparse matrix vector multiplication (SpMV), but it is also a difficult task because of its irregularity. In this paper, we propose a cache blocking method to improve the performance of SpMV on the emerging GPU architecture. The sparse matrix is partitioned into many sub-blocks, which are stored in CSR format. With the blocking method, the corresponding part of vector x can be reused in the GPU cache, so the time spent on accessing the global memory for vector x is reduced heavily. Experimental results on GeForce GTX 480 show that SpMV kernel with the cache blocking method is 5x faster than the unblocked CSR kernel in the best case.","PeriodicalId":387936,"journal":{"name":"2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"28","resultStr":"{\"title\":\"Optimizing Sparse Matrix Vector Multiplication Using Cache Blocking Method on Fermi GPU\",\"authors\":\"Weizhi Xu, Hao Zhang, Shuai Jiao, Da Wang, Fenglong Song, Zhiyong Liu\",\"doi\":\"10.1109/SNPD.2012.20\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"It is an important task to tune performance for sparse matrix vector multiplication (SpMV), but it is also a difficult task because of its irregularity. In this paper, we propose a cache blocking method to improve the performance of SpMV on the emerging GPU architecture. The sparse matrix is partitioned into many sub-blocks, which are stored in CSR format. With the blocking method, the corresponding part of vector x can be reused in the GPU cache, so the time spent on accessing the global memory for vector x is reduced heavily. Experimental results on GeForce GTX 480 show that SpMV kernel with the cache blocking method is 5x faster than the unblocked CSR kernel in the best case.\",\"PeriodicalId\":387936,\"journal\":{\"name\":\"2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing\",\"volume\":\"7 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-08-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"28\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SNPD.2012.20\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SNPD.2012.20","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 28

摘要

稀疏矩阵向量乘法(SpMV)的性能调优是一项重要的任务，但由于其不规则性也是一项困难的任务。在本文中，我们提出了一种缓存阻塞方法来提高SpMV在新兴GPU架构上的性能。将稀疏矩阵划分为多个子块，以CSR格式存储。使用阻塞方法，向量x的对应部分可以在GPU缓存中重用，因此大大减少了访问向量x全局内存的时间。在GeForce GTX 480上的实验结果表明，在最佳情况下，采用缓存阻塞方法的SpMV内核比未阻塞的CSR内核快5倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Optimizing Sparse Matrix Vector Multiplication Using Cache Blocking Method on Fermi GPU

It is an important task to tune performance for sparse matrix vector multiplication (SpMV), but it is also a difficult task because of its irregularity. In this paper, we propose a cache blocking method to improve the performance of SpMV on the emerging GPU architecture. The sparse matrix is partitioned into many sub-blocks, which are stored in CSR format. With the blocking method, the corresponding part of vector x can be reused in the GPU cache, so the time spent on accessing the global memory for vector x is reduced heavily. Experimental results on GeForce GTX 480 show that SpMV kernel with the cache blocking method is 5x faster than the unblocked CSR kernel in the best case.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing

自引率

0.00%

发文量