Optimizing Sparse Matrix Vector Multiplication Using Cache Blocking Method on Fermi GPU
Weizhi Xu, Hao Zhang, Shuai Jiao, Da Wang, Fenglong Song, Zhiyong Liu
2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, published 2012-08-08
DOI: 10.1109/SNPD.2012.20
Citations: 28
Abstract
Tuning the performance of sparse matrix-vector multiplication (SpMV) is important but difficult because of its irregular memory accesses. In this paper, we propose a cache blocking method to improve SpMV performance on the emerging GPU architecture. The sparse matrix is partitioned into many sub-blocks, each stored in CSR format. With this blocking, the part of vector x that corresponds to a sub-block can be reused in the GPU cache, which greatly reduces the time spent accessing global memory for x. Experimental results on a GeForce GTX 480 show that, in the best case, the SpMV kernel with cache blocking is 5x faster than the unblocked CSR kernel.
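The core idea of the abstract is that each CSR sub-block reads only a contiguous slice of x, so that slice can stay in cache while the block is processed. The following is a minimal CPU sketch in Python of that data layout, assuming column-only (vertical strip) blocking controlled by a hypothetical `block_width` parameter. The function names are illustrative, and the paper's actual partitioning scheme, block sizes, and CUDA kernel are not described here, so this is not the authors' implementation.

```python
from typing import List, Tuple

# A CSR matrix as (row_ptr, col_idx, values); n_rows = len(row_ptr) - 1.
CSR = Tuple[List[int], List[int], List[float]]


def dense_to_csr(dense: List[List[float]]) -> CSR:
    """Convert a dense row-major matrix to CSR, skipping zero entries."""
    row_ptr, col_idx, values = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                col_idx.append(j)
                values.append(v)
        row_ptr.append(len(values))
    return row_ptr, col_idx, values


def csr_spmv(a: CSR, x: List[float]) -> List[float]:
    """Unblocked CSR SpMV: y = A x. Reads of x may jump across the whole vector."""
    row_ptr, col_idx, values = a
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def split_column_blocks(a: CSR, n_cols: int, block_width: int) -> List[Tuple[int, CSR]]:
    """Partition A into vertical strips of block_width columns (hypothetical scheme).

    Each strip is its own CSR sub-matrix whose column indices are local to the
    strip; the strip's starting column is stored alongside it.
    """
    row_ptr, col_idx, values = a
    n_rows = len(row_ptr) - 1
    blocks = []
    for start in range(0, n_cols, block_width):
        end = min(start + block_width, n_cols)
        b_ptr, b_col, b_val = [0], [], []
        for i in range(n_rows):
            for k in range(row_ptr[i], row_ptr[i + 1]):
                if start <= col_idx[k] < end:
                    b_col.append(col_idx[k] - start)
                    b_val.append(values[k])
            b_ptr.append(len(b_val))
        blocks.append((start, (b_ptr, b_col, b_val)))
    return blocks


def blocked_spmv(blocks: List[Tuple[int, CSR]], x: List[float], n_rows: int) -> List[float]:
    """Blocked SpMV: accumulate y += A_block x_block over all column strips."""
    y = [0.0] * n_rows
    for start, (b_ptr, b_col, b_val) in blocks:
        # Only x[start : start + block_width] is read for this block, so on a GPU
        # this small slice is the working set that can remain cache-resident.
        for i in range(n_rows):
            for k in range(b_ptr[i], b_ptr[i + 1]):
                y[i] += b_val[k] * x[start + b_col[k]]
    return y
```

For example, with the 3x4 matrix `[[4, 0, 0, 1], [0, 3, 0, 0], [2, 0, 5, 0]]`, `x = [1, 2, 3, 4]`, and `block_width = 2`, both `csr_spmv` and `blocked_spmv` produce `[8.0, 6.0, 17.0]`. One trade-off is visible in the layout: every sub-block carries its own `row_ptr` of length `n_rows + 1`, so smaller blocks improve x locality at the cost of extra index storage and more passes over y. This CPU sketch does not measure any cache effect.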