Cerberus: Triple Mode Acceleration of Sparse Matrix and Vector Multiplication

ACM Transactions on Architecture and Code Optimization (IF 1.5; Zone 3, Computer Science; JCR Q4, Computer Science, Hardware & Architecture) Pub Date: 2024-03-17 DOI: 10.1145/3653020
Soojin Hwang, Daehyeon Baek, Jongse Park, Jaehyuk Huh

Abstract

Sparse matrix-vector multiplication (SpMV) is one of the most widely used kernels in high-performance computing and in machine learning acceleration for sparse neural networks. The design space of SpMV accelerators has two axes: algorithm and matrix representation. Two widely used algorithms, scalar multiplication and dot product, can be combined with two sparse data representations, compressed sparse and bitmap formats, for the matrix and vector. Although prior accelerators each adopted one of the possible designs, it has yet to be investigated which design is best across different hardware resources and workload characteristics. This paper first investigates the impact of design choices with respect to the algorithm and data representation. Our evaluation shows that no single design always outperforms the others across different workloads, but the two best designs (i.e., compressed sparse format and bitmap format with dot product) have complementary performance, with trade-offs that depend on the matrix characteristics. Based on this analysis, this study proposes Cerberus, a triple-mode accelerator supporting two sparse operation modes in addition to the base dense mode. To enable such multi-mode operation, it proposes a prediction model based on matrix characteristics under a given hardware configuration, which statically selects the best mode for a given sparse matrix from its dimension and density information. Our experimental results show that Cerberus provides a 12.1× performance improvement over a dense-only accelerator and a 1.5× improvement over a fixed best SpMV design.
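
The two algorithms and two representations named in the abstract can be illustrated in software. The following is a minimal Python sketch, not the accelerator's datapath: it assumes CSR/CSC as the "compressed sparse" format, a per-row bit mask plus packed nonzeros as the "bitmap" format, and a dense input vector for the dot-product variants. All function names are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def to_csr(a: Matrix) -> Tuple[List[float], List[int], List[int]]:
    """Compressed sparse row: nonzero values, their column indices, row offsets."""
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def to_csc(a: Matrix) -> Tuple[List[float], List[int], List[int]]:
    """Compressed sparse column: nonzero values, their row indices, column offsets."""
    values, row_idx, col_ptr = [], [], [0]
    for j in range(len(a[0])):
        for i, row in enumerate(a):
            if row[j] != 0:
                values.append(row[j])
                row_idx.append(i)
        col_ptr.append(len(values))
    return values, row_idx, col_ptr

def to_bitmap(a: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Bitmap format: one bit per matrix position plus the packed nonzeros."""
    bitmap = [[1 if v != 0 else 0 for v in row] for row in a]
    values = [v for row in a for v in row if v != 0]
    return bitmap, values

def spmv_csr_dot(values, col_idx, row_ptr, x: List[float]) -> List[float]:
    """Dot product over CSR: each output element is one row dotted with x."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

def spmv_bitmap_dot(bitmap, values, x: List[float]) -> List[float]:
    """Dot product over the bitmap format: set bits locate the packed nonzeros."""
    y, k = [], 0
    for bits in bitmap:
        acc = 0.0
        for j, bit in enumerate(bits):
            if bit:
                acc += values[k] * x[j]
                k += 1
        y.append(acc)
    return y

def spmv_csc_scalar(values, row_idx, col_ptr, x: List[float], n_rows: int) -> List[float]:
    """Scalar multiplication: each nonzero x[j] scales column j into the output."""
    y = [0.0] * n_rows
    for j, xj in enumerate(x):
        if xj == 0:
            continue  # zero vector entries contribute nothing
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += values[k] * xj
    return y

A = [[1.0, 0.0, 2.0], [0.0, 0.0, 3.0], [4.0, 5.0, 0.0]]
x = [1.0, 0.0, 2.0]
print(spmv_csr_dot(*to_csr(A), x))                # [5.0, 6.0, 4.0]
print(spmv_bitmap_dot(*to_bitmap(A), x))          # [5.0, 6.0, 4.0]
print(spmv_csc_scalar(*to_csc(A), x, len(A)))     # [5.0, 6.0, 4.0]
```

All three variants produce the same result; they differ in how nonzeros are located (explicit indices versus bit masks) and in the access pattern (row-wise accumulation versus column-wise scatter), which is the kind of trade-off the abstract attributes to matrix characteristics.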

Source journal
ACM Transactions on Architecture and Code Optimization (Engineering & Technology, Computer Science: Theory & Methods)
CiteScore: 3.60
Self-citation rate: 6.20%
Articles published: 78
Review time: 6-12 weeks
About the journal: ACM Transactions on Architecture and Code Optimization (TACO) focuses on hardware, software, and system research spanning the fields of computer architecture and code optimization. Articles that appear in TACO will either present new techniques and concepts or report on experiences and experiments with actual systems. Insights useful to architects, hardware or software developers, designers, builders, and users will be emphasized.