A Balanced Sparse Matrix Convolution Accelerator for Efficient CNN Training

IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 71, no. 10, pp. 4638-4651 · Pub Date: 2024-07-26 · DOI: 10.1109/TCSI.2024.3430831 · IF 5.2 · JCR Q1 (Engineering, Electrical & Electronic)
Yuechen Chen;Ahmed Louri;Shanshan Liu;Fabrizio Lombardi
{"title":"A Balanced Sparse Matrix Convolution Accelerator for Efficient CNN Training","authors":"Yuechen Chen;Ahmed Louri;Shanshan Liu;Fabrizio Lombardi","doi":"10.1109/TCSI.2024.3430831","DOIUrl":null,"url":null,"abstract":"Sparse Convolutional Neural Network (CNN) training is well known to be time-consuming due to significant off-chip memory traffic. To effectively deploy sparse training, existing accelerators store matrices in a compressed format to eliminate memory accesses for zeros; hence, accelerators are designed to process compressed matrices to avoid zero computations. We have observed that the compression rate is greatly affected by the sparsity in the matrices with different formats. Given the varying levels of sparsity in activations, weights, errors, and gradients matrices throughout the sparse training process, it becomes impractical to achieve consistently high compression rates using a singular compression method for the entire duration of the training. Moreover, random zeros in the matrices result in irregular computation patterns, further increasing execution time. To address these issues, we propose a balanced sparse matrix convolution accelerator design for efficient CNN training. Specifically, a dual matrix compression technique is developed that seamlessly combines two widely used sparse matrix compression formats with a control algorithm for lower memory traffic during training. Based on this compression technique, a two-level workload balancing technique is then designed to further reduce the execution time and energy consumption. Finally, an accelerator is implemented to support the proposed techniques. The cycle-accurate simulation results show that the proposed accelerator reduces the execution time by 34% and the energy consumption by 24% on average compared to existing sparse training accelerators.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"71 10","pages":"4638-4651"},"PeriodicalIF":5.2000,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems I: Regular Papers","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10612221/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Sparse Convolutional Neural Network (CNN) training is well known to be time-consuming due to significant off-chip memory traffic. To deploy sparse training effectively, existing accelerators store matrices in a compressed format to eliminate memory accesses for zeros; accordingly, they are designed to process compressed matrices and avoid computations on zeros. We observe that the compression rate achieved by a given format depends heavily on the sparsity of the matrix it encodes. Given the varying levels of sparsity in the activation, weight, error, and gradient matrices throughout sparse training, it is impractical to achieve consistently high compression rates with a single compression method for the entire duration of training. Moreover, randomly located zeros in the matrices result in irregular computation patterns, further increasing execution time. To address these issues, we propose a balanced sparse matrix convolution accelerator design for efficient CNN training. Specifically, a dual matrix compression technique is developed that seamlessly combines two widely used sparse matrix compression formats with a control algorithm that lowers memory traffic during training. Based on this compression technique, a two-level workload balancing technique is designed to further reduce execution time and energy consumption. Finally, an accelerator is implemented to support the proposed techniques. Cycle-accurate simulation results show that the proposed accelerator reduces execution time by 34% and energy consumption by 24% on average compared to existing sparse training accelerators.
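The abstract does not name the two compression formats or spell out the control algorithm, so the sketch below is only illustrative: it assumes a CSR-style format and a bitmap format (two formats commonly paired in sparse accelerators) and has the controller pick whichever encoding is smaller for each matrix, which is the behavior the dual-compression idea implies. All names (csr_bits, bitmap_bits, choose_format) and the 16-bit field widths are assumptions, not the paper's design.

```python
import numpy as np

def csr_bits(mat, index_bits=16, value_bits=16):
    """Storage cost (bits) of a CSR-style encoding: one value and one
    column index per nonzero, plus one row pointer per row (plus one).
    Field widths are assumed, not taken from the paper."""
    nnz = np.count_nonzero(mat)
    rows = mat.shape[0]
    return nnz * (value_bits + index_bits) + (rows + 1) * index_bits

def bitmap_bits(mat, value_bits=16):
    """Storage cost (bits) of a bitmap encoding: one presence bit per
    element, plus the packed nonzero values."""
    return mat.size + np.count_nonzero(mat) * value_bits

def choose_format(mat):
    """Per-matrix control decision: keep whichever encoding is smaller,
    so memory traffic tracks the better of the two formats as sparsity
    varies across activations, weights, errors, and gradients."""
    return "csr" if csr_bits(mat) < bitmap_bits(mat) else "bitmap"

# A very sparse matrix (~3% nonzeros) favors CSR; a mildly sparse one
# (~50% nonzeros) favors the bitmap.
rng = np.random.default_rng(0)
very_sparse = rng.random((64, 64)) * (rng.random((64, 64)) < 0.03)
mildly_sparse = rng.random((64, 64)) * (rng.random((64, 64)) < 0.50)
print(choose_format(very_sparse))    # csr
print(choose_format(mildly_sparse))  # bitmap
```

Similarly, the two-level workload balancing is not detailed in the abstract. As a stand-in for the idea that randomly placed zeros let some compute lanes finish early while others stall, here is a minimal single-level sketch that spreads rows across processing elements (PEs) by nonzero count using a greedy longest-processing-time heuristic; balance_rows and num_pes are hypothetical names.

```python
def balance_rows(nnz_per_row, num_pes=4):
    """Assign each row to the currently least-loaded PE, heaviest rows
    first, so no PE sits idle while another grinds through dense rows."""
    loads = [0] * num_pes
    owner = {}
    for row in sorted(range(len(nnz_per_row)),
                      key=lambda r: -nnz_per_row[r]):
        pe = loads.index(min(loads))  # least-loaded PE so far
        owner[row] = pe
        loads[pe] += nnz_per_row[row]
    return owner, loads

owner, loads = balance_rows([int(np.count_nonzero(r)) for r in very_sparse])
print(loads)  # per-PE nonzero counts end up nearly equal
```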
Source Journal

IEEE Transactions on Circuits and Systems I: Regular Papers
Category: Engineering, Electrical & Electronic (JCR Q1)
CiteScore: 9.80
Self-citation rate: 11.80%
Articles per year: 441
Review time: 2 months
Journal Description

TCAS I publishes regular papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest includes:
- Circuits: analog, digital, and mixed-signal circuits and systems
- Nonlinear circuits and systems, integrated sensors, MEMS, and systems on chip
- Nanoscale circuits and systems, optoelectronic circuits and systems
- Power electronics and systems
- Software for analog-and-logic circuits and systems
- Control aspects of circuits and systems