A Balanced Sparse Matrix Convolution Accelerator for Efficient CNN Training

IF 5.2 | Zone 1 (Engineering Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | IEEE Transactions on Circuits and Systems I: Regular Papers | Pub Date: 2024-07-26 | DOI: 10.1109/TCSI.2024.3430831

Yuechen Chen;Ahmed Louri;Shanshan Liu;Fabrizio Lombardi

IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 71, no. 10, pp. 4638-4651. https://ieeexplore.ieee.org/document/10612221/

Citations: 0

Abstract

Sparse Convolutional Neural Network (CNN) training is well known to be time-consuming due to significant off-chip memory traffic. To deploy sparse training effectively, existing accelerators store matrices in a compressed format to eliminate memory accesses for zeros, and they process the compressed matrices directly to avoid computations on zeros. We have observed that the compression rate each format achieves depends strongly on the sparsity of the matrix. Because the sparsity of the activation, weight, error, and gradient matrices varies throughout sparse training, a single compression method cannot achieve consistently high compression rates for the entire duration of training. Moreover, randomly placed zeros in the matrices result in irregular computation patterns, further increasing execution time. To address these issues, we propose a balanced sparse matrix convolution accelerator design for efficient CNN training. Specifically, a dual matrix compression technique is developed that combines two widely used sparse matrix compression formats with a control algorithm to lower memory traffic during training. Based on this compression technique, a two-level workload balancing technique is then designed to further reduce execution time and energy consumption. Finally, an accelerator is implemented to support the proposed techniques. Cycle-accurate simulation results show that the proposed accelerator reduces execution time by 34% and energy consumption by 24% on average compared to existing sparse training accelerators.
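The two ideas in the abstract, choosing a compression format according to sparsity and balancing irregular work across processing elements, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the abstract does not name the two formats, so bitmap and CSR are used as stand-ins; the 16-bit value width, the function names, and the size-based selection rule are hypothetical; and the greedy longest-first assignment is a generic single-level heuristic, not the paper's two-level scheme.

```python
import heapq
import math


def bitmap_bits(rows, cols, nnz, value_bits=16):
    """Bitmap format: one presence bit per element plus the nonzero values."""
    return rows * cols + nnz * value_bits


def csr_bits(rows, cols, nnz, value_bits=16):
    """CSR format: values, one column index per nonzero, rows + 1 row pointers."""
    col_bits = max(1, math.ceil(math.log2(cols)))
    ptr_bits = max(1, math.ceil(math.log2(nnz + 1)))
    return nnz * (value_bits + col_bits) + (rows + 1) * ptr_bits


def choose_format(rows, cols, nnz, value_bits=16):
    """Pick whichever stand-in format stores the matrix in fewer bits."""
    sizes = {
        "bitmap": bitmap_bits(rows, cols, nnz, value_bits),
        "csr": csr_bits(rows, cols, nnz, value_bits),
    }
    return min(sizes, key=sizes.get), sizes


def balance_rows(row_nnz, num_pes):
    """Greedy assignment: the heaviest remaining row goes to the least-loaded PE."""
    heap = [(0, pe) for pe in range(num_pes)]  # (current load, PE id)
    assignment = [[] for _ in range(num_pes)]
    for row in sorted(range(len(row_nnz)), key=lambda r: -row_nnz[r]):
        load, pe = heapq.heappop(heap)
        assignment[pe].append(row)
        heapq.heappush(heap, (load + row_nnz[row], pe))
    loads = [sum(row_nnz[r] for r in rows) for rows in assignment]
    return assignment, loads


# A 64x64 matrix at roughly 50% density versus roughly 1% density.
dense_choice, _ = choose_format(64, 64, nnz=2048)
sparse_choice, _ = choose_format(64, 64, nnz=41)

# Nonzeros per row for six rows, split across two processing elements.
assignment, loads = balance_rows([9, 1, 8, 2, 7, 3], num_pes=2)
```

Under these assumed encodings, the moderately sparse matrix is smaller as a bitmap while the highly sparse one is smaller in CSR, which is the kind of crossover that would motivate switching formats as sparsity changes during training. In the balancing example, splitting the rows contiguously would give loads of 18 and 12 nonzeros, whereas the greedy assignment gives 15 and 15.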


Source journal

IEEE Transactions on Circuits and Systems I: Regular Papers (Engineering Technology: Electrical & Electronic Engineering)

CiteScore: 9.80

Self-citation rate: 11.80%

Articles published: 441

Review time: 2 months

Journal description: TCAS I publishes regular papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest covered includes:
- Circuits: Analog, Digital and Mixed Signal Circuits and Systems
- Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems
- Software for Analog-and-Logic Circuits and Systems
- Control aspects of Circuits and Systems
