A Hardware-Centric Approach to Increase and Prune Regular Activation Sparsity in CNNs

Tim Hotfilter, Julian Höfer, Fabian Kreß, F. Kempf, Leonhard Kraft, T. Harbaum, J. Becker
{"title":"A Hardware-Centric Approach to Increase and Prune Regular Activation Sparsity in CNNs","authors":"Tim Hotfilter, Julian Höfer, Fabian Kreß, F. Kempf, Leonhard Kraft, T. Harbaum, J. Becker","doi":"10.1109/AICAS57966.2023.10168566","DOIUrl":null,"url":null,"abstract":"A key challenge in computing convolutional neural networks (CNNs) besides the vast number of computations are the associated numerous energy-intensive transactions from main to local memory. In this paper, we present our methodical approach to maximize and prune coarse-grained regular blockwise sparsity in activation feature maps during CNN inference on dedicated dataflow architectures. Regular sparsity that fits the target accelerator, e.g., a systolic array or vector processor, allows simplified and resource inexpensive pruning compared to irregular sparsity, saving memory transactions and computations. Our threshold-based technique allows maximizing the number of regular sparse blocks in each layer. The wide range of threshold combinations that result from the close correlation between the number of sparse blocks and network accuracy can be explored automatically by our exploration tool Spex. To harness found sparse blocks for memory transaction and MAC operation reduction, we also propose Sparse-Blox, a low-overhead hardware extension for common neural network hardware accelerators. Sparse-Blox adds up to 5× less area than state-of-the-art accelerator extensions that operate on irregular sparsity. Evaluation of our blockwise pruning method with Spex on ResNet-50 and Yolo-v5s shows a reduction of up to 18.9% and 12.6% memory transfers, and 802 M (19.0%) and 1.5 G (24.3%) MAC operations with a 1% or 1 mAP accuracy drop, respectively.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AICAS57966.2023.10168566","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

A key challenge in computing convolutional neural networks (CNNs), besides the vast number of computations, is the large number of associated energy-intensive transactions from main to local memory. In this paper, we present our methodical approach to maximize and prune coarse-grained, regular blockwise sparsity in activation feature maps during CNN inference on dedicated dataflow architectures. Regular sparsity that fits the target accelerator, e.g., a systolic array or vector processor, allows simpler and less resource-intensive pruning than irregular sparsity, saving memory transactions and computations. Our threshold-based technique maximizes the number of regular sparse blocks in each layer. The wide range of threshold combinations that results from the close correlation between the number of sparse blocks and network accuracy can be explored automatically by our exploration tool Spex. To harness the identified sparse blocks for reducing memory transactions and MAC operations, we also propose Sparse-Blox, a low-overhead hardware extension for common neural network hardware accelerators. Sparse-Blox adds up to 5× less area than state-of-the-art accelerator extensions that operate on irregular sparsity. Evaluating our blockwise pruning method with Spex on ResNet-50 and Yolo-v5s shows a reduction of up to 18.9% and 12.6% in memory transfers, and 802 M (19.0%) and 1.5 G (24.3%) MAC operations, with a 1% accuracy or 1 mAP drop, respectively.
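To make the blockwise pruning idea concrete, the minimal sketch below zeroes out spatial activation tiles whose mean absolute value falls below a per-layer threshold, so whole blocks could be skipped by a dataflow accelerator. The block size, the mean-absolute-value criterion, and the threshold value are illustrative assumptions; it does not reproduce the paper's exact pruning rule or Spex's automatic threshold exploration.

```python
import numpy as np

def prune_blockwise(fmap: np.ndarray, block: int, threshold: float) -> np.ndarray:
    """Zero out block x block spatial tiles (across all channels) of a
    (C, H, W) activation map whose mean absolute value is below threshold."""
    _, h, w = fmap.shape
    out = fmap.copy()
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = out[:, y:y + block, x:x + block]  # view into `out`
            if np.abs(tile).mean() < threshold:
                # A regular sparse block: its memory transfer and MACs can be skipped.
                tile[...] = 0.0
    return out

# Usage: prune a random activation map with 4x4 blocks and report the sparsity.
# The threshold is chosen only to produce a visible effect on random data.
acts = np.random.randn(64, 32, 32).astype(np.float32)
sparse_acts = prune_blockwise(acts, block=4, threshold=0.8)
print("fraction of zeroed activations:", float((sparse_acts == 0).mean()))
```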