A Digital SRAM Computing-in-Memory Design Utilizing Activation Unstructured Sparsity for High-Efficient DNN Inference

Baiqing Zhong, Mingyu Wang, Chuanghao Zhang, Yangzhan Mai, Xiaojie Li, Zhiyi Yu
{"title":"A Digital SRAM Computing-in-Memory Design Utilizing Activation Unstructured Sparsity for High-Efficient DNN Inference","authors":"Baiqing Zhong, Mingyu Wang, Chuanghao Zhang, Yangzhan Mai, Xiaojie Li, Zhiyi Yu","doi":"10.1109/ISVLSI59464.2023.10238597","DOIUrl":null,"url":null,"abstract":"The Computing-in-Memory (CIM) architecture has emerged as a promising approach for designing energy-efficient DNN processors. While previous CIM designs have explored the use of DNN weight sparsity, these approaches often involve pruning the weight matrix in a specific manner. This process may increase the new complexity of the calculation and negatively impact DNN accuracy. However, there are barely any digital CIM circuits that leverage the sparsity in activation which is naturally sparse in many scenarios due to the ReLU activation functions. In order to fully utilize activation unstructured sparsity, we proposed a digital SRAM CIM. This circuit is designed using the booth encoding scheme and adopts the circuit structure of an accumulator-based multiply-accumulate (MAC) calculation. It utilizes SRAM bit-line (BL) computing to obtain matrix sparse information and employs an allocator to allocate data calculation for SRAM-CIM. The proposed design is implemented and evaluated at 40 nm CMOS process. Our evaluation results show that the proposed circuit can achieve a clock frequency of 1 GHz at 1.1 V, with a peak performance of 819.2 GOPS, and in the case of 50%-90% sparsity, SRAM-CIM achieves $1.12 \\times 3.32 \\times$ speedup, and energy savings of 48.2% to 90.57% over dense mode. When performing an 8-bit matrix multiplication with 90% sparsity, the energy efficiency is 10.57 TOPS/W.","PeriodicalId":199371,"journal":{"name":"2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"105 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISVLSI59464.2023.10238597","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The Computing-in-Memory (CIM) architecture has emerged as a promising approach for designing energy-efficient DNN processors. While previous CIM designs have explored DNN weight sparsity, these approaches typically prune the weight matrix in a specific pattern, which can add computational complexity and degrade DNN accuracy. In contrast, few digital CIM circuits exploit activation sparsity, even though activations are naturally sparse in many scenarios because of ReLU activation functions. To fully utilize unstructured activation sparsity, we propose a digital SRAM CIM circuit. The design uses a Booth encoding scheme and adopts an accumulator-based multiply-accumulate (MAC) structure. It utilizes SRAM bit-line (BL) computing to extract the sparsity information of the matrix and employs an allocator to distribute the computation across the SRAM-CIM. The proposed design is implemented and evaluated in a 40 nm CMOS process. Our evaluation results show that the circuit achieves a 1 GHz clock frequency at 1.1 V with a peak performance of 819.2 GOPS. At 50%-90% sparsity, the SRAM-CIM achieves a $1.12\times$ to $3.32\times$ speedup and 48.2% to 90.57% energy savings over dense mode. When performing an 8-bit matrix multiplication with 90% sparsity, the energy efficiency is 10.57 TOPS/W.
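The core idea in the abstract, skipping work for the zero activations that ReLU produces, can be illustrated in software. The sketch below is a minimal, hypothetical model of that dataflow, not the paper's circuit: it detects zero activations (the role the SRAM bit-line computing plays in extracting the sparsity map) and only issues multiply-accumulate operations for nonzero values (the role of the allocator feeding the accumulator-based MAC). All function and variable names are illustrative.

```python
import numpy as np

def sparsity_aware_mac(activations: np.ndarray, weights: np.ndarray):
    """Accumulate sum(a * w), skipping zero activations entirely.

    Returns the dot-product result and the number of multiplies issued,
    which shrinks in proportion to the unstructured activation sparsity.
    """
    acc = 0
    issued = 0
    for a, w in zip(activations, weights):
        if a == 0:              # zero activation: no multiply, no accumulate
            continue
        acc += int(a) * int(w)  # 8-bit operands, wide accumulator
        issued += 1
    return acc, issued

# Example: 8-bit activations after ReLU with roughly 90% zeros
rng = np.random.default_rng(0)
act = rng.integers(0, 128, size=1024, dtype=np.int32)
act[rng.random(1024) < 0.9] = 0                     # impose ~90% sparsity
wgt = rng.integers(-128, 128, size=1024, dtype=np.int32)

result, issued = sparsity_aware_mac(act, wgt)
print(f"result = {result}, multiplies issued = {issued} of {len(act)}")
```

In the reported hardware the same principle underlies the $1.12\times$ to $3.32\times$ speedup and the energy savings over dense mode at 50%-90% sparsity: cycles and accumulator activity are spent only on nonzero activations.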