A Digital SRAM Computing-in-Memory Design Utilizing Activation Unstructured Sparsity for High-Efficient DNN Inference

Baiqing Zhong, Mingyu Wang, Chuanghao Zhang, Yangzhan Mai, Xiaojie Li, Zhiyi Yu
{"title":"A Digital SRAM Computing-in-Memory Design Utilizing Activation Unstructured Sparsity for High-Efficient DNN Inference","authors":"Baiqing Zhong, Mingyu Wang, Chuanghao Zhang, Yangzhan Mai, Xiaojie Li, Zhiyi Yu","doi":"10.1109/ISVLSI59464.2023.10238597","DOIUrl":null,"url":null,"abstract":"The Computing-in-Memory (CIM) architecture has emerged as a promising approach for designing energy-efficient DNN processors. While previous CIM designs have explored the use of DNN weight sparsity, these approaches often involve pruning the weight matrix in a specific manner. This process may increase the new complexity of the calculation and negatively impact DNN accuracy. However, there are barely any digital CIM circuits that leverage the sparsity in activation which is naturally sparse in many scenarios due to the ReLU activation functions. In order to fully utilize activation unstructured sparsity, we proposed a digital SRAM CIM. This circuit is designed using the booth encoding scheme and adopts the circuit structure of an accumulator-based multiply-accumulate (MAC) calculation. It utilizes SRAM bit-line (BL) computing to obtain matrix sparse information and employs an allocator to allocate data calculation for SRAM-CIM. The proposed design is implemented and evaluated at 40 nm CMOS process. Our evaluation results show that the proposed circuit can achieve a clock frequency of 1 GHz at 1.1 V, with a peak performance of 819.2 GOPS, and in the case of 50%-90% sparsity, SRAM-CIM achieves $1.12 \\times 3.32 \\times$ speedup, and energy savings of 48.2% to 90.57% over dense mode. When performing an 8-bit matrix multiplication with 90% sparsity, the energy efficiency is 10.57 TOPS/W.","PeriodicalId":199371,"journal":{"name":"2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"105 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISVLSI59464.2023.10238597","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The Computing-in-Memory (CIM) architecture has emerged as a promising approach for designing energy-efficient DNN processors. While previous CIM designs have explored DNN weight sparsity, these approaches typically prune the weight matrix in a specific pattern, which can add computational complexity and degrade DNN accuracy. In contrast, few digital CIM circuits exploit activation sparsity, even though activations are naturally sparse in many scenarios because of ReLU activation functions. To fully utilize unstructured activation sparsity, we propose a digital SRAM CIM circuit. The design uses a Booth encoding scheme and adopts an accumulator-based multiply-accumulate (MAC) structure. It utilizes SRAM bit-line (BL) computing to extract the sparsity information of the matrix and employs an allocator to distribute the computation across the SRAM-CIM. The proposed design is implemented and evaluated in a 40 nm CMOS process. Our evaluation results show that the circuit achieves a 1 GHz clock frequency at 1.1 V with a peak performance of 819.2 GOPS. At 50%-90% sparsity, the SRAM-CIM achieves a $1.12\times$ to $3.32\times$ speedup and 48.2% to 90.57% energy savings over dense mode. When performing an 8-bit matrix multiplication with 90% sparsity, the energy efficiency is 10.57 TOPS/W.
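The core idea in the abstract, skipping work for the zero activations that ReLU produces, can be illustrated in software. The sketch below is a minimal, hypothetical model of that dataflow, not the paper's circuit: it detects zero activations (the role the SRAM bit-line computing plays in extracting the sparsity map) and only issues multiply-accumulate operations for nonzero values (the role of the allocator feeding the accumulator-based MAC). All function and variable names are illustrative.

```python
import numpy as np

def sparsity_aware_mac(activations: np.ndarray, weights: np.ndarray):
    """Accumulate sum(a * w), skipping zero activations entirely.

    Returns the dot-product result and the number of multiplies issued,
    which shrinks in proportion to the unstructured activation sparsity.
    """
    acc = 0
    issued = 0
    for a, w in zip(activations, weights):
        if a == 0:              # zero activation: no multiply, no accumulate
            continue
        acc += int(a) * int(w)  # 8-bit operands, wide accumulator
        issued += 1
    return acc, issued

# Example: 8-bit activations after ReLU with roughly 90% zeros
rng = np.random.default_rng(0)
act = rng.integers(0, 128, size=1024, dtype=np.int32)
act[rng.random(1024) < 0.9] = 0                     # impose ~90% sparsity
wgt = rng.integers(-128, 128, size=1024, dtype=np.int32)

result, issued = sparsity_aware_mac(act, wgt)
print(f"result = {result}, multiplies issued = {issued} of {len(act)}")
```

In the reported hardware the same principle underlies the $1.12\times$ to $3.32\times$ speedup and the energy savings over dense mode at 50%-90% sparsity: cycles and accumulator activity are spent only on nonzero activations.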