SME: ReRAM-based Sparse-Multiplication-Engine to Squeeze-Out Bit Sparsity of Neural Network

Fangxin Liu, Wenbo Zhao, Yilong Zhao, Zongwu Wang, Tao Yang, Zhezhi He, Naifeng Jing, Xiaoyao Liang, Li Jiang
{"title":"SME: ReRAM-based Sparse-Multiplication-Engine to Squeeze-Out Bit Sparsity of Neural Network","authors":"Fangxin Liu, Wenbo Zhao, Yilong Zhao, Zongwu Wang, Tao Yang, Zhezhi He, Naifeng Jing, Xiaoyao Liang, Li Jiang","doi":"10.1109/ICCD53106.2021.00072","DOIUrl":null,"url":null,"abstract":"Resistive Random-Access-Memory (ReRAM) cross-bar is a promising technique for deep neural network (DNN) accelerators, thanks to its in-memory and in-situ analog computing abilities for Vector-Matrix Multiplication-and-Accumulations (VMMs). However, it is challenging for crossbar architecture to exploit the sparsity in DNNs. It inevitably causes complex and costly control to exploit fine-grained sparsity due to the limitation of tightly-coupled crossbar structure.As the countermeasure, we develop a novel ReRAM-based DNN accelerator, named Sparse-Multiplication-Engine (SME), based on a hardware and software co-design framework. First, we orchestrate the bit-sparse pattern to increase the density of bit-sparsity based on existing quantization methods. Second, we propose a novel weight mapping mechanism to slice the bits of a weight across the crossbars and splice the activation results in peripheral circuits. This mechanism can decouple the tightly-coupled crossbar structure and cumulate the sparsity in the crossbar. Finally, a superior squeeze-out scheme empties the crossbars mapped with highly-sparse non-zeros from the previous two steps. We design the SME architecture and discuss its use for other quantization methods and different ReRAM cell technologies. Compared with prior state-of-the-art designs, the SME shrinks the use of crossbars up to 8.7× and 2.1× using ResNet-50 and MobileNet-v2, respectively, with ≤ 0.3% accuracy drop on ImageNet.","PeriodicalId":154014,"journal":{"name":"2021 IEEE 39th International Conference on Computer Design (ICCD)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 39th International Conference on Computer Design (ICCD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD53106.2021.00072","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

Abstract

The Resistive Random-Access Memory (ReRAM) crossbar is a promising substrate for deep neural network (DNN) accelerators, thanks to its in-memory, in-situ analog computation of vector-matrix multiply-accumulate (VMM) operations. However, it is challenging for the crossbar architecture to exploit the sparsity in DNNs: the tightly coupled crossbar structure inevitably demands complex and costly control to exploit fine-grained sparsity. As a countermeasure, we develop a novel ReRAM-based DNN accelerator, named the Sparse-Multiplication-Engine (SME), built on a hardware/software co-design framework. First, we orchestrate the bit-sparse pattern to increase the density of bit sparsity on top of existing quantization methods. Second, we propose a novel weight-mapping mechanism that slices the bits of a weight across crossbars and splices the partial results back together in the peripheral circuits; this decouples the tightly coupled crossbar structure and concentrates the sparsity within individual crossbars. Finally, a squeeze-out scheme empties the crossbars that, after the previous two steps, are mapped with only highly sparse non-zeros. We present the SME architecture and discuss its applicability to other quantization methods and to different ReRAM cell technologies. Compared with prior state-of-the-art designs, SME reduces crossbar usage by up to 8.7× and 2.1× for ResNet-50 and MobileNet-v2, respectively, with ≤ 0.3% accuracy drop on ImageNet.
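The three steps above lend themselves to a small illustration. The following NumPy sketch shows the general idea: slice a quantized weight matrix into per-bit planes (one plane standing in for one crossbar), squeeze out planes that carry almost no non-zeros, and splice the per-plane partial sums back together with a shift-and-add, as the peripheral circuit would. This is a minimal sketch under our own assumptions: the function names, the unsigned quantization, and the density threshold are illustrative choices and not the paper's actual mapping scheme.

```python
import numpy as np

def slice_weight_bits(q_weights, n_bits=8):
    """Slice an unsigned quantized weight matrix into per-bit binary planes.

    Each plane stands in for one crossbar, so the planes can be mapped,
    scheduled, or dropped independently of one another (the decoupling
    the abstract describes).
    """
    return [((q_weights >> b) & 1).astype(np.uint8) for b in range(n_bits)]

def squeeze_out(planes, density_threshold=0.05):
    """Keep only planes whose non-zero density exceeds the threshold.

    Highly sparse planes are 'squeezed out', freeing their crossbars.
    This toy simply drops them; the paper's scheme handles the residual
    non-zeros rather than discarding information.
    """
    return [(b, p) for b, p in enumerate(planes) if p.mean() > density_threshold]

def splice_vmm(x, kept_planes):
    """Shift-and-add the per-plane partial sums, modelling the peripheral
    splicing circuit: y = sum_b 2^b * (x @ plane_b)."""
    cols = kept_planes[0][1].shape[1]
    y = np.zeros(cols)
    for b, plane in kept_planes:
        y += (1 << b) * (x @ plane)
    return y

# Toy demo: weights fit in 4 bits, so bit planes 4..7 are all-zero.
rng = np.random.default_rng(0)
w = rng.integers(0, 16, size=(64, 64))
planes = slice_weight_bits(w, n_bits=8)
kept = squeeze_out(planes)
print(f"crossbars kept: {len(kept)} / {len(planes)}")   # 4 / 8

x = rng.integers(0, 2, size=64)
# Exact, because only all-zero planes were squeezed out here.
assert np.allclose(splice_vmm(x, kept), x @ w)
```

In this demo only all-zero planes are dropped, so the spliced result matches the dense VMM exactly; the paper's squeeze-out goes further and also empties crossbars whose mapped non-zeros are merely highly sparse.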