SME: ReRAM-based Sparse-Multiplication-Engine to Squeeze-Out Bit Sparsity of Neural Network
Fangxin Liu, Wenbo Zhao, Yilong Zhao, Zongwu Wang, Tao Yang, Zhezhi He, Naifeng Jing, Xiaoyao Liang, Li Jiang
2021 IEEE 39th International Conference on Computer Design (ICCD). DOI: 10.1109/ICCD53106.2021.00072
The Resistive Random-Access Memory (ReRAM) crossbar is a promising technology for deep neural network (DNN) accelerators, thanks to its in-memory, in-situ analog computing capability for vector-matrix multiply-accumulate (VMM) operations. However, it is challenging for a crossbar architecture to exploit the sparsity in DNNs: because of the tightly coupled crossbar structure, exploiting fine-grained sparsity inevitably requires complex and costly control. As a countermeasure, we develop a novel ReRAM-based DNN accelerator, named the Sparse-Multiplication-Engine (SME), built on a hardware/software co-design framework. First, we orchestrate the bit-sparse pattern to increase the density of bit sparsity on top of existing quantization methods. Second, we propose a novel weight-mapping mechanism that slices the bits of a weight across crossbars and splices the activation results in the peripheral circuits. This mechanism decouples the tightly coupled crossbar structure and accumulates sparsity within each crossbar. Finally, a squeeze-out scheme empties the crossbars that, after the previous two steps, are mapped with highly sparse non-zeros. We design the SME architecture and discuss its use with other quantization methods and different ReRAM cell technologies. Compared with prior state-of-the-art designs, SME shrinks crossbar usage by up to 8.7× on ResNet-50 and 2.1× on MobileNet-v2, with ≤ 0.3% accuracy drop on ImageNet.
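To make the slice-and-splice mapping and the squeeze-out step concrete, the following minimal NumPy sketch illustrates the idea; it is not the paper's implementation, the names slice_weight_bits, squeeze_out, and splice_vmm are hypothetical, and unsigned 8-bit weights are assumed for simplicity. It decomposes a weight matrix into per-bit binary slices (one slice per crossbar), frees slices that are entirely (or nearly) zero, and recombines the surviving per-slice dot products by bit significance, mimicking the shift-and-add splicing the peripheral circuits would perform.

import numpy as np

def slice_weight_bits(weights, n_bits=8):
    # Decompose a quantized (unsigned, n_bits-wide) weight matrix into
    # per-bit binary slices; each slice would occupy its own crossbar.
    w = weights.astype(np.uint8)
    return [(w >> b) & 1 for b in range(n_bits)]  # index b = bit significance

def squeeze_out(slices, density_threshold=0.0):
    # Drop bit slices whose non-zero density is at or below the threshold;
    # the crossbars they would have occupied are freed entirely.
    kept = [(b, s) for b, s in enumerate(slices) if s.mean() > density_threshold]
    freed = len(slices) - len(kept)
    return kept, freed

def splice_vmm(kept_slices, activations):
    # Recombine the per-slice dot products, weighted by bit significance,
    # mimicking the shift-and-add splicing done in the peripheral circuits.
    return sum((activations @ s) * (1 << b) for b, s in kept_slices)

# Usage: with density_threshold = 0.0 only all-zero slices are squeezed
# out, so the spliced result matches the dense VMM exactly.
rng = np.random.default_rng(0)
W = rng.integers(0, 256, size=(4, 4)).astype(np.uint8)
x = rng.integers(0, 4, size=4)
kept, freed = squeeze_out(slice_weight_bits(W))
assert np.array_equal(splice_vmm(kept, x), x @ W.astype(np.int64))

The more the bit-sparse pattern from the first step concentrates zeros into whole slices, the more crossbars the squeeze-out step can free; a nonzero density_threshold would trade accuracy for further compression, in the spirit of the ≤ 0.3% ImageNet accuracy drop reported above.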