Quantization and sparsity-aware processing for energy-efficient NVM-based convolutional neural networks

Frontiers in electronics | IF 1.9 | Q3, Engineering, Electrical & Electronic | Pub Date: 2022-08-12 | DOI: 10.3389/felec.2022.954661

Han Bao, Yi-Fan Qin, Jia Chen, Ling Yang, Jiancong Li, Houji Zhou, Yi Li, Xiangshui Miao


Citations: 1

Abstract

Nonvolatile memory (NVM)-based convolutional neural networks (NvCNNs) have received widespread attention as a promising solution for hardware edge intelligence. However, many challenges remain under resource-constrained conditions, such as limited hardware precision and cost and, especially, the large overhead of the analog-to-digital converters (ADCs). In this study, we systematically analyze the performance of NvCNNs and the hardware restrictions when both weights and activations are quantized, and we derive the corresponding requirements on NVM devices and peripheral circuits for multiply–accumulate (MAC) units. In addition, we put forward an in situ sparsity-aware processing method that exploits the sparsity of the network and the characteristics of the device array to further improve the energy efficiency of quantized NvCNNs. Our results suggest that the 4-bit-weight and 3-bit-activation (W4A3) design offers the best trade-off between network performance and hardware overhead, achieving 98.82% accuracy on the Modified National Institute of Standards and Technology database (MNIST) classification task. Higher-precision designs impose stricter requirements on hardware nonidealities, including the variations of NVM devices and the nonlinearities of the converters. Finally, the sparsity-aware processing method achieves a 79%/53% ADC energy reduction and a 2.98×/1.15× energy-efficiency improvement for the W8A8/W4A3 quantization designs with an array size of 128 × 128.
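
To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes plain uniform quantization to 2^bits levels and a simple zero-skipping rule in which an all-zero input vector needs no ADC read-out; the abstract does not specify either choice, and the function names quantize_codes, crossbar_mac, and run_with_zero_skip are hypothetical.

```python
def quantize_codes(values, bits, lo, hi):
    """Map each value to an integer code in [0, 2**bits - 1].

    Assumption: uniform quantization over [lo, hi] with clipping.
    The abstract only gives the bit widths (e.g. W4A3), not the scheme.
    """
    max_code = 2 ** bits - 1
    step = (hi - lo) / max_code
    codes = []
    for v in values:
        clipped = min(max(v, lo), hi)          # saturate out-of-range values
        codes.append(round((clipped - lo) / step))
    return codes


def crossbar_mac(weights, x):
    """Ideal multiply-accumulate: one output per column of the weight array."""
    cols = len(weights[0])
    return [sum(weights[r][c] * x[r] for r in range(len(x))) for c in range(cols)]


def run_with_zero_skip(weights, input_vectors):
    """Process input vectors, skipping the read-out when a vector is all zero.

    Assumption: each processed vector costs one ADC conversion per column,
    and an all-zero vector is known to give zero outputs without conversion.
    Returns the outputs and the number of ADC conversions performed.
    """
    cols = len(weights[0])
    outputs, adc_conversions = [], 0
    for x in input_vectors:
        if all(v == 0 for v in x):
            outputs.append([0] * cols)         # result known, no conversion
        else:
            outputs.append(crossbar_mac(weights, x))
            adc_conversions += cols
    return outputs, adc_conversions
```

For example, quantize_codes(activations, 3, 0.0, 1.0) would produce 3-bit activation codes and quantize_codes(weights, 4, -1.0, 1.0) would produce 4-bit weight codes. Coarse activation quantization after a rectifying nonlinearity tends to map more small values to zero, which is one way such a skip rule could save ADC conversions; the savings reported in the abstract come from the authors' own method and hardware model, not from this sketch.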

