Compensated-DNN: Energy Efficient Low-Precision Deep Neural Networks by Compensating Quantization Errors

2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), pp. 1-6. Pub Date: 2018-06-01. DOI: 10.1145/3195970.3196012

Shubham Jain, Swagath Venkataramani, V. Srinivasan, Jungwook Choi, P. Chuang, Le Chang


Citations: 56

Abstract

Deep Neural Networks (DNNs) represent the state of the art in many Artificial Intelligence (AI) tasks involving images, videos, text, and natural language. Their widespread adoption is limited by their high computation and storage requirements, especially for energy-constrained inference tasks at the edge using wearable and IoT devices. One promising approach to alleviate the computational challenges is implementing DNNs using a low-precision (<16 bits) Fixed Point (FxP) representation. However, the quantization error inherent in any FxP implementation limits the choice of bit-widths that maintain application-level accuracy. Prior efforts recommend increasing the network size and/or re-training the DNN to minimize the loss due to quantization, albeit with limited success.

Complementary to the above approaches, we present Compensated-DNN, wherein we propose to dynamically compensate for the error introduced by quantization during execution. To this end, we introduce a new fixed-point representation, viz. Fixed Point with Error Compensation (FPEC). The bits in FPEC are split between computation bits and compensation bits. The computation bits use conventional FxP notation to represent the number at low precision. The compensation bits (1 or 2 bits at most) explicitly capture an estimate (direction and magnitude) of the quantization error in the representation. For a given word length, since FPEC uses fewer computation bits than the FxP representation, we achieve a near-quadratic improvement in energy in the multiply-and-accumulate (MAC) operations. The compensation bits are simultaneously used by a low-overhead sparse compensation scheme to estimate the error accrued during MAC operations, which is then added to the MAC output to minimize the impact of quantization.

We build compensated-DNNs for 7 popular image recognition benchmarks with 0.05–20.5 million neurons and 0.01–15.5 billion connections. Based on gate-level analysis at 14 nm technology, we achieve 2.65×–4.88× and 1.13×–1.7× improvement in energy compared to 16-bit and 8-bit FxP implementations respectively, while maintaining <0.5% loss in classification accuracy.
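The idea of carrying an error estimate alongside a low-precision value can be illustrated with a small software sketch. This is not the FPEC encoding or the sparse compensation hardware from the paper, whose details the abstract does not give. It assumes a single direction "bit" per operand (stored here as -1, 0, or +1), a fixed assumed error magnitude of a quarter of the quantization step, and a first-order error estimate for each product. All function names are hypothetical.

```python
import random


def quantize(x, frac_bits):
    """Round x to a fixed-point grid with `frac_bits` fractional bits.

    Returns the quantized value and an assumed compensation field:
    the direction of the rounding error (+1 if the true value lies
    above the quantized value, -1 if below, 0 if exact).
    """
    step = 2.0 ** -frac_bits
    q = round(x / step) * step
    err = x - q
    direction = (err > 0) - (err < 0)
    return q, direction


def compensated_dot(xs, ws, frac_bits):
    """Dot product on quantized operands, plus a compensated variant.

    Assumption: the rounding error of each operand is approximated as
    direction * step / 4, the mean magnitude of an error spread
    uniformly over [-step/2, step/2]. The product error is estimated
    to first order: x*w - qx*qw ~= qx*ew + qw*ex.
    """
    step = 2.0 ** -frac_bits
    est_mag = step / 4
    acc = 0.0   # plain low-precision MAC result
    comp = 0.0  # accumulated error estimate
    for x, w in zip(xs, ws):
        qx, dx = quantize(x, frac_bits)
        qw, dw = quantize(w, frac_bits)
        acc += qx * qw
        comp += qx * (dw * est_mag) + qw * (dx * est_mag)
    return acc, acc + comp


def mean_abs_errors(trials, n, frac_bits, seed):
    """Average |error| of the plain and compensated dot products."""
    rng = random.Random(seed)
    plain_total = 0.0
    comp_total = 0.0
    for _ in range(trials):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        ws = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        exact = sum(x * w for x, w in zip(xs, ws))
        plain, compensated = compensated_dot(xs, ws, frac_bits)
        plain_total += abs(plain - exact)
        comp_total += abs(compensated - exact)
    return plain_total / trials, comp_total / trials


if __name__ == "__main__":
    plain, compensated = mean_abs_errors(trials=200, n=64, frac_bits=4, seed=0)
    print(f"mean |error| without compensation: {plain:.5f}")
    print(f"mean |error| with compensation:    {compensated:.5f}")
```

Under these assumptions, the residual error of each operand after compensation is spread over half the original range, so the compensated result should on average sit closer to the exact dot product. The sketch says nothing about energy, which in the paper comes from using fewer computation bits in the MAC hardware.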


