Compensated-DNN: Energy Efficient Low-Precision Deep Neural Networks by Compensating Quantization Errors

Shubham Jain, Swagath Venkataramani, V. Srinivasan, Jungwook Choi, P. Chuang, Le Chang
Published in: 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), pp. 1-6. Publication date: 2018-06-01. DOI: 10.1145/3195970.3196012 (https://doi.org/10.1145/3195970.3196012)
Citations: 56

Abstract

Deep Neural Networks (DNNs) represent the state of the art in many Artificial Intelligence (AI) tasks involving images, videos, text, and natural language. Their ubiquitous adoption is limited by their high computation and storage requirements, especially for energy-constrained inference tasks at the edge using wearable and IoT devices. One promising approach to alleviate the computational challenges is implementing DNNs using a low-precision fixed-point (<16 bits) representation. However, the quantization error inherent in any Fixed Point (FxP) implementation limits how far bit-widths can be reduced while maintaining application-level accuracy. Prior efforts recommend increasing the network size and/or re-training the DNN to minimize loss due to quantization, albeit with limited success.

Complementary to the above approaches, we present Compensated-DNN, wherein we propose to dynamically compensate, during execution, for the error introduced by quantization. To this end, we introduce a new fixed-point representation, namely Fixed Point with Error Compensation (FPEC). The bits in FPEC are split between computation bits and compensation bits. The computation bits use conventional FxP notation to represent the number at low precision. The compensation bits (1 or 2 bits at most), on the other hand, explicitly capture an estimate (direction and magnitude) of the quantization error in the representation. For a given word length, since FPEC uses fewer computation bits than the FxP representation, we achieve a near-quadratic improvement in energy in the multiply-and-accumulate (MAC) operations. The compensation bits are simultaneously used by a low-overhead sparse compensation scheme to estimate the error accrued during MAC operations, which is then added to the MAC output to minimize the impact of quantization. We build compensated-DNNs for 7 popular image recognition benchmarks with 0.05–20.5 million neurons and 0.01–15.5 billion connections. Based on gate-level analysis at 14nm technology, we achieve 2.65×–4.88× and 1.13×–1.7× improvement in energy compared to 16-bit and 8-bit FxP implementations respectively, while maintaining <0.5% loss in classification accuracy.
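
The abstract does not give the bit-level details of FPEC, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that the computation bits are a value rounded to a grid with step 2^-frac_bits, that one compensation bit records the direction of the rounding error, and that a second bit records whether the error is treated as zero or as a quarter of the step. The function names, the step/8 threshold, and the step/4 estimate are all hypothetical choices made for illustration.

```python
import random


def fpec_encode(x, frac_bits):
    """Return (q, est): a low-precision value and a coarse error estimate.

    Assumed scheme (illustrative, not taken from the paper):
      q   - x rounded to a fixed-point grid with step 2**-frac_bits
      est - 0 if the rounding error is small, else +/- step/4
            (one direction bit plus one magnitude bit)
    """
    step = 2.0 ** -frac_bits
    q = round(x / step) * step       # conventional FxP rounding
    err = x - q                      # true error, |err| <= step / 2
    if abs(err) < step / 8:          # magnitude bit = 0: no compensation
        return q, 0.0
    return q, (step / 4 if err > 0 else -step / 4)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mac_compensated(a_q, a_est, w_q, w_est):
    """Low-precision MAC plus a first-order error correction.

    (a_q + a_err) * (w_q + w_err) ~= a_q*w_q + a_q*w_err + w_q*a_err,
    so the correction is sum(a_q*w_est + w_q*a_est). Operands whose
    estimate is zero add nothing, which keeps the correction sparse.
    """
    return dot(a_q, w_q) + dot(a_q, w_est) + dot(w_q, a_est)


def mean_abs_errors(frac_bits=4, n=256, trials=200, seed=0):
    """Average |error| of the plain and compensated MAC on random data."""
    rng = random.Random(seed)
    plain = comp = 0.0
    for _ in range(trials):
        a = [rng.uniform(0.0, 1.0) for _ in range(n)]    # activations
        w = [rng.uniform(-1.0, 1.0) for _ in range(n)]   # weights
        exact = dot(a, w)
        a_q, a_est = zip(*(fpec_encode(v, frac_bits) for v in a))
        w_q, w_est = zip(*(fpec_encode(v, frac_bits) for v in w))
        plain += abs(dot(a_q, w_q) - exact)
        comp += abs(mac_compensated(a_q, a_est, w_q, w_est) - exact)
    return plain / trials, comp / trials
```

Calling `mean_abs_errors()` compares the two MAC variants on uniformly random inputs. The correction deliberately ignores the second-order product of the two errors, and in hardware the multiplications by `step / 4` would reduce to shifts. Neither simplification is claimed to match the scheme evaluated in the paper.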