Sound Mixed Fixed-Point Quantization of Neural Networks

ACM Transactions on Embedded Computing Systems (IF 2.8; CAS Zone 3, Computer Science; JCR Q2, Computer Science, Hardware & Architecture) | Pub Date: 2023-09-09 | DOI: 10.1145/3609118

Debasmita Lohar, Clothilde Jeangoudoux, Anastasia Volkova, Eva Darulova


Citations: 0

Abstract

Neural networks are increasingly being used as components in safety-critical applications, for instance, as controllers in embedded systems. Their formal safety verification has made significant progress but typically considers only idealized real-valued networks. For practical applications, such neural networks have to be quantized, i.e., implemented in finite-precision arithmetic, which inevitably introduces roundoff errors. Choosing a suitable precision that is both guaranteed to satisfy a roundoff error bound to ensure safety and as small as possible to avoid wasting resources is highly nontrivial to do manually. This task is especially challenging when quantizing a neural network in fixed-point arithmetic, where one can choose among a large number of precisions and has to ensure overflow-freedom explicitly. This paper presents the first sound and fully automated mixed-precision quantization approach that specifically targets deep feed-forward neural networks. Our quantization is based on mixed-integer linear programming (MILP) and leverages the unique structure of neural networks and effective over-approximations to make MILP optimization feasible. Our approach efficiently optimizes the number of bits needed to implement a network while guaranteeing a provided error bound. Our evaluation on existing embedded neural controller benchmarks shows that our optimization translates into precision assignments that, when compiled to an FPGA with a commercial HLS compiler, mostly use fewer machine cycles than code generated by the (sound) state of the art. Furthermore, our approach handles significantly more benchmarks, and does so substantially faster, especially for larger networks.
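To make the two constraints in the abstract concrete (a guaranteed roundoff error bound and explicit overflow-freedom), the following is a minimal sketch for a single neuron's dot product. It is not the paper's MILP formulation: it assumes one uniform number of fractional bits, round-to-nearest quantization, exact accumulation of products with a single final rounding, and it simply searches for the smallest width that meets the bound. All function names and the example weights and input ranges are hypothetical.

```python
import math


def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (assumed rounding mode)."""
    scale = 2 ** frac_bits
    return math.floor(x * scale + 0.5) / scale


def dot_error_bound(weights, input_ranges, frac_bits):
    """Worst-case absolute roundoff error of sum(w_i * x_i).

    Assumes weights and inputs are each rounded to frac_bits fractional
    bits, products are accumulated exactly, and the result is rounded once.
    """
    u = 2.0 ** -(frac_bits + 1)  # max error of one round-to-nearest
    bound = u  # final rounding of the accumulated sum
    for w, (lo, hi) in zip(weights, input_ranges):
        x_max = max(abs(lo), abs(hi))
        # |(w + dw)(x + dx) - w*x| <= |w|*u + |x|*u + u*u
        bound += abs(w) * u + x_max * u + u * u
    return bound


def smallest_frac_bits(weights, input_ranges, target, max_bits=32):
    """Smallest uniform fractional width whose error bound meets target."""
    for f in range(1, max_bits + 1):
        if dot_error_bound(weights, input_ranges, f) <= target:
            return f
    return None  # no width up to max_bits satisfies the bound


def integer_bits(lo, hi, err=0.0):
    """Integer bits (including sign) so that [lo, hi] widened by err cannot overflow."""
    m = max(abs(lo), abs(hi)) + err
    bits = 1  # sign bit
    while 2 ** (bits - 1) <= m:
        bits += 1
    return bits


# Hypothetical neuron: two weights, two bounded inputs, error target 1e-3.
weights = [0.5, -1.25]
input_ranges = [(-1.0, 1.0), (0.0, 2.0)]
frac = smallest_frac_bits(weights, input_ranges, 1e-3)
bound = dot_error_bound(weights, input_ranges, frac)

# Magnitude bound on the exact output, obtained by interval reasoning.
out_max = sum(abs(w) * max(abs(lo), abs(hi))
              for w, (lo, hi) in zip(weights, input_ranges))
int_bits = integer_bits(-out_max, out_max, err=bound)
print(f"fractional bits: {frac}, integer bits: {int_bits}")
```

A mixed-precision approach, as described in the abstract, would instead allow different widths for different parts of the network and let an optimizer choose them; the uniform search above only illustrates what "satisfying an error bound without overflow" means for one operation.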


Source journal

ACM Transactions on Embedded Computing Systems (Engineering & Technology - Computer Science: Software Engineering)

CiteScore: 3.70
Self-citation rate: 0.00%
Articles published: 138
Review time: 6 months

About the journal: The design of embedded computing systems, both the software and hardware, increasingly relies on sophisticated algorithms, analytical models, and methodologies. ACM Transactions on Embedded Computing Systems (TECS) aims to present the leading work relating to the analysis, design, behavior, and experience with embedded computing systems.