Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge

Georg Rutishauser, Francesco Conti, Luca Benini
{"title":"Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge","authors":"Georg Rutishauser, Francesco Conti, L. Benini","doi":"10.1109/AICAS57966.2023.10168577","DOIUrl":null,"url":null,"abstract":"Mixed-precision quantization, where a deep neural network’s layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization. To navigate the in-tractable search space of mixed-precision configurations for a given network, this paper proposes a hybrid search methodology. It consists of a hardware-agnostic differentiable search algorithm followed by a hardware-aware heuristic optimization to find mixed-precision configurations latency-optimized for a specific hardware target. We evaluate our algorithm on MobileNetV1 and MobileNetV2 and deploy the resulting networks on a family of multi-core RISC-V microcontroller platforms with different hardware characteristics. We achieve up to 28.6 % reduction of end-to-end latency compared to an 8-bit model at a negligible accuracy drop from a full-precision baseline on the 1000-class ImageNet dataset. We demonstrate speedups relative to an 8-bit baseline, even on systems with no hardware support for sub-byte arithmetic at negligible accuracy drop. Furthermore, we show the superiority of our approach with respect to differentiable search targeting reduced binary operation counts as a proxy for latency.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AICAS57966.2023.10168577","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization. To navigate the intractable search space of mixed-precision configurations for a given network, this paper proposes a hybrid search methodology. It consists of a hardware-agnostic differentiable search algorithm followed by a hardware-aware heuristic optimization to find mixed-precision configurations that are latency-optimized for a specific hardware target. We evaluate our algorithm on MobileNetV1 and MobileNetV2 and deploy the resulting networks on a family of multi-core RISC-V microcontroller platforms with different hardware characteristics. We achieve up to a 28.6% reduction in end-to-end latency compared to an 8-bit model, at a negligible accuracy drop from a full-precision baseline on the 1000-class ImageNet dataset. We demonstrate speedups relative to an 8-bit baseline even on systems with no hardware support for sub-byte arithmetic, again at a negligible accuracy drop. Furthermore, we show the superiority of our approach over differentiable search targeting reduced binary operation counts as a proxy for latency.
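The abstract does not include code, but the hardware-agnostic differentiable search it describes is commonly set up as a DNAS-style softmax relaxation over candidate bit-widths. The sketch below illustrates that general recipe only; the class and function names, the candidate set (2, 4, 8), and the quantizer are illustrative assumptions, not the authors' implementation.

```python
# Minimal, illustrative sketch of a differentiable mixed-precision search
# (DNAS-style softmax relaxation over bit-widths). NOT the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def quantize_weight(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform quantization with a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp_min(1e-8) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    # STE: forward pass uses w_q, backward pass sees the identity w.
    return w + (w_q - w).detach()


class MixedPrecisionConv2d(nn.Module):
    """Conv layer whose weight bit-width is chosen differentiably."""

    def __init__(self, in_ch, out_ch, k, candidate_bits=(2, 4, 8)):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.candidate_bits = candidate_bits
        # One trainable logit per candidate precision, learned jointly
        # with the network weights during the search phase.
        self.alpha = nn.Parameter(torch.zeros(len(candidate_bits)))

    def forward(self, x):
        probs = F.softmax(self.alpha, dim=0)
        # Softmax-weighted mix of the quantized weight variants keeps
        # the precision choice differentiable.
        w = sum(p * quantize_weight(self.conv.weight, b)
                for p, b in zip(probs, self.candidate_bits))
        return F.conv2d(x, w, self.conv.bias, padding=self.conv.padding)

    def chosen_bits(self) -> int:
        # After the search, the layer is frozen to its most likely precision.
        return self.candidate_bits[int(self.alpha.argmax())]
```

After such a search converges, each layer would be frozen to its `chosen_bits()` configuration. The paper's second, hardware-aware phase, a heuristic that further adjusts per-layer precisions using the latency characteristics of the specific RISC-V target, is not sketched here.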
Latest articles in this journal

Synaptic metaplasticity with multi-level memristive devices
Unsupervised Learning of Spike-Timing-Dependent Plasticity Based on a Neuromorphic Implementation
A Fully Differential 4-Bit Analog Compute-In-Memory Architecture for Inference Application
Convergent Waveform Relaxation Schemes for the Transient Analysis of Associative ReLU Arrays
Performance Assessment of an Extremely Energy-Efficient Binary Neural Network Using Adiabatic Superconductor Devices