Enabling mixed-precision quantized neural networks in extreme-edge devices

Proceedings of the 17th ACM International Conference on Computing Frontiers. Pub Date: 2020-05-11. DOI: 10.1145/3387902.3394038

Nazareno Bruschi, Angelo Garofalo, Francesco Conti, Giuseppe Tagliavini, D. Rossi


Citations: 15

Abstract

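Deploying Quantized Neural Networks (QNNs) on advanced microcontrollers requires optimized software that exploits the digital signal processing (DSP) extensions of modern instruction set architectures (ISAs). Recent research has therefore proposed optimized libraries for QNNs (from 8-bit down to 2-bit), such as CMSIS-NN and PULP-NN. This work presents an extension to the PULP-NN library that targets the acceleration of mixed-precision Deep Neural Networks, an emerging paradigm that significantly shrinks the memory footprint of deep neural networks with negligible accuracy loss. The library consists of 27 kernels, one for each combination of input feature map, weight, and output feature map precision (8-bit, 4-bit, or 2-bit). It enables efficient QNN inference on parallel ultra-low-power (PULP) clusters of RISC-V based processors featuring the RV32IMCXpulpV2 ISA. Benchmarked on an 8-core GAP-8 PULP cluster, the proposed solution reaches a peak performance of 16 MACs/cycle on 8 cores, running 21× to 25× faster than an STM32H7 (powered by an ARM Cortex-M7 processor) with 15× to 21× better energy efficiency.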
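The count of 27 kernels and the memory saving from sub-byte weights can be checked with a short sketch. This is only an illustration under stated assumptions, not code from PULP-NN: the function names, the example layer shape, and the packing order (lowest bits first within each byte) are all hypothetical.

```python
from itertools import product

PRECISIONS = (8, 4, 2)  # bit widths named in the abstract


def kernel_variants():
    """Return every (input_bits, weight_bits, output_bits) triple."""
    return list(product(PRECISIONS, repeat=3))


def packed_bytes(num_elements, bits):
    """Bytes needed to store num_elements values at `bits` bits each."""
    return (num_elements * bits + 7) // 8


def pack(values, bits):
    """Pack unsigned `bits`-wide integers into bytes.

    Assumption: the first value occupies the lowest bits of each byte.
    The real library's layout may differ.
    """
    per_byte = 8 // bits
    mask = (1 << bits) - 1
    out = bytearray(packed_bytes(len(values), bits))
    for i, v in enumerate(values):
        out[i // per_byte] |= (v & mask) << (bits * (i % per_byte))
    return bytes(out)


variants = kernel_variants()

# Hypothetical layer: 3x3 convolution, 32 input and 64 output channels.
num_weights = 3 * 3 * 32 * 64
footprint = {bits: packed_bytes(num_weights, bits) for bits in PRECISIONS}

if __name__ == "__main__":
    print(len(variants), "kernel variants")
    for bits, size in footprint.items():
        print(f"{bits}-bit weights: {size} bytes")
```

Three precisions for each of the three tensors give 3 × 3 × 3 = 27 triples, which matches the kernel count in the abstract. For the assumed layer, moving the weights from 8-bit to 2-bit cuts their storage to a quarter.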
