演化的位稀疏模式用于量化深度神经网络的硬件友好推理

Proceedings of the 59th ACM/IEEE Design Automation Conference Pub Date : 2022-07-10 DOI:10.1145/3489517.3530660

Fangxin Liu, Wenbo Zhao, Zongwu Wang, Yongbiao Chen, Zhezhi He, Naifeng Jing, Xiaoyao Liang, Li Jiang

{"title":"演化的位稀疏模式用于量化深度神经网络的硬件友好推理","authors":"Fangxin Liu, Wenbo Zhao, Zongwu Wang, Yongbiao Chen, Zhezhi He, Naifeng Jing, Xiaoyao Liang, Li Jiang","doi":"10.1145/3489517.3530660","DOIUrl":null,"url":null,"abstract":"Model compression has been extensively investigated for supporting efficient neural network inference on edge-computing platforms due to the huge model size and computation amount. Recent researches embrace joint-way compression across multiple techniques for extreme compression. However, most joint-way methods adopt a naive solution that applies two approaches sequentially, which can be sub-optimal, as it lacks a systematic approach to incorporate them. This paper proposes the integration of aggressive joint-way compression into hardware design, namely EBSP. It is motivated by 1) the quantization allows simplifying hardware implementations; 2) the bit distribution of quantized weights can be viewed as an independent trainable variable; 3) the exploitation of bit sparsity in the quantized network has the potential to achieve better performance. To achieve that, this paper introduces the bit sparsity patterns to construct both highly expressive and inherently regular bit distribution in the quantized network. We further incorporate our sparsity constraint in training to evolve inherently bit distributions to the bit sparsity pattern. Moreover, the structure of the introduced bit sparsity pattern engenders minimum hardware implementation under competitive classification accuracy. Specifically, the quantized network constrained by bit sparsity pattern can be processed using LUTs with the fewest entries instead of multipliers in minimally modified computational hardware. Our experiments show that compared to Eyeriss, BitFusion, WAX, and OLAccel, EBSP with less than 0.8% accuracy loss, can achieve 87.3%, 79.7%, 75.2% and 58.9% energy reduction and 93.8%, 83.7%, 72.7% and 49.5% performance gain on average, respectively.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"100 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"EBSP: evolving bit sparsity patterns for hardware-friendly inference of quantized deep neural networks\",\"authors\":\"Fangxin Liu, Wenbo Zhao, Zongwu Wang, Yongbiao Chen, Zhezhi He, Naifeng Jing, Xiaoyao Liang, Li Jiang\",\"doi\":\"10.1145/3489517.3530660\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Model compression has been extensively investigated for supporting efficient neural network inference on edge-computing platforms due to the huge model size and computation amount. Recent researches embrace joint-way compression across multiple techniques for extreme compression. However, most joint-way methods adopt a naive solution that applies two approaches sequentially, which can be sub-optimal, as it lacks a systematic approach to incorporate them. This paper proposes the integration of aggressive joint-way compression into hardware design, namely EBSP. It is motivated by 1) the quantization allows simplifying hardware implementations; 2) the bit distribution of quantized weights can be viewed as an independent trainable variable; 3) the exploitation of bit sparsity in the quantized network has the potential to achieve better performance. To achieve that, this paper introduces the bit sparsity patterns to construct both highly expressive and inherently regular bit distribution in the quantized network. We further incorporate our sparsity constraint in training to evolve inherently bit distributions to the bit sparsity pattern. Moreover, the structure of the introduced bit sparsity pattern engenders minimum hardware implementation under competitive classification accuracy. Specifically, the quantized network constrained by bit sparsity pattern can be processed using LUTs with the fewest entries instead of multipliers in minimally modified computational hardware. Our experiments show that compared to Eyeriss, BitFusion, WAX, and OLAccel, EBSP with less than 0.8% accuracy loss, can achieve 87.3%, 79.7%, 75.2% and 58.9% energy reduction and 93.8%, 83.7%, 72.7% and 49.5% performance gain on average, respectively.\",\"PeriodicalId\":373005,\"journal\":{\"name\":\"Proceedings of the 59th ACM/IEEE Design Automation Conference\",\"volume\":\"100 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-07-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 59th ACM/IEEE Design Automation Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3489517.3530660\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 59th ACM/IEEE Design Automation Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3489517.3530660","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

由于模型大小和计算量巨大，为了支持边缘计算平台上高效的神经网络推理，模型压缩已经得到了广泛的研究。最近的研究涵盖了多种极端压缩技术的联合压缩。然而，大多数联合方法采用一种简单的解决方案，顺序地应用两种方法，这可能是次优的，因为它缺乏一种系统的方法来合并它们。本文提出了将主动联合压缩集成到硬件设计中，即EBSP。其动机是:1)量化允许简化硬件实现;2)量化权重的位分布可以看作是一个独立的可训练变量;3)在量化网络中利用比特稀疏有可能获得更好的性能。为了实现这一目标，本文引入了比特稀疏模式，以在量化网络中构建具有高表达性和固有规则的比特分布。我们进一步将稀疏性约束纳入到训练中，将固有的位分布演化为位稀疏模式。此外，所引入的位稀疏模式的结构在竞争性分类精度下实现了最小的硬件实现。具体来说，受位稀疏模式约束的量化网络可以在最小修改的计算硬件中使用具有最少条目的lut而不是乘法器来处理。我们的实验表明，与Eyeriss、BitFusion、WAX和OLAccel相比，EBSP在精度损失小于0.8%的情况下，平均能量降低了87.3%、79.7%、75.2%和58.9%，性能提高了93.8%、83.7%、72.7%和49.5%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

EBSP: evolving bit sparsity patterns for hardware-friendly inference of quantized deep neural networks

Model compression has been extensively investigated for supporting efficient neural network inference on edge-computing platforms due to the huge model size and computation amount. Recent researches embrace joint-way compression across multiple techniques for extreme compression. However, most joint-way methods adopt a naive solution that applies two approaches sequentially, which can be sub-optimal, as it lacks a systematic approach to incorporate them. This paper proposes the integration of aggressive joint-way compression into hardware design, namely EBSP. It is motivated by 1) the quantization allows simplifying hardware implementations; 2) the bit distribution of quantized weights can be viewed as an independent trainable variable; 3) the exploitation of bit sparsity in the quantized network has the potential to achieve better performance. To achieve that, this paper introduces the bit sparsity patterns to construct both highly expressive and inherently regular bit distribution in the quantized network. We further incorporate our sparsity constraint in training to evolve inherently bit distributions to the bit sparsity pattern. Moreover, the structure of the introduced bit sparsity pattern engenders minimum hardware implementation under competitive classification accuracy. Specifically, the quantized network constrained by bit sparsity pattern can be processed using LUTs with the fewest entries instead of multipliers in minimally modified computational hardware. Our experiments show that compared to Eyeriss, BitFusion, WAX, and OLAccel, EBSP with less than 0.8% accuracy loss, can achieve 87.3%, 79.7%, 75.2% and 58.9% energy reduction and 93.8%, 83.7%, 72.7% and 49.5% performance gain on average, respectively.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助