{"title":"演化的位稀疏模式用于量化深度神经网络的硬件友好推理","authors":"Fangxin Liu, Wenbo Zhao, Zongwu Wang, Yongbiao Chen, Zhezhi He, Naifeng Jing, Xiaoyao Liang, Li Jiang","doi":"10.1145/3489517.3530660","DOIUrl":null,"url":null,"abstract":"Model compression has been extensively investigated for supporting efficient neural network inference on edge-computing platforms due to the huge model size and computation amount. Recent researches embrace joint-way compression across multiple techniques for extreme compression. However, most joint-way methods adopt a naive solution that applies two approaches sequentially, which can be sub-optimal, as it lacks a systematic approach to incorporate them. This paper proposes the integration of aggressive joint-way compression into hardware design, namely EBSP. It is motivated by 1) the quantization allows simplifying hardware implementations; 2) the bit distribution of quantized weights can be viewed as an independent trainable variable; 3) the exploitation of bit sparsity in the quantized network has the potential to achieve better performance. To achieve that, this paper introduces the bit sparsity patterns to construct both highly expressive and inherently regular bit distribution in the quantized network. We further incorporate our sparsity constraint in training to evolve inherently bit distributions to the bit sparsity pattern. Moreover, the structure of the introduced bit sparsity pattern engenders minimum hardware implementation under competitive classification accuracy. Specifically, the quantized network constrained by bit sparsity pattern can be processed using LUTs with the fewest entries instead of multipliers in minimally modified computational hardware. Our experiments show that compared to Eyeriss, BitFusion, WAX, and OLAccel, EBSP with less than 0.8% accuracy loss, can achieve 87.3%, 79.7%, 75.2% and 58.9% energy reduction and 93.8%, 83.7%, 72.7% and 49.5% performance gain on average, respectively.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"100 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"EBSP: evolving bit sparsity patterns for hardware-friendly inference of quantized deep neural networks\",\"authors\":\"Fangxin Liu, Wenbo Zhao, Zongwu Wang, Yongbiao Chen, Zhezhi He, Naifeng Jing, Xiaoyao Liang, Li Jiang\",\"doi\":\"10.1145/3489517.3530660\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Model compression has been extensively investigated for supporting efficient neural network inference on edge-computing platforms due to the huge model size and computation amount. Recent researches embrace joint-way compression across multiple techniques for extreme compression. However, most joint-way methods adopt a naive solution that applies two approaches sequentially, which can be sub-optimal, as it lacks a systematic approach to incorporate them. This paper proposes the integration of aggressive joint-way compression into hardware design, namely EBSP. It is motivated by 1) the quantization allows simplifying hardware implementations; 2) the bit distribution of quantized weights can be viewed as an independent trainable variable; 3) the exploitation of bit sparsity in the quantized network has the potential to achieve better performance. To achieve that, this paper introduces the bit sparsity patterns to construct both highly expressive and inherently regular bit distribution in the quantized network. We further incorporate our sparsity constraint in training to evolve inherently bit distributions to the bit sparsity pattern. Moreover, the structure of the introduced bit sparsity pattern engenders minimum hardware implementation under competitive classification accuracy. Specifically, the quantized network constrained by bit sparsity pattern can be processed using LUTs with the fewest entries instead of multipliers in minimally modified computational hardware. Our experiments show that compared to Eyeriss, BitFusion, WAX, and OLAccel, EBSP with less than 0.8% accuracy loss, can achieve 87.3%, 79.7%, 75.2% and 58.9% energy reduction and 93.8%, 83.7%, 72.7% and 49.5% performance gain on average, respectively.\",\"PeriodicalId\":373005,\"journal\":{\"name\":\"Proceedings of the 59th ACM/IEEE Design Automation Conference\",\"volume\":\"100 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-07-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 59th ACM/IEEE Design Automation Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3489517.3530660\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 59th ACM/IEEE Design Automation Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3489517.3530660","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
EBSP: evolving bit sparsity patterns for hardware-friendly inference of quantized deep neural networks
Model compression has been extensively investigated for supporting efficient neural network inference on edge-computing platforms due to the huge model size and computation amount. Recent researches embrace joint-way compression across multiple techniques for extreme compression. However, most joint-way methods adopt a naive solution that applies two approaches sequentially, which can be sub-optimal, as it lacks a systematic approach to incorporate them. This paper proposes the integration of aggressive joint-way compression into hardware design, namely EBSP. It is motivated by 1) the quantization allows simplifying hardware implementations; 2) the bit distribution of quantized weights can be viewed as an independent trainable variable; 3) the exploitation of bit sparsity in the quantized network has the potential to achieve better performance. To achieve that, this paper introduces the bit sparsity patterns to construct both highly expressive and inherently regular bit distribution in the quantized network. We further incorporate our sparsity constraint in training to evolve inherently bit distributions to the bit sparsity pattern. Moreover, the structure of the introduced bit sparsity pattern engenders minimum hardware implementation under competitive classification accuracy. Specifically, the quantized network constrained by bit sparsity pattern can be processed using LUTs with the fewest entries instead of multipliers in minimally modified computational hardware. Our experiments show that compared to Eyeriss, BitFusion, WAX, and OLAccel, EBSP with less than 0.8% accuracy loss, can achieve 87.3%, 79.7%, 75.2% and 58.9% energy reduction and 93.8%, 83.7%, 72.7% and 49.5% performance gain on average, respectively.