Towards High Performance and Accurate BNN Inference on FPGA with Structured Fine-grained Pruning

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD). Pub Date: 2022-10-29. DOI: 10.1145/3508352.3549368

Keqi Fu, Zhi Qi, Jiaxuan Cai, Xulong Shi


Citations: 1

Abstract

As the extreme case of quantized networks, Binary Neural Networks (BNNs) have received tremendous attention because of their many hardware-friendly properties in storage and computation. To reach the limit of compact models, we combine binarization with pruning techniques, further exploring the redundancy of BNNs. However, coarse-grained pruning methods may cause severe accuracy drops, while traditional fine-grained ones induce irregular sparsity that is hard for hardware to exploit. In this paper, we propose two advanced fine-grained BNN pruning modules, i.e., structured channel-wise kernel pruning and dynamic spatial pruning, from a joint perspective of algorithm and hardware. The pruned BNN models are trained from scratch and offer not only higher precision but also a high degree of parallelism. Then, we develop an accelerator architecture that can effectively exploit the sparsity produced by our algorithm. Finally, we implement the pruned BNN models on an embedded FPGA (Ultra96v2). The results show that our software and hardware co-design achieves a 5.4x inference speedup over the baseline BNN, with higher resource and energy efficiency than prior FPGA-implemented BNN work.
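As a rough illustration of why binarization and kernel-level sparsity are hardware-friendly, the sketch below computes a binary dot product with XNOR and popcount, then skips whole kernels that a pruning mask marks as removed. It is not the authors' implementation: the function names (`pack`, `binary_dot`, `pruned_output`), the bit-packing convention, and the per-channel `keep_mask` are all assumptions made for this example, and the abstract does not specify how the paper's pruning modules select kernels or spatial positions.

```python
def pack(values):
    """Pack a list of {-1, +1} values into an int (bit i = 1 means +1). Hypothetical helper."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, w_bits, n):
    """Dot product of two length-n {-1, +1} vectors stored as packed bits.

    XNOR marks positions where the signs agree; popcount counts them.
    With m matching positions, the dot product is m - (n - m) = 2*m - n.
    """
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ w_bits) & mask).count("1")
    return 2 * matches - n


def pruned_output(acts, kernels, keep_mask, n):
    """Accumulate over input channels, skipping kernels flagged as pruned.

    keep_mask is an assumed per-channel boolean list; a pruned kernel
    costs no XNOR/popcount work at all, which is the point of the sketch.
    """
    total = 0
    for a, w, keep in zip(acts, kernels, keep_mask):
        if keep:
            total += binary_dot(a, w, n)
    return total


if __name__ == "__main__":
    a = pack([1, -1, 1, 1])
    w = pack([1, 1, -1, 1])
    print(binary_dot(a, w, 4))
    print(pruned_output([a, a], [w, a], [False, True], 4))
```

Because the mask in this sketch is applied to whole kernels rather than individual weights, the skipped work follows a regular pattern, which is the general property that makes structured sparsity easier to exploit in a parallel datapath than irregular fine-grained sparsity.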


