Towards High Performance and Accurate BNN Inference on FPGA with Structured Fine-grained Pruning

Keqi Fu, Zhi Qi, Jiaxuan Cai, Xulong Shi
{"title":"Towards High Performance and Accurate BNN Inference on FPGA with Structured Fine-grained Pruning","authors":"Keqi Fu, Zhi Qi, Jiaxuan Cai, Xulong Shi","doi":"10.1145/3508352.3549368","DOIUrl":null,"url":null,"abstract":"As the extreme case of quantization networks, Binary Neural Networks (BNNs) have received tremendous attention due to many hardware-friendly properties in terms of storage and computation. To reach the limit of compact models, we attempt to combine binarization with pruning techniques, further exploring the redundancy of BNNs. However, coarse-grained pruning methods may cause server accuracy drops, while traditional fine-grained ones induce irregular sparsity hard to be utilized by hardware. In this paper, we propose two advanced fine-grained BNN pruning modules, i.e., structured channel-wise kernel pruning and dynamic spatial pruning, from a joint perspective of algorithm and hardware. The pruned BNN models are trained from scratch and present not only a higher precision but also a high degree of parallelism. Then, we develop an accelerator architecture that can effectively exploit the sparsity caused by our algorithm. Finally, we implement the pruned BNN models on an embedded FPGA (Ultra96v2). The results show that our software and hardware codesign achieves 5.4x inference-speedup than the baseline BNN, with higher resource and energy efficiency compared with prior FPGA implemented BNN works.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3508352.3549368","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

As the extreme case of quantized networks, Binary Neural Networks (BNNs) have received tremendous attention due to their many hardware-friendly properties in terms of storage and computation. To push compact models to their limit, we combine binarization with pruning techniques, further exploring the redundancy of BNNs. However, coarse-grained pruning methods can cause severe accuracy drops, while traditional fine-grained ones induce irregular sparsity that is hard for hardware to exploit. In this paper, we propose two advanced fine-grained BNN pruning modules, i.e., structured channel-wise kernel pruning and dynamic spatial pruning, from a joint algorithm-hardware perspective. The pruned BNN models are trained from scratch and exhibit not only higher accuracy but also a high degree of parallelism. We then develop an accelerator architecture that effectively exploits the sparsity induced by our algorithm. Finally, we implement the pruned BNN models on an embedded FPGA (Ultra96v2). The results show that our software-hardware co-design achieves a 5.4x inference speedup over the baseline BNN, with higher resource and energy efficiency than prior FPGA-based BNN implementations.
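
The abstract names its mechanisms without detail: BNN compute reduces to XNOR-popcount operations, and the first pruning module removes whole kernels in a channel-wise structured pattern. Below is a minimal NumPy sketch of these two ideas, assuming standard sign binarization and L1-norm kernel ranking; it covers only the structured channel-wise module (not the dynamic spatial one), and binarize, xnor_popcount_dot, channelwise_kernel_prune_mask, and keep_ratio are illustrative names, not the paper's actual method or API.

```python
import numpy as np

def binarize(x):
    """Sign binarization used by BNNs: map real values to {-1, +1}."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def xnor_popcount_dot(a, b):
    """Binary dot product via the XNOR-popcount identity:
    for a, b in {-1,+1}^n, dot(a, b) = 2 * popcount(XNOR(a, b)) - n.
    Emulated here with integer comparisons instead of bit operations."""
    n = a.size
    matches = int(np.sum(a == b))  # popcount of the XNOR result
    return 2 * matches - n

def channelwise_kernel_prune_mask(weights, keep_ratio=0.5):
    """Hypothetical structured channel-wise kernel pruning: for each
    output filter, rank its per-input-channel k x k kernels by L1 norm
    and keep the same number of kernels in every filter, so each
    parallel compute lane sees an identical, regular workload."""
    out_c, in_c, k, _ = weights.shape
    keep = max(1, int(in_c * keep_ratio))
    norms = np.abs(weights).sum(axis=(2, 3))    # L1 norm per kernel
    top = np.argsort(-norms, axis=1)[:, :keep]  # strongest kernels per filter
    mask = np.zeros((out_c, in_c), dtype=bool)
    mask[np.arange(out_c)[:, None], top] = True
    return mask

# Usage sketch: binarize weights and prune half the kernels per filter.
w = np.random.randn(8, 16, 3, 3)
mask = channelwise_kernel_prune_mask(w, keep_ratio=0.5)
wb = binarize(w) * mask[:, :, None, None]       # zeroed kernels can be skipped
a = binarize(np.random.randn(9))
b = binarize(np.random.randn(9))
assert xnor_popcount_dot(a, b) == int(a.astype(int) @ b.astype(int))
```

Keeping the kept-kernel count identical across filters is what makes the sparsity structured: a hardware accelerator can skip entire zeroed kernels while its parallel lanes stay load-balanced, which irregular fine-grained sparsity does not allow.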