Small-world-based Structural Pruning for Efficient FPGA Inference of Deep Neural Networks

Gokul Krishnan, Yufei Ma, Yu Cao
{"title":"Small-world-based Structural Pruning for Efficient FPGA Inference of Deep Neural Networks","authors":"Gokul Krishnan, Yufei Ma, Yu Cao","doi":"10.1109/ICSICT49897.2020.9278024","DOIUrl":null,"url":null,"abstract":"DNN pruning approaches usually trim model parameters without exploiting the intrinsic graph properties and hardware preferences. As a result, an FPGA accelerator may not directly benefit from such random pruning, with additional cost on indexing and control modules. Inspired by the observation that the brain and real-world networks follow a Small-World model, we propose a graph-based progressive structural pruning technique that integrates local clusters and global sparsity in the Small-World graph and the data locality in the FPGA dataflow. The proposed technique hierarchically trims the DNN into a sparse graph before training, which follows both the Small-World property and FPGA dataflow preferences, such as grouped non-zero and zero parameters to skip data load and corresponding computation. The pruned model is then trained for a given dataset and fine-tuned to achieve the best accuracy. We evaluate the proposed technique for multiple DNNs with different datasets. It achieves state-of-the-art sparsity ratio of up to 76% for CIFAR-10, 84% for CIFAR-100, and 76% for the SVHN datasets. Moreover, the generated sparse DNN achieves up to 4× improvement in throughput for an output stationary FPGA architecture across different DNNs with a marginal hardware overhead.","PeriodicalId":6727,"journal":{"name":"2020 IEEE 15th International Conference on Solid-State & Integrated Circuit Technology (ICSICT)","volume":"198 1","pages":"1-5"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 15th International Conference on Solid-State & Integrated Circuit Technology (ICSICT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSICT49897.2020.9278024","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

DNN pruning approaches usually trim model parameters without exploiting the network's intrinsic graph properties or hardware preferences. As a result, an FPGA accelerator may not directly benefit from such random pruning and instead incurs additional cost in indexing and control modules. Inspired by the observation that the brain and many real-world networks follow a Small-World model, we propose a graph-based progressive structural pruning technique that integrates the local clustering and global sparsity of the Small-World graph with the data locality of the FPGA dataflow. The proposed technique hierarchically trims the DNN into a sparse graph before training, following both the Small-World property and FPGA dataflow preferences, such as grouping non-zero and zero parameters so that data loads and the corresponding computation can be skipped. The pruned model is then trained on a given dataset and fine-tuned to achieve the best accuracy. We evaluate the proposed technique on multiple DNNs with different datasets. It achieves state-of-the-art sparsity ratios of up to 76% for CIFAR-10, 84% for CIFAR-100, and 76% for SVHN. Moreover, the generated sparse DNN achieves up to a 4× improvement in throughput on an output-stationary FPGA architecture across different DNNs, with marginal hardware overhead.
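The abstract does not give the implementation details of the hierarchical pruning procedure, but the core idea, pruning a layer to a small-world connectivity pattern whose zero parameters fall in hardware-aligned groups, can be sketched. The following is a minimal, hypothetical Python sketch (NumPy + networkx) that derives a block-structured weight mask from a Watts-Strogatz small-world graph; the block size, graph parameters, and function names are illustrative assumptions and not the paper's exact algorithm.

```python
# Hypothetical sketch: build a block-structured sparsity mask whose block-level
# connectivity follows a Watts-Strogatz small-world graph. The block size stands
# in for the FPGA vector/PE-group width, so zero parameters appear in contiguous
# groups that the dataflow can skip. All names and parameters are illustrative.
import numpy as np
import networkx as nx

def small_world_block_mask(out_features, in_features, block=16, k=4, p=0.1, seed=0):
    """Return a {0,1} mask of shape (out_features, in_features).

    Blocks of size (block x block) are kept or zeroed according to the edges of
    a Watts-Strogatz graph built over the union of output and input block groups.
    """
    n_out = out_features // block
    n_in = in_features // block
    # One small-world graph over all block groups; output groups take indices
    # 0..n_out-1, input groups take indices n_out..n_out+n_in-1.
    g = nx.connected_watts_strogatz_graph(n_out + n_in, k, p, seed=seed)
    mask = np.zeros((out_features, in_features), dtype=np.float32)
    for u, v in g.edges():
        # Keep only edges that connect an output group to an input group.
        if u < n_out <= v:
            o, i = u, v - n_out
        elif v < n_out <= u:
            o, i = v, u - n_out
        else:
            continue
        mask[o * block:(o + 1) * block, i * block:(i + 1) * block] = 1.0
    return mask

mask = small_world_block_mask(512, 1024)
print("kept fraction:", mask.mean())  # global sparsity induced by the graph
# During training, the layer weight would be multiplied element-wise by `mask`
# so the zeroed blocks stay zero; on the FPGA side, a per-block index is enough
# to skip loading and computing those groups.
```

In the flow described above, such a mask would be fixed before training and the pruned model then trained and fine-tuned; the grouped zero blocks are what allow an output-stationary FPGA dataflow to skip the corresponding data loads and computations with only a marginal indexing overhead.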