Small-world-based Structural Pruning for Efficient FPGA Inference of Deep Neural Networks

Gokul Krishnan, Yufei Ma, Yu Cao
{"title":"Small-world-based Structural Pruning for Efficient FPGA Inference of Deep Neural Networks","authors":"Gokul Krishnan, Yufei Ma, Yu Cao","doi":"10.1109/ICSICT49897.2020.9278024","DOIUrl":null,"url":null,"abstract":"DNN pruning approaches usually trim model parameters without exploiting the intrinsic graph properties and hardware preferences. As a result, an FPGA accelerator may not directly benefit from such random pruning, with additional cost on indexing and control modules. Inspired by the observation that the brain and real-world networks follow a Small-World model, we propose a graph-based progressive structural pruning technique that integrates local clusters and global sparsity in the Small-World graph and the data locality in the FPGA dataflow. The proposed technique hierarchically trims the DNN into a sparse graph before training, which follows both the Small-World property and FPGA dataflow preferences, such as grouped non-zero and zero parameters to skip data load and corresponding computation. The pruned model is then trained for a given dataset and fine-tuned to achieve the best accuracy. We evaluate the proposed technique for multiple DNNs with different datasets. It achieves state-of-the-art sparsity ratio of up to 76% for CIFAR-10, 84% for CIFAR-100, and 76% for the SVHN datasets. Moreover, the generated sparse DNN achieves up to 4× improvement in throughput for an output stationary FPGA architecture across different DNNs with a marginal hardware overhead.","PeriodicalId":6727,"journal":{"name":"2020 IEEE 15th International Conference on Solid-State & Integrated Circuit Technology (ICSICT)","volume":"198 1","pages":"1-5"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 15th International Conference on Solid-State & Integrated Circuit Technology (ICSICT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSICT49897.2020.9278024","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

DNN pruning approaches usually trim model parameters without exploiting the network's intrinsic graph properties or hardware preferences. As a result, an FPGA accelerator may not directly benefit from such random pruning and instead incurs additional cost in indexing and control modules. Inspired by the observation that the brain and many real-world networks follow a Small-World model, we propose a graph-based progressive structural pruning technique that integrates the local clustering and global sparsity of the Small-World graph with the data locality of the FPGA dataflow. The proposed technique hierarchically trims the DNN into a sparse graph before training, following both the Small-World property and FPGA dataflow preferences, such as grouping non-zero and zero parameters so that data loads and the corresponding computation can be skipped. The pruned model is then trained on a given dataset and fine-tuned to achieve the best accuracy. We evaluate the proposed technique on multiple DNNs with different datasets. It achieves state-of-the-art sparsity ratios of up to 76% for CIFAR-10, 84% for CIFAR-100, and 76% for SVHN. Moreover, the generated sparse DNN achieves up to a 4× improvement in throughput on an output-stationary FPGA architecture across different DNNs, with marginal hardware overhead.
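The abstract does not give the implementation details of the hierarchical pruning procedure, but the core idea, pruning a layer to a small-world connectivity pattern whose zero parameters fall in hardware-aligned groups, can be sketched. The following is a minimal, hypothetical Python sketch (NumPy + networkx) that derives a block-structured weight mask from a Watts-Strogatz small-world graph; the block size, graph parameters, and function names are illustrative assumptions and not the paper's exact algorithm.

```python
# Hypothetical sketch: build a block-structured sparsity mask whose block-level
# connectivity follows a Watts-Strogatz small-world graph. The block size stands
# in for the FPGA vector/PE-group width, so zero parameters appear in contiguous
# groups that the dataflow can skip. All names and parameters are illustrative.
import numpy as np
import networkx as nx

def small_world_block_mask(out_features, in_features, block=16, k=4, p=0.1, seed=0):
    """Return a {0,1} mask of shape (out_features, in_features).

    Blocks of size (block x block) are kept or zeroed according to the edges of
    a Watts-Strogatz graph built over the union of output and input block groups.
    """
    n_out = out_features // block
    n_in = in_features // block
    # One small-world graph over all block groups; output groups take indices
    # 0..n_out-1, input groups take indices n_out..n_out+n_in-1.
    g = nx.connected_watts_strogatz_graph(n_out + n_in, k, p, seed=seed)
    mask = np.zeros((out_features, in_features), dtype=np.float32)
    for u, v in g.edges():
        # Keep only edges that connect an output group to an input group.
        if u < n_out <= v:
            o, i = u, v - n_out
        elif v < n_out <= u:
            o, i = v, u - n_out
        else:
            continue
        mask[o * block:(o + 1) * block, i * block:(i + 1) * block] = 1.0
    return mask

mask = small_world_block_mask(512, 1024)
print("kept fraction:", mask.mean())  # global sparsity induced by the graph
# During training, the layer weight would be multiplied element-wise by `mask`
# so the zeroed blocks stay zero; on the FPGA side, a per-block index is enough
# to skip loading and computing those groups.
```

In the flow described above, such a mask would be fixed before training and the pruned model then trained and fine-tuned; the grouped zero blocks are what allow an output-stationary FPGA dataflow to skip the corresponding data loads and computations with only a marginal indexing overhead.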