Flexible Group-Level Pruning of Deep Neural Networks for On-Device Machine Learning

Kwangbae Lee, Hoseung Kim, Hayun Lee, Dongkun Shin
{"title":"Flexible Group-Level Pruning of Deep Neural Networks for On-Device Machine Learning","authors":"Kwangbae Lee, Hoseung Kim, Hayun Lee, Dongkun Shin","doi":"10.23919/DATE48585.2020.9116287","DOIUrl":null,"url":null,"abstract":"Network pruning is a promising compression technique to reduce computation and memory access cost of deep neural networks. Pruning techniques are classified into two types: fine-grained pruning and coarse-grained pruning. Fine-grained pruning eliminates individual connections if they are insignificant and thus usually generates irregular networks. Therefore, it is hard to reduce model execution time. Coarse-grained pruning such as filter-level and channel-level techniques can make hardware-friendly networks. However, it can suffer from low accuracy. In this paper, we focus on the group-level pruning method to accelerate deep neural networks on mobile GPUs, where several adjacent weights are pruned in a group to mitigate the irregularity of pruned networks while providing high accuracy. Although several group-level pruning techniques have been proposed, the previous techniques select weight groups to be pruned at group-size-aligned locations. In this paper, we propose a more flexible approach, called unaligned group-level pruning, to improve the accuracy of the compressed model. We can find the optimal solution of the unaligned group selection problem with dynamic programming. Our technique also generates balanced sparse networks to get load balance at parallel computing units. Experiments demonstrate that the 2D unaligned group-level pruning shows 3.12% a lower error rate at ResNet-20 network on CIFAR-10 compared to the previous 2D aligned group-level pruning under 95% of sparsity.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"198 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/DATE48585.2020.9116287","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

Abstract

Network pruning is a promising compression technique to reduce the computation and memory access cost of deep neural networks. Pruning techniques are classified into two types: fine-grained pruning and coarse-grained pruning. Fine-grained pruning eliminates individual connections if they are insignificant, and thus usually generates irregular networks, making it hard to reduce model execution time. Coarse-grained pruning, such as filter-level and channel-level techniques, can produce hardware-friendly networks but can suffer from low accuracy. In this paper, we focus on the group-level pruning method to accelerate deep neural networks on mobile GPUs, where several adjacent weights are pruned in a group to mitigate the irregularity of pruned networks while providing high accuracy. Although several group-level pruning techniques have been proposed, the previous techniques select weight groups to be pruned only at group-size-aligned locations. In this paper, we propose a more flexible approach, called unaligned group-level pruning, to improve the accuracy of the compressed model. We find the optimal solution of the unaligned group selection problem with dynamic programming. Our technique also generates balanced sparse networks to achieve load balance across parallel computing units. Experiments demonstrate that 2D unaligned group-level pruning shows a 3.12% lower error rate for the ResNet-20 network on CIFAR-10 compared to the previous 2D aligned group-level pruning at 95% sparsity.
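To make the idea of unaligned group selection concrete, the sketch below shows, for a single 1D weight row, how non-overlapping groups of g adjacent weights can be chosen at arbitrary (unaligned) offsets so that the retained magnitude is maximized, with exactly k groups kept per row for load balance. This is a minimal Python/NumPy illustration under stated assumptions, not the authors' implementation: the function name unaligned_group_mask, the magnitude criterion, and the "keep exactly k groups per row" balancing rule are illustrative choices, and the paper's 2D variant (groups spanning adjacent dimensions) is not covered here.

```python
import numpy as np

def unaligned_group_mask(row: np.ndarray, g: int, k: int) -> np.ndarray:
    """Return a 0/1 mask keeping k non-overlapping length-g groups of `row`,
    chosen by dynamic programming to maximize the retained weight magnitude."""
    n = len(row)
    assert k * g <= n, "cannot place k non-overlapping groups of size g"
    magnitude = np.abs(row)
    prefix = np.concatenate(([0.0], np.cumsum(magnitude)))  # prefix sums of |w|

    NEG = -np.inf
    # dp[i, j]: best retained magnitude using the first i weights with j groups kept
    dp = np.full((n + 1, k + 1), NEG)
    dp[:, 0] = 0.0
    take = np.zeros((n + 1, k + 1), dtype=bool)  # True if a kept group ends at weight i
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            # either weight i-1 is not the end of a kept group (skip),
            # or a group covering row[i-g:i] is kept (take)
            skip_val = dp[i - 1, j]
            take_val = dp[i - g, j - 1] + (prefix[i] - prefix[i - g]) if i >= g else NEG
            if take_val > skip_val:
                dp[i, j], take[i, j] = take_val, True
            else:
                dp[i, j] = skip_val

    # Backtrack to mark which groups were kept.
    mask = np.zeros(n)
    i, j = n, k
    while j > 0:
        if take[i, j]:
            mask[i - g:i] = 1.0
            i, j = i - g, j - 1
        else:
            i -= 1
    return mask

# Example: keep 2 unaligned groups of 4 weights in a row of 16 (50% sparsity).
w = np.random.randn(16)
mask = unaligned_group_mask(w, g=4, k=2)
w_pruned = w * mask
```

The recurrence dp[i][j] = max(dp[i-1][j], dp[i-g][j-1] + sum(|w[i-g:i]|)) is what allows group boundaries to start at any offset rather than only at multiples of g, which is the flexibility the unaligned method adds over aligned group-level pruning; keeping the same k for every row is one way to obtain the balanced sparsity mentioned in the abstract.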