Flexible Group-Level Pruning of Deep Neural Networks for On-Device Machine Learning
Kwangbae Lee, Hoseung Kim, Hayun Lee, Dongkun Shin
2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), March 2020
DOI: 10.23919/DATE48585.2020.9116287 (https://doi.org/10.23919/DATE48585.2020.9116287)
Citations: 13
Abstract
Network pruning is a promising compression technique for reducing the computation and memory access cost of deep neural networks. Pruning techniques fall into two types: fine-grained and coarse-grained. Fine-grained pruning eliminates individual connections that are insignificant and thus usually generates irregular networks, which makes it hard to reduce model execution time. Coarse-grained pruning, such as filter-level and channel-level techniques, produces hardware-friendly networks but can suffer from low accuracy. In this paper, we focus on group-level pruning to accelerate deep neural networks on mobile GPUs, where several adjacent weights are pruned as a group to mitigate the irregularity of pruned networks while preserving high accuracy. Although several group-level pruning techniques have been proposed, previous techniques select weight groups to be pruned only at group-size-aligned locations. We propose a more flexible approach, called unaligned group-level pruning, to improve the accuracy of the compressed model, and we find the optimal solution of the unaligned group selection problem with dynamic programming. Our technique also generates balanced sparse networks, yielding load balance across parallel computing units. Experiments demonstrate that 2D unaligned group-level pruning achieves a 3.12% lower error rate for the ResNet-20 network on CIFAR-10 compared to previous 2D aligned group-level pruning at 95% sparsity.
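To make the unaligned group-selection idea concrete, below is a minimal 1D sketch, not the authors' implementation: given one row of weights, a group size g, and a per-row budget of k groups to keep (mirroring the balanced-sparsity constraint), dynamic programming picks non-overlapping windows at arbitrary offsets that maximize the retained L1 magnitude. The function name, the per-row budget, and the L1 criterion are illustrative assumptions; the paper's 2D formulation and mobile-GPU kernels are not reproduced here.

```python
# Illustrative sketch of unaligned group selection in 1D (assumed setup,
# not the paper's code): keep k non-overlapping length-g windows per row,
# at arbitrary offsets, maximizing retained L1 magnitude via DP.
import numpy as np

def select_unaligned_groups(row, g, k):
    """Return a boolean keep-mask for `row` with k unaligned groups of size g."""
    n = len(row)
    mag = np.abs(row)
    prefix = np.concatenate(([0.0], np.cumsum(mag)))  # prefix sums of |w|
    NEG = -np.inf
    # dp[i][j]: best retained magnitude using the first i weights with j kept groups
    dp = np.full((n + 1, k + 1), NEG)
    dp[:, 0] = 0.0
    ends_here = np.zeros((n + 1, k + 1), dtype=bool)   # True if a kept group ends at i
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            skip = dp[i - 1][j]                        # weight i-1 is pruned
            take = NEG
            if i >= g and dp[i - g][j - 1] > NEG:      # a group covers i-g .. i-1
                take = dp[i - g][j - 1] + (prefix[i] - prefix[i - g])
            if take > skip:
                dp[i][j], ends_here[i][j] = take, True
            else:
                dp[i][j] = skip
    # Backtrack to recover the chosen windows.
    mask = np.zeros(n, dtype=bool)
    i, j = n, k
    while i > 0 and j > 0:
        if ends_here[i][j]:
            mask[i - g:i] = True
            i, j = i - g, j - 1
        else:
            i -= 1
    return mask

# Example: keep 2 unaligned groups of 4 weights in a 16-weight row (75% sparsity).
row = np.random.randn(16)
mask = select_unaligned_groups(row, g=4, k=2)
pruned_row = row * mask
```

Because the windows may start at any offset rather than only at multiples of g, the search space is larger than in aligned group-level pruning, but the DP recurrence above still solves it exactly in O(n·k) time per row.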