Flexible group-level pruning of deep neural networks for fast inference on mobile CPUs: work-in-progress

Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion · Pub Date: 2019-10-13 · DOI: 10.1145/3349569.3351537

Kwangbae Lee, Hoseung Kim, Hayun Lee, Dongkun Shin

Citations: 1

Abstract

Network pruning is a promising compression technique to reduce the computation and memory access cost of deep neural networks. In this paper, we propose a novel group-level pruning method to accelerate deep neural networks on mobile GPUs, where several adjacent weights are pruned as a group while providing high accuracy. Although several group-level pruning techniques have been proposed, previous techniques cannot achieve the desired accuracy at high sparsity. In this paper, we propose an unaligned approach to improve the accuracy of the compressed model.
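The abstract describes group-level pruning only at a high level, so the following is a minimal sketch of the generic, aligned variant under stated assumptions, not the authors' method. It assumes groups of `group_size` adjacent weights along each row that start at fixed offsets (0, `group_size`, ...), ranks all groups by L2 norm, and zeroes the lowest-norm fraction `sparsity`. The function names, the L2 criterion, and the row-wise layout are hypothetical choices; the paper's unaligned approach is not described in the abstract and is not reproduced here. The encoder and matrix-vector product illustrate why grouping can help inference: each surviving group needs a single start index, and its weights are contiguous in memory.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
# Per row: (start column, weights) for every group that survived pruning.
GroupSparse = List[List[Tuple[int, List[float]]]]


def group_prune(weights: Matrix, group_size: int, sparsity: float) -> Matrix:
    """Zero the lowest-L2-norm groups of `group_size` adjacent weights.

    Hypothetical aligned scheme: in every row, groups start at columns
    0, group_size, 2 * group_size, ... All groups in the matrix are ranked
    together and floor(sparsity * number_of_groups) of them are pruned.
    Returns a pruned copy; the input is left untouched.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    cols = len(weights[0]) if weights else 0
    if any(len(row) != cols for row in weights) or cols % group_size != 0:
        raise ValueError("rows must share a length that is a multiple of group_size")

    # Score every aligned group by the L2 norm of its weights.
    scored = []
    for r, row in enumerate(weights):
        for c in range(0, cols, group_size):
            norm = math.sqrt(sum(w * w for w in row[c:c + group_size]))
            scored.append((norm, r, c))
    scored.sort(key=lambda item: item[0])

    # Zero out the weakest groups in a copy of the input.
    n_prune = math.floor(sparsity * len(scored))
    pruned = [list(row) for row in weights]
    for _, r, c in scored[:n_prune]:
        for j in range(c, c + group_size):
            pruned[r][j] = 0.0
    return pruned


def to_group_sparse(weights: Matrix, group_size: int) -> GroupSparse:
    """Store only groups that contain a nonzero weight, one start index each."""
    encoded: GroupSparse = []
    for row in weights:
        kept = []
        for c in range(0, len(row), group_size):
            block = row[c:c + group_size]
            if any(w != 0.0 for w in block):
                kept.append((c, block))
        encoded.append(kept)
    return encoded


def group_sparse_matvec(encoded: GroupSparse, x: List[float]) -> List[float]:
    """Compute y = W x, visiting only the stored (surviving) groups."""
    y = []
    for kept in encoded:
        acc = 0.0
        for c, block in kept:
            # A group's weights are contiguous, so x is also read contiguously.
            for k, w in enumerate(block):
                acc += w * x[c + k]
        y.append(acc)
    return y
```

For example, pruning `[[0.1, -0.2, 0.9, 0.8], [0.05, 0.0, -0.7, 0.6]]` with `group_size=2` and `sparsity=0.5` zeroes the two lowest-norm groups (the first pair in each row), leaving `[[0.0, 0.0, 0.9, 0.8], [0.0, 0.0, -0.7, 0.6]]`; the group-sparse encoding then stores one start index (column 2) per row instead of one index per weight.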
