Tight Compression: Compressing CNN Model Tightly Through Unstructured Pruning and Simulated Annealing Based Permutation

2020 57th ACM/IEEE Design Automation Conference (DAC). Pub Date: 2020-07-01. DOI: 10.1109/DAC18072.2020.9218701

Xizi Chen, Jingyang Zhu, Jingbo Jiang, C. Tsui

Citations: 11

Abstract

The unstructured sparsity left by pruning makes it difficult to implement deep learning models efficiently on existing regular architectures such as systolic arrays. Coarse-grained structured pruning, on the other hand, tends to incur a higher accuracy loss than unstructured pruning when the pruned models are of the same size. In this work, we propose a compression method based on unstructured pruning and a novel weight permutation scheme. Through permutation, the sparse weight matrix is further compressed into a small, dense format that makes full use of the hardware resources. Compared with state-of-the-art works, the matrix compression rate improves from 5.88x to 10.28x. As a result, throughput and energy efficiency improve by factors of 2.12 and 1.57, respectively.
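The abstract does not spell out the permutation algorithm, so the following is only a toy sketch of the general idea named in the title: using simulated annealing to search for a row permutation that lets a sparse matrix be packed more densely. Everything here is an assumption made for illustration and not the authors' method: the function names `packed_size` and `anneal_permutation`, the cost model (rows are processed in groups, and a column is dropped from a group when it is zero in every row of that group), the swap move, and the geometric cooling schedule.

```python
import math
import random


def packed_size(mask, perm, group):
    """Cells stored when rows are taken in the order `perm`, `group` rows at a
    time, and columns that are zero in every row of a group are dropped.
    (Hypothetical cost model, not taken from the paper.)"""
    n_cols = len(mask[0])
    total = 0
    for start in range(0, len(perm), group):
        rows = perm[start:start + group]
        kept = sum(1 for c in range(n_cols) if any(mask[r][c] for r in rows))
        total += kept * len(rows)
    return total


def anneal_permutation(mask, group=4, steps=5000, t0=1.0, alpha=0.999, seed=0):
    """Search for a row order with a small packed size via simulated annealing.
    Requires at least two rows. Returns (best_perm, best_cost)."""
    rng = random.Random(seed)
    n = len(mask)
    perm = list(range(n))
    cost = packed_size(mask, perm, group)
    best_perm, best_cost = perm[:], cost
    temp = t0
    for _ in range(steps):
        # Propose swapping two rows in the current ordering.
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]
        new_cost = packed_size(mask, perm, group)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best_perm, best_cost = perm[:], cost
        else:
            perm[i], perm[j] = perm[j], perm[i]  # reject: undo the swap
        temp *= alpha  # geometric cooling
    return best_perm, best_cost
```

Under this toy cost model, a compression rate can be read off as the original cell count (rows times columns) divided by the returned `best_cost`. The figures reported in the abstract come from the authors' own scheme and hardware, which this sketch does not reproduce.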


