Kernel-controlled DQN based CNN Pruning for Model Compression and Acceleration

Romancha Khatri, Kwanghee Won
{"title":"基于核控制DQN的CNN剪枝模型压缩与加速","authors":"Romancha Khatri, Kwanghee Won","doi":"10.1145/3400286.3418258","DOIUrl":null,"url":null,"abstract":"Apart from the accuracy, the size of Convolutional Neural Networks (CNN) model is another principal factor for facilitating the deployment of models on memory, power and budget constrained devices. Conventional compression techniques require human expert to setup parameters to explore the design space and iterative based pruning requires heavy training which is sub-optimal and time consuming. Given a CNN model, we propose deep reinforcement learning [8] DQN based automated compression which effectively turned off kernels on each layer by observing its significance. Observing accuracy, compression ratio and convergence rate, proposed DQN model can automatically re- activate the healthiest kernels back to train it again to regain accuracy which greatly ameliorate the model compression quality. Based on experiments on MNIST [3] dataset, our method can compress convolution layers for VGG-like [10] model up to 60% with 0.5% increase in test accuracy within less than a half the number of initial amount of training (speed-up up to 2.5×), state- of-the-art results of dropping 80% of kernels (compressed 86% parameters) with increase in accuracy by 0.14%. Further dropping 84% of kernels (compressed 94% parameters) with the loss of 0.4% accuracy. The first proposed Auto-AEC (Accuracy-Ensured Compression) model can compress the network by preserving original accuracy or increase in accuracy of the model, whereas, the second proposed Auto-CECA (Compression-Ensured Considering the Accuracy) model can compress to the maximum by preserving original accuracy or minimal drop of accuracy. We further analyze effectiveness of kernels on different layers based on how our model explores and exploits in various stages of training.","PeriodicalId":326100,"journal":{"name":"Proceedings of the International Conference on Research in Adaptive and Convergent Systems","volume":"186 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Kernel-controlled DQN based CNN Pruning for Model Compression and Acceleration\",\"authors\":\"Romancha Khatri, Kwanghee Won\",\"doi\":\"10.1145/3400286.3418258\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Apart from the accuracy, the size of Convolutional Neural Networks (CNN) model is another principal factor for facilitating the deployment of models on memory, power and budget constrained devices. Conventional compression techniques require human expert to setup parameters to explore the design space and iterative based pruning requires heavy training which is sub-optimal and time consuming. Given a CNN model, we propose deep reinforcement learning [8] DQN based automated compression which effectively turned off kernels on each layer by observing its significance. Observing accuracy, compression ratio and convergence rate, proposed DQN model can automatically re- activate the healthiest kernels back to train it again to regain accuracy which greatly ameliorate the model compression quality. 
Based on experiments on MNIST [3] dataset, our method can compress convolution layers for VGG-like [10] model up to 60% with 0.5% increase in test accuracy within less than a half the number of initial amount of training (speed-up up to 2.5×), state- of-the-art results of dropping 80% of kernels (compressed 86% parameters) with increase in accuracy by 0.14%. Further dropping 84% of kernels (compressed 94% parameters) with the loss of 0.4% accuracy. The first proposed Auto-AEC (Accuracy-Ensured Compression) model can compress the network by preserving original accuracy or increase in accuracy of the model, whereas, the second proposed Auto-CECA (Compression-Ensured Considering the Accuracy) model can compress to the maximum by preserving original accuracy or minimal drop of accuracy. We further analyze effectiveness of kernels on different layers based on how our model explores and exploits in various stages of training.\",\"PeriodicalId\":326100,\"journal\":{\"name\":\"Proceedings of the International Conference on Research in Adaptive and Convergent Systems\",\"volume\":\"186 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the International Conference on Research in Adaptive and Convergent Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3400286.3418258\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the International Conference on Research in Adaptive and Convergent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3400286.3418258","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

Apart from accuracy, the size of a Convolutional Neural Network (CNN) model is another principal factor in deploying models on memory-, power-, and budget-constrained devices. Conventional compression techniques require a human expert to set parameters to explore the design space, and iterative pruning requires heavy retraining, which is sub-optimal and time-consuming. Given a CNN model, we propose a deep reinforcement learning [8] DQN-based automated compression method that effectively turns off kernels on each layer by observing their significance. By observing accuracy, compression ratio, and convergence rate, the proposed DQN model can automatically re-activate the healthiest kernels and retrain them to regain accuracy, which greatly improves compression quality. In experiments on the MNIST [3] dataset, our method compresses the convolution layers of a VGG-like [10] model by up to 60% with a 0.5% increase in test accuracy, using less than half the initial amount of training (a speed-up of up to 2.5×); it achieves state-of-the-art results, dropping 80% of kernels (compressing 86% of parameters) with a 0.14% increase in accuracy, and dropping 84% of kernels (compressing 94% of parameters) with a 0.4% loss in accuracy. The first proposed model, Auto-AEC (Accuracy-Ensured Compression), compresses the network while preserving or improving the model's original accuracy, whereas the second, Auto-CECA (Compression-Ensured Considering the Accuracy), compresses the network to the maximum while preserving the original accuracy or incurring only a minimal drop. We further analyze the effectiveness of kernels on different layers based on how our model explores and exploits at various stages of training.
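To make the kernel on/off mechanism concrete, the following is a minimal, hypothetical Python/PyTorch sketch of the idea the abstract describes. The paper itself gives no code; the names (MaskedConv2d, compression_reward, select_action) and the reward weights alpha and beta are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: per-kernel masking plus DQN-style action selection.
# Names and reward weights are assumptions for illustration only.
import random
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Conv layer whose kernels (output channels) can be switched on/off
    via a binary mask, so an agent's actions can prune or re-activate
    individual kernels without rebuilding the network."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # one mask entry per kernel: 1.0 = active, 0.0 = pruned
        self.register_buffer("kernel_mask", torch.ones(self.out_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # zero out the feature maps of pruned kernels
        return super().forward(x) * self.kernel_mask.view(1, -1, 1, 1)

    def apply_action(self, kernel_idx: int, reactivate: bool = False) -> None:
        # the agent either turns a kernel off, or re-activates a previously
        # pruned ("healthy") kernel when accuracy must be regained
        self.kernel_mask[kernel_idx] = 1.0 if reactivate else 0.0

def compression_reward(acc: float, base_acc: float, frac_pruned: float,
                       alpha: float = 1.0, beta: float = 0.5) -> float:
    """Toy scalar reward trading off accuracy change against compression,
    echoing the accuracy/compression-ratio signals the abstract mentions;
    alpha and beta are assumed weights."""
    return alpha * (acc - base_acc) + beta * frac_pruned

def select_action(q_values: torch.Tensor, eps: float) -> int:
    """Standard epsilon-greedy DQN action selection over kernel indices."""
    if random.random() < eps:
        return random.randrange(q_values.numel())
    return int(q_values.argmax().item())
```

Read as one interpretation of the abstract: an Auto-AEC-style loop would evaluate the masked model after each action, feed compression_reward into the DQN update, and issue reactivate=True actions whenever accuracy falls below the original baseline, while an Auto-CECA-style loop would keep pruning until the allowed accuracy drop is exhausted.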