AUTO-PRUNE: automated DNN pruning and mapping for ReRAM-based accelerator

ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing Pub Date : 2021-06-03 DOI:10.1145/3447818.3460366

Siling Yang, Weijian Chen, Xuechen Zhang, Shuibing He, Yanlong Yin, Xian-He Sun

{"title":"AUTO-PRUNE: automated DNN pruning and mapping for ReRAM-based accelerator","authors":"Siling Yang, Weijian Chen, Xuechen Zhang, Shuibing He, Yanlong Yin, Xian-He Sun","doi":"10.1145/3447818.3460366","DOIUrl":null,"url":null,"abstract":"Emergent ReRAM-based accelerators support in-memory computation to accelerate deep neural network (DNN) inference. Weight matrix pruning of DNNs is a widely used technique to reduce the size of DNN models, thereby reducing the resource and energy consumption of ReRAM-based accelerators. However, conventional works on weight matrix pruning for ReRAM-based accelerators have three major issues. First, they use heuristics or rules from domain experts to prune the weights, leading to suboptimal pruning policies. Second, they mostly focus on improving compression ratio, thus may not meet accuracy constraints. Third, they ignore direct feedback of hardware. In this paper, we introduce an automated DNN pruning and mapping framework, named AUTO-PRUNE. It leverages reinforcement learning (RL) to automatically determine the pruning policy considering the constraint of accuracy loss. The reward function of RL agents is designed using hardware’s direct feedback (i.e., accuracy and compression rate of occupied crossbars). The function directs the search of the pruning ratio of each layer for a global optimum considering the characteristics of individual layers of DNN models. Then AUTO-PRUNE maps the pruned weight matrices to crossbars to store only nontrivial elements. Finally, to avoid the dislocation problem, we design a new data-path in ReRAM-based accelerators to correctly index and feed input to matrix-vector computation leveraging the mechanism of operation units. Experimental results show that, compared to the state-of-the-art work, AUTO-PRUNE achieves up to 3.3X compression rate, 3.1X area efficiency, and 3.3X energy efficiency with a similar or even higher accuracy.","PeriodicalId":73273,"journal":{"name":"ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing","volume":"3 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2021-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3447818.3460366","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 22

Abstract

Emergent ReRAM-based accelerators support in-memory computation to accelerate deep neural network (DNN) inference. Weight matrix pruning of DNNs is a widely used technique to reduce the size of DNN models, thereby reducing the resource and energy consumption of ReRAM-based accelerators. However, conventional works on weight matrix pruning for ReRAM-based accelerators have three major issues. First, they use heuristics or rules from domain experts to prune the weights, leading to suboptimal pruning policies. Second, they mostly focus on improving compression ratio, thus may not meet accuracy constraints. Third, they ignore direct feedback of hardware. In this paper, we introduce an automated DNN pruning and mapping framework, named AUTO-PRUNE. It leverages reinforcement learning (RL) to automatically determine the pruning policy considering the constraint of accuracy loss. The reward function of RL agents is designed using hardware’s direct feedback (i.e., accuracy and compression rate of occupied crossbars). The function directs the search of the pruning ratio of each layer for a global optimum considering the characteristics of individual layers of DNN models. Then AUTO-PRUNE maps the pruned weight matrices to crossbars to store only nontrivial elements. Finally, to avoid the dislocation problem, we design a new data-path in ReRAM-based accelerators to correctly index and feed input to matrix-vector computation leveraging the mechanism of operation units. Experimental results show that, compared to the state-of-the-art work, AUTO-PRUNE achieves up to 3.3X compression rate, 3.1X area efficiency, and 3.3X energy efficiency with a similar or even higher accuracy.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

AUTO-PRUNE:基于reram的加速器的自动DNN修剪和映射

紧急基于reram的加速器支持内存计算，以加速深度神经网络(DNN)的推理。DNN的权矩阵剪枝是一种广泛使用的技术，可以减小DNN模型的尺寸，从而减少基于reram的加速器的资源和能量消耗。然而，传统的基于reram的加速器权矩阵修剪工作存在三个主要问题。首先，他们使用启发式或领域专家的规则来修剪权重，导致次优修剪策略。其次，它们主要关注于提高压缩比，因此可能不满足精度约束。第三，他们忽略了硬件的直接反馈。在本文中，我们介绍了一个自动DNN修剪和映射框架，称为AUTO-PRUNE。它利用强化学习(RL)在考虑精度损失约束的情况下自动确定剪枝策略。RL agent的奖励函数是利用硬件的直接反馈(即占用横梁的准确率和压缩率)来设计的。考虑到DNN模型各层的特征，该函数指导搜索每层的剪枝比以获得全局最优。然后AUTO-PRUNE将修剪后的权重矩阵映射到交叉栏，以仅存储非平凡元素。最后，为了避免位错问题，我们在基于rerram的加速器中设计了一种新的数据路径，利用运算单元的机制，正确地索引和馈送输入到矩阵向量计算中。实验结果表明，与目前的工作相比，AUTO-PRUNE实现了高达3.3倍的压缩率，3.1倍的面积效率和3.3倍的能量效率，并且具有相似甚至更高的精度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing

自引率

0.00%

发文量

期刊最新文献

Accelerating BWA-MEM Read Mapping on GPUs. Dynamic Memory Management in Massively Parallel Systems: A Case on GPUs. Priority Algorithms with Advice for Disjoint Path Allocation Problems From Data of Internet of Things to Domain Knowledge: A Case Study of Exploration in Smart Agriculture On Two Variants of Induced Matchings