MOSP: Multi-Objective Sensitivity Pruning of Deep Neural Networks
Muhammad Sabih, Ashutosh Mishra, Frank Hannig, Jürgen Teich
2022 IEEE 13th International Green and Sustainable Computing Conference (IGSC), October 24, 2022
DOI: 10.1109/IGSC55832.2022.9969374
Abstract
Deep neural networks (DNNs) are computationally intensive, making them difficult to deploy on resource-constrained embedded systems. Model compression is a set of techniques that removes redundancy from a neural network at an affordable cost in task performance. Most compression methods do not target hardware-based objectives such as latency directly; instead, a few methods approximate latency with floating-point operations (FLOPs) or multiply-accumulate operations (MACs). Such indirect metrics do not translate directly into the performance metrics that matter on the hardware, i.e., latency and throughput. To address this limitation, we introduce Multi-Objective Sensitivity Pruning, “MOSP,” a three-stage pipeline for filter pruning: hardware-aware sensitivity analysis, Criteria-optimal configuration selection, and pruning based on explainable AI (XAI). Our pipeline supports a single target objective or a combination of objectives such as latency, energy consumption, and accuracy. Our method first formulates the sensitivity of a model's layers with respect to the target objectives as a classical machine learning problem. Next, we choose a Criteria-optimal configuration controlled by hyperparameters specific to each chosen objective. Finally, we apply XAI-based filter ranking to select the filters to be pruned. The pipeline follows an iterative pruning methodology to recover any degradation in task performance (e.g., accuracy), and the user may prefer one objective over the others. Our method outperforms the selected baseline across different neural networks and datasets in both accuracy and latency reduction and is competitive with state-of-the-art approaches.
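To make the three-stage structure concrete, the following is a minimal sketch (assuming PyTorch) of how a hardware-aware sensitivity probe, a preference-weighted configuration selection, and a filter-ranking pruning step could be chained in an iterative loop. All function names and the weighting scheme are illustrative assumptions, and a simple L1-norm ranking stands in for the paper's XAI-based filter ranking; this is not the authors' implementation.

```python
# Illustrative sketch of a MOSP-style three-stage pruning loop (assumptions noted inline).
import time
import torch
import torch.nn as nn

def measure_latency(model, sample, runs=20):
    """Rough wall-clock latency estimate on the current device (assumption)."""
    model.eval()
    with torch.no_grad():
        for _ in range(5):                      # warm-up iterations
            model(sample)
        start = time.perf_counter()
        for _ in range(runs):
            model(sample)
    return (time.perf_counter() - start) / runs

def layer_sensitivity(model, sample, conv_names, ratio=0.5):
    """Stage 1: per-layer latency sensitivity. Here filters are only zeroed as a
    crude proxy; real structured pruning would remove them to change latency."""
    base = measure_latency(model, sample)
    scores = {}
    for name in conv_names:
        conv = dict(model.named_modules())[name]
        saved = conv.weight.data.clone()
        k = int(conv.out_channels * ratio)
        conv.weight.data[:k] = 0.0              # mask k output filters
        scores[name] = base - measure_latency(model, sample)
        conv.weight.data = saved                # restore original weights
    return scores

def select_configuration(scores, alpha_latency=1.0):
    """Stage 2: choose per-layer pruning ratios from the sensitivity scores,
    scaled by a user preference hyperparameter (alpha_latency is an assumption)."""
    total = sum(abs(s) for s in scores.values()) or 1.0
    return {n: min(0.9, alpha_latency * abs(s) / total) for n, s in scores.items()}

def rank_filters(conv):
    """Stage 3: rank filters for removal; L1 norm is only a placeholder for the
    XAI-based ranking described in the paper."""
    return torch.argsort(conv.weight.data.abs().sum(dim=(1, 2, 3)))

if __name__ == "__main__":
    model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
    sample = torch.randn(1, 3, 64, 64)
    convs = [n for n, m in model.named_modules() if isinstance(m, nn.Conv2d)]
    sens = layer_sensitivity(model, sample, convs)
    config = select_configuration(sens)
    for name, ratio in config.items():
        conv = dict(model.named_modules())[name]
        prune_idx = rank_filters(conv)[: int(conv.out_channels * ratio)]
        conv.weight.data[prune_idx] = 0.0       # mask selected filters; fine-tuning would follow
    print("pruning ratios:", config)
```

In the iterative methodology the abstract describes, this loop would be repeated: prune a small fraction per round, fine-tune to recover accuracy, and re-run the sensitivity analysis before the next round.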