
Latest publications: 2022 IEEE 13th International Green and Sustainable Computing Conference (IGSC)

A Low-Cost Stochastic Computing-based Fuzzy Filtering for Image Noise Reduction
Pub Date: 2022-10-24 DOI: 10.1109/IGSC55832.2022.9969358
Seyedeh Newsha Estiri, Amir Hossein Jalilvand, S. Naderi, M. Najafi, Mahdi Fazeli
Images are often corrupted by noise. As a result, noise reduction is an important task in image processing. Common noise reduction techniques, such as mean or median filtering, blur the edges in the image, while fuzzy filters are able to preserve the edge information. In this work, we implement an efficient hardware design for a well-known fuzzy noise reduction filter based on stochastic computing. The filter consists of two main stages: edge detection and fuzzy smoothing. The fuzzy difference, encoded as bit-streams, is used to detect edges. Fuzzy smoothing then averages the pixel value over eight directions. Our experimental results show a significant reduction in hardware area and power consumption compared to the conventional binary implementation while preserving the quality of the results.
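As a companion to the abstract, here is a minimal Python sketch of the unipolar stochastic-computing encoding such a filter builds on: a value in [0, 1] becomes a random bit-stream whose bit density encodes the value, and a bitwise AND of two streams multiplies them. This illustrates the general substrate only, not the authors' fuzzy-filter hardware; all names are illustrative.

```python
import random

def to_bitstream(value, length, rng):
    """Encode a probability in [0, 1] as a unipolar stochastic bit-stream."""
    return [1 if rng.random() < value else 0 for _ in range(length)]

def from_bitstream(bits):
    """Decode a bit-stream back into a probability estimate."""
    return sum(bits) / len(bits)

def sc_multiply(a_bits, b_bits):
    """In unipolar stochastic computing, bitwise AND multiplies the encoded values."""
    return [a & b for a, b in zip(a_bits, b_bits)]

rng = random.Random(42)
N = 10_000  # longer streams give lower decoding variance
a = to_bitstream(0.5, N, rng)
b = to_bitstream(0.4, N, rng)
prod = from_bitstream(sc_multiply(a, b))  # close to 0.5 * 0.4 = 0.2
```

The appeal for low-cost hardware is that the multiplier above is a single AND gate per bit; the price is a stochastic decoding error that shrinks as the stream length grows.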
Citations: 1
Exploring Automatic Gym Workouts Recognition Locally on Wearable Resource-Constrained Devices
Pub Date: 2022-10-24 DOI: 10.1109/IGSC55832.2022.9969370
Sizhen Bian, Xiaying Wang, T. Polonelli, M. Magno
Automatic gym activity recognition on energy- and resource-constrained wearable devices removes the human-interaction requirement during intense gym sessions - like soft-touch tapping and swiping. This work presents a tiny and highly accurate residual convolutional neural network that runs on milliwatt microcontrollers for automatic workout classification. We evaluated the inference performance of the quantized deep model on three resource-constrained devices: two microcontrollers with ARM Cortex-M4 and Cortex-M7 cores from ST Microelectronics, and a GAP8 system on chip, an open-source, multi-core RISC-V computing platform from GreenWaves Technologies. Experimental results show an accuracy of up to 90.4% for eleven-workout recognition with full-precision inference. The paper also presents the trade-off performance of the resource-constrained system. While keeping recognition accuracy (88.1%) with minimal loss, each inference takes only 3.2 ms on GAP8, benefiting from its 8 RISC-V cluster cores. We measured execution times 18.9x and 6.5x faster than on the Cortex-M4 and Cortex-M7 cores, showing the feasibility of real-time on-board workout recognition on the described data set at a 20 Hz sampling rate. The energy consumed per inference on GAP8 is 0.41 mJ, compared to 5.17 mJ on the Cortex-M4 and 8.07 mJ on the Cortex-M7 at maximum clock, which can lead to longer battery life when the system is battery-operated. We also introduce a publicly available data set of fifty sessions of eleven gym workouts collected from ten subjects.
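The deployment step in the abstract relies on quantizing the network for microcontroller inference. As a hedged illustration (simple symmetric per-tensor int8 quantization, not necessarily the authors' exact scheme; all names are invented), the core arithmetic might look like:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ~= scale * q with q in [-127, 127]."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from the int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8)).astype(np.float32)     # stand-in weight tensor
q, scale = quantize_int8(w)
err = float(np.max(np.abs(w - dequantize(q, scale))))  # at most ~scale / 2
```

Storing `q` instead of `w` shrinks the weights fourfold and lets the MCU use integer multiply-accumulate units, which is where milliwatt-level inference budgets like those reported become feasible.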
Citations: 7
Electrical Commissioning Owner's Project Requirements: A Template
Pub Date: 2022-10-24 DOI: 10.1109/IGSC55832.2022.9969369
Brandon Hong, E. Thomason, Aditya M. Deshpande
As the power demands of supercomputers continue to grow, so do the demands on the electrical systems that support the infrastructure and buildings in which these supercomputers reside. A typical new supercomputer installation requires an upgrade to the design of the electrical system, as supercomputers are refreshed roughly every three years, which in turn drives electrical system upgrades. The pre-design phase is critical for planning the installation of a new supercomputer and requires documenting the overarching project purpose, goals, expectations, preferences, and limitations for the electrical systems, especially as the number of stakeholders increases. This Owner's Project Requirements (OPR) document then becomes guidance to the engineering and design teams for the development of the initial basis-of-design and subsequent construction documents. The electrical systems commissioning OPR provides a guideline for stakeholders to make sure that the electrical systems are well designed 'up-front' in the process of installing a new supercomputer. It also serves as a guiding checklist that readers can use to inform their own project guiding documents. This document will assist the owner and respective HPC infrastructure stakeholders in writing an OPR for the electrical systems supporting data centers or high-performance computing (HPC) facilities. This paper provides a template for developing an electrical system commissioning OPR. The template is sub-divided into sections that should be discussed and documented as part of the overall project requirements. The expectation is that this outline template forms a starting point for discussions toward a guiding document for the commissioning of the electrical systems and standardizes the best practices and processes needed for certification of the electrical commissioning of HPC supercomputer facilities.
Citations: 0
Soft Cluster Powercap at SuperMUC-NG with EAR
Pub Date: 2022-10-24 DOI: 10.1109/IGSC55832.2022.9969360
J. Corbalán, Lluis Alonso, C. Navarrete, Carla Guillén
This paper describes the Soft Cluster Powercap management system implemented and evaluated on SuperMUC-NG using the EAR software. SuperMUC-NG is one of the biggest supercomputers in Europe, with 6,480 Intel Skylake Xeon Platinum 8174 processors, and EAR is the system software used for energy management. SuperMUC-NG has a power limit with a certain degree of tolerance: the limit may be exceeded for a short time, as long as average power stays under the hard limit over a longer period. Otherwise, the data center would incur a cost penalty. We call this use case Soft Cluster Powercap, since it differs from the traditional Hard Cluster Powercap, where the power limit cannot be exceeded. This paper presents the design of the EAR node powercap and the Soft Cluster Powercap, along with an evaluation of both. The evaluation included in this paper is limited to CPU-only kernels and applications for the node powercap, and to one island of SuperMUC-NG (792 nodes) for the soft cluster powercap. Currently the solution is deployed in the whole cluster.
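The soft/hard distinction can be made concrete with a toy controller: individual samples may exceed the limit briefly as long as a sliding-window average stays below it. This is a simplified sketch of the policy described, not EAR's actual implementation; the class and parameter names are invented.

```python
from collections import deque

class SoftPowercap:
    """Tolerate short power excursions while the sliding-window average
    stays below the hard limit (illustrative sketch only)."""

    def __init__(self, hard_limit_w, window_len):
        self.hard_limit = hard_limit_w
        self.window = deque(maxlen=window_len)  # most recent samples only

    def report(self, power_w):
        """Record one power sample; return True if throttling is needed."""
        self.window.append(power_w)
        avg = sum(self.window) / len(self.window)
        return avg > self.hard_limit

cap = SoftPowercap(hard_limit_w=1000, window_len=5)
# One short spike above 1000 W is tolerated: the window average stays under the cap.
decisions = [cap.report(p) for p in [900, 900, 1200, 900, 900]]
```

A hard cluster powercap would instead compare each instantaneous sample against the limit and throttle immediately on the spike.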
Citations: 0
Unified Cross-Layer Cluster-Node Scheduling for Heterogeneous Datacenters
Pub Date: 2022-10-24 DOI: 10.1109/IGSC55832.2022.9969366
Wenkai Guan, Cristinel Ababei
In this paper, we present Qin, a two-level hierarchical scheduler for datacenters. The goal of the proposed scheduler is to exploit increased server heterogeneity. It combines cluster- and node-level scheduling algorithms in a unified approach, and it can target specific optimization objectives including job completion time, energy usage, and energy-delay product (EDP). Its novelty lies in the unified approach and in modeling interference and heterogeneity. Experiments on a real cluster demonstrate that the proposed approach outperforms state-of-the-art schedulers by 10.2% in completion time, 38.65% in energy usage, and 41.98% in EDP.
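To make the multi-objective idea concrete, here is a hypothetical node-selection rule that minimizes a weighted sum of normalized completion time, energy, and EDP. It only illustrates how such objectives can be combined; it is not the Qin scheduler's actual policy, and all names and numbers are invented.

```python
def edp(time_s, energy_j):
    """Energy-delay product: energy multiplied by execution time."""
    return energy_j * time_s

def pick_node(candidates, weights):
    """Choose the node minimizing a weighted sum of normalized objectives.
    `candidates` maps node name -> (predicted_time_s, predicted_energy_j);
    `weights` orders the (time, energy, EDP) objectives by importance."""
    objs = {n: (t, e, edp(t, e)) for n, (t, e) in candidates.items()}
    maxima = [max(v[i] for v in objs.values()) for i in range(3)]
    def score(n):
        return sum(w * objs[n][i] / maxima[i] for i, w in enumerate(weights))
    return min(objs, key=score)

nodes = {"big_core": (10.0, 500.0), "little_core": (25.0, 300.0)}
fastest = pick_node(nodes, weights=(1.0, 0.0, 0.0))   # minimize completion time
greenest = pick_node(nodes, weights=(0.0, 1.0, 0.0))  # minimize energy
```

Shifting the weights moves the decision between the fast-but-hungry and slow-but-frugal node, which is the trade-off a multi-objective scheduler has to navigate.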
Citations: 1
Optimal Launch Bound Selection in CPU-GPU Hybrid Graph Applications with Deep Learning
Pub Date: 2022-10-24 DOI: 10.1109/IGSC55832.2022.9969364
Md. Erfanul Haque Rafi, Apan Qasem
Graph algorithms, which are at the heart of emerging computation domains such as machine learning, are notoriously difficult to optimize because of their irregular behavior. The challenges are magnified on current CPU-GPU heterogeneous platforms. In this paper, we study the problem of GPU launch bound configuration in hybrid graph algorithms. We train a multi-objective deep neural network to learn a function that maps input graph characteristics and runtime program behavior to a set of launch bound parameters. When applying launch bounds predicted by our neural network in BFS and SSSP algorithms, we observe as much as a 2.76× speedup on certain graph instances and overall speedups of 1.31 and 1.61, respectively. Similar improvements are seen in the energy efficiency of the applications, with an average reduction of 14% in peak power consumption across 20 real-world input graphs. Evaluation of the neural network shows that it is robust and generalizable, yielding close to 90% accuracy on cross-validation.
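In CUDA, the parameters being predicted correspond to the `__launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor)` kernel qualifier. The sketch below replaces the trained multi-objective DNN with an invented rule over two toy graph features, purely to show the shape of the mapping from graph statistics to launch bounds; none of it comes from the paper.

```python
def graph_features(degrees):
    """Summarize an input graph by mean degree and degree variance
    (stand-ins for the richer feature set a learned model would consume)."""
    n = len(degrees)
    mean = sum(degrees) / n
    var = sum((d - mean) ** 2 for d in degrees) / n
    return mean, var

def predict_launch_bounds(degrees):
    """Map graph features to (threads_per_block, min_blocks_per_sm).
    A hypothetical rule-based stand-in for the trained DNN."""
    mean, var = graph_features(degrees)
    if var > mean ** 2:          # heavy-tailed, irregular degree distribution
        return 128, 8            # smaller blocks, more resident blocks per SM
    return 256, 4                # near-uniform graph: larger blocks

road_like = [2, 3, 2, 3, 2, 3]        # near-uniform degrees (e.g., road network)
social_like = [1, 1, 1, 2, 1, 50]     # heavy-tailed degrees (e.g., social graph)
```

The point of learning this mapping is that no single launch configuration is best across graph structures, so the model picks per-input bounds instead of a compile-time constant.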
Citations: 0
Energy-Performance-Security Trade-off in Mobile Edge Computing
Pub Date: 2022-10-24 DOI: 10.1109/IGSC55832.2022.9969375
Mahipal P. Singh, S. Sankaran
Multi-access Edge Computing (MEC), also known as Mobile Edge Computing, is a type of edge computing that extends the capabilities of cloud computing by bringing resources to the edge of the network. Traditional cloud computing occurs on remote servers far away from users and IoT devices, whereas MEC allows computing to take place at base stations, central offices, or other aggregation points within the transport network. However, locating MEC nodes near the data-generating devices gives rise to several challenges, such as the launch of attacks on the nodes from data-generating or network-edge devices. In this paper, we simulate Distributed Denial of Service (DDoS) and routing-based attacks to determine their impact on energy and performance. In addition, we propose a novel approach for mitigating DDoS attacks on MEC nodes. Our approach accurately discriminates between high-rate and low-rate DDoS attacks and provides defence against both. Our method has a 90% success rate in detecting and thwarting DDoS attacks on MEC nodes.
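One way to separate the two attack classes mentioned is by rate statistics: a sustained flood raises the mean packet rate, while a low-rate (shrew-style) attack keeps the mean modest but shows bursts far above the quiet-period baseline. The rule and thresholds below are illustrative guesses, not the paper's tuned detector.

```python
def classify_traffic(pkts_per_sec, high_rate=1000, burst_ratio=10.0):
    """Label a window of per-second packet counts as benign, high-rate,
    or low-rate DDoS. Thresholds are illustrative, not tuned values."""
    mean = sum(pkts_per_sec) / len(pkts_per_sec)
    # Median approximates the quiet-period rate even when bursts skew the mean.
    baseline = sorted(pkts_per_sec)[len(pkts_per_sec) // 2]
    peak = max(pkts_per_sec)
    if mean > high_rate:
        return "high-rate-ddos"
    if peak > burst_ratio * max(baseline, 1):
        return "low-rate-ddos"
    return "benign"

high = classify_traffic([1500] * 10)            # sustained flood
low = classify_traffic([10, 10, 500, 10, 10])   # periodic burst over a quiet baseline
ok = classify_traffic([40, 55, 60, 45, 50])     # normal load
```

Using the median as the baseline is the key trick here: a single burst inflates the mean but leaves the median untouched, so low-rate attacks remain visible.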
Citations: 0
A Review of Smart Buildings Protocol and Systems with a Consideration of Security and Energy Awareness
Pub Date: 2022-10-24 DOI: 10.1109/IGSC55832.2022.9969359
Mini Zeng
In this paper, we discuss different smart building communication protocols and systems and the security features of the existing ones. We also discuss possible attacks on, and vulnerabilities of, building automation systems. We provide a taxonomy of the most popular smart building communication protocols with respect to security features and energy-saving solutions. The motivation of this paper is to guide designers and developers of smart building systems in considering security and energy awareness.
Citations: 0
Raptor: Mitigating CPU-GPU False Sharing Under Unified Memory Systems
Pub Date: 2022-10-24 DOI: 10.1109/IGSC55832.2022.9969376
Md. Erfanul Haque Rafi, Kaylee Williams, Apan Qasem
The introduction of Unified Memory (UM) technology has greatly increased the programmability of CPU-GPU heterogeneous systems. At the same time, Unified Memory systems have given rise to new performance challenges. Achieving the desired performance and energy efficiency on such systems requires careful consideration of data allocation and migration. This paper looks at the problem of false sharing under UM. We present Raptor, a system for fast and accurate detection of page-level false sharing in heterogeneous applications. The system employs binary code instrumentation and leverages hardware performance counters to track UM allocations and data access patterns and to pinpoint energy inefficiencies created by false sharing. Experiments on a suite of heterogeneous applications show that false sharing can be a common occurrence in collaborative design paradigms with tight coupling of CPU-GPU tasks. When false sharing is eliminated via a padding scheme, applications are able to achieve higher performance at lower clock frequencies, improving energy efficiency by as much as 2.96× and by 1.62× and 1.47× on average on two contemporary CPU-GPU platforms.
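Under UM, data migrates between CPU and GPU at page granularity, so two logically independent buffers that share a 4 KB page can ping-pong between devices even though no byte is actually shared. The padding remedy the abstract evaluates can be sketched as simple layout arithmetic (Raptor itself only detects the problem; the names and the 4 KB page size here are illustrative assumptions):

```python
PAGE = 4096  # bytes; UM commonly migrates data at page granularity

def pad_to_page(nbytes, page=PAGE):
    """Round an allocation size up to a whole number of pages so the next
    region starts on its own page."""
    return ((nbytes + page - 1) // page) * page

def layout(region_sizes, page=PAGE):
    """Assign page-aligned offsets to regions written by different processors,
    so no two regions ever share a migratable page."""
    offsets, cursor = [], 0
    for size in region_sizes:
        offsets.append(cursor)
        cursor += pad_to_page(size, page)
    return offsets, cursor

# Example: CPU writes a 100-byte flag block; GPU writes a 10 KB result buffer.
offsets, total = layout([100, 10_240])
```

The cost is wasted bytes at the end of each region; the benefit is that a CPU write to the flag block no longer forces the GPU's result pages to migrate back and forth.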
Citations: 1
MOSP: Multi-Objective Sensitivity Pruning of Deep Neural Networks
Pub Date : 2022-10-24 DOI: 10.1109/IGSC55832.2022.9969374
Muhammad Sabih, Ashutosh Mishra, Frank Hannig, Jürgen Teich
Deep neural networks (DNNs) are computationally intensive, making them difficult to deploy on resource-constrained embedded systems. Model compression is a set of techniques that remove redundancies from a neural network at an affordable cost in task performance. Most compression methods do not target hardware-based objectives such as latency directly; a few approximate latency with floating-point operations (FLOPs) or multiply-accumulate operations (MACs). Such indirect metrics do not translate directly to the performance metrics that matter on the hardware, namely latency and throughput. To address this limitation, we introduce Multi-Objective Sensitivity Pruning, "MOSP," a three-stage pipeline for filter pruning: hardware-aware sensitivity analysis, criteria-optimal configuration selection, and pruning based on explainable AI (XAI). Our pipeline supports a single target objective or a combination of objectives such as latency, energy consumption, and accuracy. Our method first formulates the sensitivity of a model's layers to the target objectives as a classical machine learning problem. Next, we choose a criteria-optimal configuration controlled by hyperparameters specific to each chosen objective. Finally, we apply XAI-based filter ranking to select the filters to be pruned. The pipeline follows an iterative pruning methodology to recover any degradation in task performance (e.g., accuracy), and it allows the user to prefer one objective over the others. Our method outperforms the selected baseline across different neural networks and datasets in both accuracy and latency reduction and is competitive with state-of-the-art approaches.
{"title":"MOSP: Multi-Objective Sensitivity Pruning of Deep Neural Networks","authors":"Muhammad Sabih, Ashutosh Mishra, Frank Hannig, Jürgen Teich","doi":"10.1109/IGSC55832.2022.9969374","DOIUrl":"https://doi.org/10.1109/IGSC55832.2022.9969374","abstract":"Deep neural networks (DNNs) are computationally intensive, making them difficult to deploy on resource-constrained embedded systems. Model compression is a set of techniques that removes redundancies from a neural network with affordable degradation in task performance. Most compression methods do not target hardware-based objectives such as latency directly; however, few methods approximate latency with floating-point operations (FLOPs) or multiply-accumulate operations (MACs). Using these indirect metrics cannot directly translate to the relevant performance metric on the hardware, i.e., latency and throughput. To address this limitation, we introduce Multi-Objective Sensitivity Pruning, “MOSP,” a three-stage pipeline for filter pruning: hardware-aware sensitivity analysis, Criteria-optimal configuration selection, and pruning based on explainable AI (XAI). Our pipeline is compatible with a single or combination of target objectives such as latency, energy consumption, and accuracy. Our method first formulates the sensitivity of layers of a model against the target objectives as a classical machine learning problem. Next, we choose a Criteria-optimal configuration controlled by hyperparameters specific to each objective of choice. Finally, we apply XAI-based filter ranking to select filters to be pruned. The pipeline follows an iterative pruning methodology to recover any loss in degradation in task performance (e.g., accuracy). We allow the user to prefer one objective function over the other. Our method outperforms the selected baseline method across different neural networks and datasets in both accuracy and latency reductions and is competitive with state-of-the-art approaches.","PeriodicalId":114200,"journal":{"name":"2022 IEEE 13th International Green and Sustainable Computing Conference (IGSC)","volume":"158 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127546916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Journal
2022 IEEE 13th International Green and Sustainable Computing Conference (IGSC)