
Latest publications from the 2022 IEEE 13th International Green and Sustainable Computing Conference (IGSC)

Energy Aware Scheduler of Single/Multi-Node Jobs Considering CPU Node Heterogeneity
Pub Date: 2022-10-24 DOI: 10.1109/IGSC55832.2022.9969365
K. Fukazawa, Jiacheng Zhou, H. Nakashima
Modern CPUs suffer from power-efficiency heterogeneity, which can result in additional energy cost or performance loss. At the same time, future supercomputers are expected to be power constrained. This paper focuses on energy-aware scheduling algorithms for two situations that account for this node heterogeneity. In the single-node situation, where the workload consists of various single-node jobs, the Combinatorial Optimization Algorithm saves energy by computing a locally optimal power-efficiency node allocation plan with the KM (Kuhn-Munkres) algorithm. In the multi-node situation, a power cap causes load imbalance within multi-node jobs due to node heterogeneity, and the Sliding Window Algorithm reduces this imbalance using a sliding window. The proposed algorithms are evaluated in both simulation and a real supercomputer environment. In the single-node situation, the Combinatorial Optimization Algorithm achieved up to 2.92% energy savings. In the multi-node situation, with a workload designed from real historical workloads, the Sliding Window Algorithm achieved up to 5.36% savings.
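The allocation step described above is a minimum-cost assignment problem. The toy sketch below finds the same optimum by exhaustive search for readability (KM/Hungarian reaches it in O(n³)); the energy matrix values are illustrative, not taken from the paper.

```python
from itertools import permutations

def optimal_assignment(energy):
    """Exhaustive minimum-cost assignment of jobs (rows) to nodes (columns).

    Stands in for the Kuhn-Munkres (Hungarian) algorithm, which finds the
    same optimum in O(n^3) instead of O(n!).
    energy[j][n] = predicted energy of job j on node n (illustrative units).
    """
    n_jobs = len(energy)
    best_cost, best_map = float("inf"), None
    for perm in permutations(range(len(energy[0])), n_jobs):
        cost = sum(energy[j][perm[j]] for j in range(n_jobs))
        if cost < best_cost:
            best_cost, best_map = cost, perm
    return best_cost, best_map

# Three jobs on three nodes whose power efficiency differs (node heterogeneity).
energy = [
    [10.0, 12.0, 14.0],   # job 0
    [11.0, 10.5, 13.0],   # job 1
    [ 9.0, 11.0, 10.0],   # job 2
]
cost, mapping = optimal_assignment(energy)
print(cost, mapping)
```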
Citations: 0
Toward a Behavioral-Level End-to-End Framework for Silicon Photonics Accelerators
Pub Date: 2022-10-24 DOI: 10.1109/IGSC55832.2022.9969371
Emily Lattanzio, Ranyang Zhou, A. Roohi, Abdallah Khreishah, Durga Misra, Shaahin Angizi
Convolutional Neural Networks (CNNs) are widely used due to their effectiveness in various AI applications such as object recognition and speech processing, where the multiply-and-accumulate (MAC) operation contributes to ~95% of the computation time. From the hardware implementation perspective, the performance of current CMOS-based MAC accelerators is limited mainly by their von Neumann architecture and the correspondingly limited memory bandwidth. Consequently, silicon photonics has recently been explored as a promising alternative to electronic memristive crossbars for accelerator designs with improved speed and power efficiency. In this work, we briefly survey recent silicon photonics accelerators and take initial steps toward an open-source, adaptive crossbar architecture simulator for them. Keeping the original functionality of the MNSIM tool [1], we add a new photonic mode that applies the pre-existing algorithm to a crossbar structure based on photonic Phase Change Memory (pPCM). With inputs from the CNN's topology, the accelerator configuration, and experimentally benchmarked data, the presented simulator can report the optimal crossbar size, the number of crossbars needed, and estimates of total area, power, and latency.
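As a rough illustration of the kind of estimate such a simulator produces, the sketch below counts how many fixed-size crossbars one convolutional layer needs when its unrolled weight matrix is tiled; this mapping is a simplification, and MNSIM's real model also accounts for precision splitting and peripheral circuitry.

```python
import math

def crossbars_needed(k, c_in, c_out, xbar_rows, xbar_cols):
    """Estimate how many rows x cols crossbars one conv layer needs when its
    (k*k*c_in) x c_out weight matrix is tiled across crossbars.
    (Illustrative mapping only.)"""
    rows = k * k * c_in  # height of the unrolled weight matrix
    return math.ceil(rows / xbar_rows) * math.ceil(c_out / xbar_cols)

# A 3x3 conv with 64 input and 128 output channels on 256x256 crossbars.
print(crossbars_needed(3, 64, 128, 256, 256))
```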
Citations: 0
Towards Energy Efficient Memristor-based TCAM for Match-Action Processing
Pub Date: 2022-10-24 DOI: 10.1109/IGSC55832.2022.9969354
Saad Saleh, A. Goossens, T. Banerjee, B. Koldehofe
Match-action processors play a crucial role in connecting end-users on the Internet by computing network paths and enforcing administrator policies. The computation process uses a specialized memory called Ternary Content Addressable Memory (TCAM) to store processing rules and uses the header information of network packets to perform a match within a single clock cycle. Currently, TCAM memories consume huge amounts of energy due to the use of traditional transistor-based CMOS technology. In this article, we motivate the use of a novel component, the memristor, for the development of a TCAM architecture. Memristors can provide energy efficiency, non-volatility, and better resource density compared to transistors. We propose a novel memristor-based TCAM architecture built upon the voltage-divider principle for energy-efficient match-action processing. Moreover, we test the performance of the memristor-based TCAM architecture using experimental data from a novel Nb-doped SrTiO3 memristor. Energy analysis of the proposed TCAM architecture for the given memristor shows promising power consumption figures of 16 μW for a match operation and 1 μW for a mismatch operation.
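Using the reported per-entry figures, a back-of-the-envelope lookup-power estimate can be sketched as follows; the table size and match count are made up for illustration.

```python
def lookup_power_uw(n_entries, n_matches, p_match_uw=16.0, p_mismatch_uw=1.0):
    """Aggregate TCAM search power for one lookup, using the per-entry
    match/mismatch figures reported for the Nb-doped SrTiO3 memristor cells."""
    n_mismatches = n_entries - n_matches
    return n_matches * p_match_uw + n_mismatches * p_mismatch_uw

# A 1024-entry table where a single rule matches the packet header.
print(lookup_power_uw(1024, 1))
```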
Citations: 4
Evaluation of Heuristics to Manage a Data Center Under Power Constraints
Pub Date: 2022-10-24 DOI: 10.1109/IGSC55832.2022.9969362
Igor Fontana De Nardin, P. Stolf, S. Caux
In recent years, academia and industry have increased their efforts to find solutions that reduce greenhouse gas (GHG) emissions because of their impact on climate change. Two approaches to reducing these emissions are decreasing energy consumption and/or increasing the use of clean energy. Data centers are among the largest energy consumers in Information and Communications Technology (ICT). One way to provide clean energy to data centers is to use power from renewable sources, such as solar and wind. However, renewable energy introduces several uncertainties due to its intermittency. Dealing with these uncertainties demands different approaches at different levels of management. This work is part of the Datazero2 Project, which introduces a clean-by-design data center architecture using only renewable energy. With no connection to the grid, the data center manager must handle power-envelope constraints. This article investigates several online scheduling and power-capping heuristics to identify the best algorithms for handling fluctuating power profiles without hindering job execution. It then details experiments comparing the results of the heuristics. The results show that our heuristic provides a well-balanced solution considering both power and Quality of Service (QoS).
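As a minimal sketch of one family of heuristics such a study evaluates, a first-fit admission rule under a power envelope might look like this; the job names and power demands are hypothetical, and the paper's heuristics are more elaborate.

```python
def admit_under_cap(pending, power_cap):
    """Greedy power-capping heuristic: admit queued jobs first-fit until the
    renewable power envelope is reached; the remaining jobs wait."""
    running, used = [], 0.0
    for job, demand in pending:
        if used + demand <= power_cap:
            running.append(job)
            used += demand
    return running, used

# Hypothetical queue of (job, power demand in watts) under a 400 W envelope.
jobs = [("j1", 120.0), ("j2", 300.0), ("j3", 80.0), ("j4", 150.0)]
print(admit_under_cap(jobs, 400.0))
```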
Citations: 0
Less is More: Learning Simplicity in Datacenter Scheduling
Pub Date: 2022-10-24 DOI: 10.1109/IGSC55832.2022.9969372
Wenkai Guan, Cristinel Ababei
In this paper, we present a new scheduling algorithm, Qin2, for heterogeneous datacenters. Its goal is to improve performance, measured as job completion time, by exploiting increased server heterogeneity using deep neural network (DNN) models. The proposed scheduling framework uses an efficient automatic feature selection technique, which significantly reduces the training data size required to train the DNN to a level that still provides satisfactory prediction accuracy. This efficiency is especially helpful when the DNN model is re-trained to adapt to new types of application workloads arriving at the datacenter. The novelty of the proposed scheduling approach lies in this feature selection technique and in the integration of simple, training-efficient DNN models into a scheduler, which is deployed on a real cluster of heterogeneous nodes. Experiments demonstrate that the Qin2 scheduler outperforms state-of-the-art schedulers in terms of job completion time.
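The abstract does not fully specify the feature selection technique, so the sketch below stands in with a simple variance filter to show the general shape of pruning training features before fitting the DNN; the data values are placeholders.

```python
def select_features(samples, min_variance=0.01):
    """Toy automatic feature selection: drop near-constant columns before
    training, keeping only informative feature indices."""
    n = len(samples)
    keep = []
    for col in range(len(samples[0])):
        vals = [row[col] for row in samples]
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / n
        if var >= min_variance:
            keep.append(col)
    return keep

# Column 0 varies; columns 1 and 2 are constant and carry no signal.
data = [
    [1.0, 5.0, 0.5],
    [2.0, 5.0, 0.5],
    [3.0, 5.0, 0.5],
]
print(select_features(data))
```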
Citations: 0
Optimizing Energy Efficiency of Node.js Applications with CPU DVFS Awareness
Pub Date: 2022-10-24 DOI: 10.1109/IGSC55832.2022.9969367
Maria Patrou, K. Kent, Joran Siu, Michael H. Dawson
Node.js applications can incorporate CPU Dynamic Voltage and Frequency Scaling (DVFS) to adjust their energy consumption and runtime performance. We therefore build a CPU frequency scaling policy that promotes "green" and high-performing requests and enables customization of their execution profile. Our technique requires a profiling step that classifies web requests based on how CPU frequency affects their energy consumption and runtime performance, and on their code syntax/paradigm. Our model also covers concurrent request execution when selecting an appropriate CPU frequency. We support priority-based requests alongside this model so that users can customize and formulate a policy based on their goals. Finally, we perform an energy-runtime analysis, which shows that our policy with the proposed configurations is an energy-efficient approach compared to the Linux scaling governors.
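A minimal sketch of how such a policy could pick a single frequency for a batch of concurrently executing, pre-classified requests; the class names and kHz values here are hypothetical, not the paper's.

```python
def pick_frequency(request_classes, freq_table):
    """Choose one CPU frequency for a batch of concurrent requests: run at
    the highest frequency any request class in the batch calls for
    (hypothetical policy shape for illustration)."""
    return max(freq_table[c] for c in request_classes)

# Hypothetical class-to-frequency table, in kHz as used by Linux cpufreq.
freq_table = {"io_bound": 1_200_000, "balanced": 2_000_000, "cpu_bound": 3_400_000}
print(pick_frequency(["io_bound", "cpu_bound"], freq_table))
```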
Citations: 0
Guiding Hardware-Driven Turbo with Application Performance Awareness
Pub Date: 2022-10-24 DOI: 10.1109/IGSC55832.2022.9969356
D. Wilson, Asma H. Al-rawi, Lowren H. Lawson, Siddhartha Jana, Federico Ardanaz, J. Eastep, A. Coskun
Parallel programming across many CPU cores poses many challenges in software design, such as mitigating the performance or efficiency loss in applications that reach synchronization points at varying times across the CPU cores. Existing solutions often aim to resolve this through clever optimizations in application design, or by reacting to the imbalance by throttling the CPU core frequency of the early-finishing cores at application run time. In this work, we propose a method to rebalance bulk-synchronous MPI applications by selectively speeding up the late-finishing cores throughout application run time. The algorithm uses the new Intel® Speed Select Turbo Frequency feature, which enables software to guide the hardware toward increasing the turbo frequency limits of some cores in exchange for decreased turbo frequency limits on other cores. We demonstrate up to 40% energy reduction and 17% execution time reduction in a highly imbalanced, compute-bound benchmark application, and up to 21% energy reduction with 5% execution time reduction in an imbalanced real-world application.
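The software side of this rebalancing idea can be sketched as selecting the slowest ranks as candidates for a higher turbo limit; this omits the actual Intel® Speed Select programming, and the progress values are made up.

```python
def ranks_to_boost(progress, n_boost):
    """Pick the n slowest MPI ranks as candidates for a higher turbo limit,
    trading turbo headroom away from the fastest ranks (the hardware side is
    not modeled here)."""
    order = sorted(range(len(progress)), key=lambda r: progress[r])
    return sorted(order[:n_boost])

# Fraction of the current bulk-synchronous phase completed, per rank.
print(ranks_to_boost([0.91, 0.74, 0.88, 0.69], 2))
```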
Citations: 2
Towards an Energy-Efficient Hash-based Message Authentication Code (HMAC)
Pub Date: 2022-10-24 DOI: 10.1109/IGSC55832.2022.9969377
Cesar Castellon, Swapnoneel Roy, O. P. Kreidl, Ayan Dutta, Ladislau Bölöni
Hash-based message authentication code (HMAC) involves a secret cryptographic key and an underlying cryptographic hash function. HMAC is used to verify both the integrity and the authenticity of messages and, in turn, plays a significant role in secure communication protocols, e.g., Transport Layer Security (TLS). The high energy consumption of HMAC is well known, as is the trade-off between security, energy consumption, and performance. Previous research on reducing energy consumption in HMAC has approached the problem primarily at the system software level (e.g., scheduling algorithms). This paper attempts to reduce energy consumption in HMAC by applying an energy-reducing algorithmic engineering technique to the underlying hash function of HMAC, as a means of preserving the promised security benefits. Using pyRAPL, a Python library for measuring computational energy, we experiment with both the standard and the energy-reduced implementations of HMAC for different input sizes (in bytes). Our results show up to a 17% reduction in energy consumption by HMAC while preserving its function.
Such energy savings in HMAC, by virtue of HMAC's prevalent use in existing network protocols, extrapolate to lighter-weight network operations with respect to total energy consumption.
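For reference, standard HMAC as used in the baseline measurements is available in Python's standard library; the paper's energy-reduced hash variant is not reproduced here, and energy would be captured by wrapping calls like these in pyRAPL's measurement context on RAPL-capable Intel CPUs.

```python
import hashlib
import hmac

# Standard HMAC over SHA-256 from the Python standard library.
key = b"secret-key"
msg = b"hello world"
tag = hmac.new(key, msg, hashlib.sha256).hexdigest()

# Constant-time comparison, as recommended for verifying tags.
ok = hmac.compare_digest(tag, hmac.new(key, msg, hashlib.sha256).hexdigest())
print(len(tag), ok)
```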
Citations: 3
ViT-LR: Pushing the Envelope for Transformer-Based on-Device Embedded Continual Learning
Pub Date: 2022-10-24 DOI: 10.1109/IGSC55832.2022.9969361
Alberto Dequino, Francesco Conti, L. Benini
State-of-the-art Edge Artificial Intelligence (AI) is currently mostly targeted at a train-then-deploy paradigm: edge devices are exclusively responsible for inference, whereas training is delegated to data centers, leading to high energy and CO2 impact. On-device Continual Learning could help make Edge AI more sustainable by specializing AI models directly in the field. We deploy a continual image recognition model on a Jetson Xavier NX embedded system and experimentally investigate how Attention influences performance and its viability as a Continual Learning backbone, analyzing the redundancy of its components to prune them and further improve our solution's efficiency. Starting from a pre-trained tiny Vision Transformer, we achieve up to 83.81% accuracy on CORe50's new instances and classes scenario, surpassing AR1*free with Latent Replay, and reach performance comparable and superior to the SoA without relying on growing Replay Examples.
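The latent replay mechanism referenced above can be sketched as mixing stored mid-network activations into each mini-batch, so only the layers above the replay point are retrained; this is a generic sketch of the technique, not the ViT-LR implementation, and the buffer contents are placeholders.

```python
import random

def replay_batch(new_latents, buffer, batch_size, rng):
    """Latent replay: combine the current mini-batch's frozen-backbone
    activations with activations sampled from earlier experiences."""
    k = min(batch_size - len(new_latents), len(buffer))
    return new_latents + rng.sample(buffer, k)

rng = random.Random(0)
buffer = [f"old_{i}" for i in range(10)]   # stored latent activations
batch = replay_batch(["new_0", "new_1"], buffer, 6, rng)
print(len(batch), batch[:2])
```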
Citations: 0
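The Latent Replay idea named in the ViT-LR entry above keeps the backbone frozen and stores its intermediate activations ("latents"), which are far cheaper than raw images, mixing them back into each training step of the head to counter forgetting. The following is a toy NumPy sketch under assumptions of my own (a random-projection backbone standing in for the pre-trained ViT, a reservoir-sampled buffer, and a linear softmax head) — not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class LatentReplayLearner:
    """Toy continual learner in the spirit of Latent Replay: the backbone
    is frozen, and instead of raw inputs we store its latent activations
    and replay them while training only the classification head."""

    def __init__(self, in_dim, latent_dim, n_classes, buffer_size=200, lr=0.1):
        # Frozen backbone: a fixed random projection stands in for a
        # pre-trained feature extractor (a tiny ViT in the paper).
        self.backbone = rng.normal(0, 1.0 / np.sqrt(in_dim), (in_dim, latent_dim))
        self.head = np.zeros((latent_dim, n_classes))  # trainable linear head
        self.buffer_x, self.buffer_y = [], []
        self.buffer_size, self.lr, self.seen = buffer_size, lr, 0

    def _features(self, x):
        return np.tanh(x @ self.backbone)  # frozen, never updated

    def _store(self, z, y):
        # Reservoir sampling keeps the buffer an unbiased sample of the stream.
        for zi, yi in zip(z, y):
            self.seen += 1
            if len(self.buffer_x) < self.buffer_size:
                self.buffer_x.append(zi)
                self.buffer_y.append(yi)
            else:
                j = rng.integers(self.seen)
                if j < self.buffer_size:
                    self.buffer_x[j], self.buffer_y[j] = zi, yi

    def partial_fit(self, x, y, replay=32):
        z = self._features(x)
        if self.buffer_x:  # mix current latents with replayed ones
            idx = rng.integers(len(self.buffer_x), size=min(replay, len(self.buffer_x)))
            z_tr = np.vstack([z, np.array(self.buffer_x)[idx]])
            y_tr = np.concatenate([y, np.array(self.buffer_y)[idx]])
        else:
            z_tr, y_tr = z, y
        # One SGD step on the softmax cross-entropy loss of the head.
        logits = z_tr @ self.head
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        p[np.arange(len(y_tr)), y_tr] -= 1.0  # softmax-CE gradient w.r.t. logits
        self.head -= self.lr * z_tr.T @ p / len(y_tr)
        self._store(z, y)

    def predict(self, x):
        return np.argmax(self._features(x) @ self.head, axis=1)
```

Feeding two class-disjoint "tasks" in sequence, the replayed latents keep the head from collapsing onto the most recent class, which is the mechanism the paper builds on at much larger scale.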
Energy-Efficient Deployment of Machine Learning Workloads on Neuromorphic Hardware
Pub Date : 2022-10-10 DOI: 10.1109/IGSC55832.2022.9969357
Peyton S. Chandarana, Mohammadreza Mohammadi, J. Seekings, Ramtin Zand
As the technology industry moves toward implementing tasks such as natural language processing, path planning, image classification, and more on smaller edge computing devices, the demand for more efficient implementations of algorithms and hardware accelerators has become a significant area of research. In recent years, several edge deep learning hardware accelerators have been released that specifically focus on reducing the power and area consumed by deep neural networks (DNNs). On the other hand, spiking neural networks (SNNs), which operate on discrete time-series data, have been shown to achieve substantial power reductions over even the aforementioned edge DNN accelerators when deployed on specialized neuromorphic event-based/asynchronous hardware. While neuromorphic hardware has demonstrated great potential for accelerating deep learning tasks at the edge, the current space of algorithms and hardware is limited and still in rather early development. Thus, many hybrid approaches have been proposed which aim to convert pre-trained DNNs into SNNs. In this work, we provide a general guide to converting pre-trained DNNs into SNNs while also presenting techniques to improve the deployment of converted SNNs on neuromorphic hardware with respect to latency, power, and energy. Our experimental results show that when compared against the Intel Neural Compute Stick 2, Intel's neuromorphic processor, Loihi, consumes up to 27× less power and 5× less energy in the tested image classification tasks by using our SNN improvement techniques.
{"title":"Energy-Efficient Deployment of Machine Learning Workloads on Neuromorphic Hardware","authors":"Peyton S. Chandarana, Mohammadreza Mohammadi, J. Seekings, Ramtin Zand","doi":"10.1109/IGSC55832.2022.9969357","DOIUrl":"https://doi.org/10.1109/IGSC55832.2022.9969357","url":null,"abstract":"As the technology industry is moving towards implementing tasks such as natural language processing, path planning, image classification, and more on smaller edge computing devices, the demand for more efficient implementations of algorithms and hardware accelerators has become a significant area of research. In recent years, several edge deep learning hardware accelerators have been released that specifically focus on reducing the power and area consumed by deep neural networks (DNNs). On the other hand, spiking neural networks (SNNs) which operate on discrete time-series data, have been shown to achieve substantial power reductions over even the aforementioned edge DNN accelerators when deployed on specialized neuromorphic event-based/asynchronous hardware. While neuromorphic hardware has demonstrated great potential for accelerating deep learning tasks at the edge, the current space of algorithms and hardware is limited and still in rather early development. Thus, many hybrid approaches have been proposed which aim to convert pre-trained DNNs into SNNs. In this work, we provide a general guide to converting pre-trained DNNs into SNNs while also presenting techniques to improve the deployment of converted SNNs on neuromorphic hardware with respect to latency, power, and energy. Our experimental results show that when compared against the Intel Neural Compute Stick 2, Intel's neuromorphic processor, Loihi, consumes up to 27× less power and 5× less energy in the tested image classification tasks by using our SNN improvement techniques.","PeriodicalId":114200,"journal":{"name":"2022 IEEE 13th International Green and Sustainable Computing Conference (IGSC)","volume":"325 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134431033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
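The DNN-to-SNN conversion described in the abstract above is most often done by rate coding: each ReLU unit becomes an integrate-and-fire (IF) neuron whose firing rate over T timesteps approximates its analog activation. The sketch below is a generic illustration of that idea (unit threshold, reset-by-subtraction, and activations assumed to lie in [0, 1] — in practice weight normalization enforces this), not the paper's specific pipeline or the Loihi toolchain.

```python
import numpy as np

def ann_forward(x, W1, W2):
    """Reference analog network: linear -> ReLU -> linear."""
    h = np.maximum(0.0, x @ W1)
    return h @ W2

def snn_forward(x, W1, W2, T=200):
    """Rate-coded conversion of ann_forward. Each hidden ReLU unit is an
    IF neuron: it integrates a constant input current, fires a spike when
    its membrane potential crosses the unit threshold, and resets by
    subtracting the threshold (so residual charge is preserved). For a
    current c in [0, 1], the firing rate converges to c = ReLU(c); for
    c < 0 the neuron never fires, matching ReLU's zero region."""
    c = x @ W1                    # constant input current per timestep
    v = np.zeros_like(c)          # membrane potentials
    out = np.zeros(W2.shape[1])   # output layer accumulates spike-driven current
    spikes_total = np.zeros_like(c)
    for _ in range(T):
        v += c
        s = (v >= 1.0).astype(float)  # fire where threshold is crossed
        v -= s                        # reset-by-subtraction
        spikes_total += s
        out += s @ W2
    # Time-averaged output approximates the ANN output; rates approximate h.
    return out / T, spikes_total / T
```

With weights scaled so hidden activations stay below 1, the spike rates converge to the ReLU activations at a rate of O(1/T), which is why longer simulation windows trade latency for accuracy in converted SNNs.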