
Proceedings of the 5th International Workshop on Energy Efficient Supercomputing: Latest Publications

Adaptive Time-based Encoding for Energy-Efficient Large Cache Architectures
Payman Behnam, N. Sedaghati, M. N. Bojnordi
Growing memory footprints and heavy reliance on data locality have made the last-level cache (LLC) a major contributor to overall energy consumption in modern computer systems. As a result, numerous techniques have been proposed to reduce power dissipation in LLCs via low-power interconnects, energy-efficient signaling, and power-aware data encoding. One technique that has proven successful at lowering dynamic power in cache interconnects is time-based data encoding, which represents data with the time elapsed between subsequent pulses on a wire. Regrettably, a time-based data representation induces excessive transmission delay per block transfer, thereby degrading the energy efficiency of memory-intensive applications. This paper presents a novel adaptive mechanism that monitors the characteristics of every application at runtime and intelligently uses time-based codes for LLC interconnects, thereby alleviating the adverse impact of the longer transmission delay of time-based codes while still saving significant energy. Two adaptation approaches are realized for the proposed mechanism, monitoring 1) application phases and 2) memory bursts. Experimental results on a set of 12 memory-intensive parallel applications on a quad-core system indicate that the proposed encoding mechanism can improve system performance by an average of 9%, which in turn improves system energy efficiency by 7% on average. Moreover, the proposed hardware controller consumes less than 1% of the area of a 4MB LLC.
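The trade-off the abstract describes (time-based codes save switching energy but serialize each transfer) lends itself to a small illustration. The following Python sketch is not the paper's mechanism; the cost model, the LLC-miss-rate threshold, and the interval-based controller are all assumptions made to show how a runtime monitor might toggle time-based encoding per phase or burst.

```python
# Illustrative sketch only, not the paper's actual controller. It assumes a
# hypothetical per-interval LLC-miss counter and a simple threshold policy.

def transfer_cost(word, time_based):
    """Toy cost model for sending one 8-bit word over a cache interconnect.

    Binary signaling: energy grows with the number of bit transitions,
    latency is one cycle. Time-based signaling: two pulses regardless of the
    value (cheap in energy), but latency grows with the encoded value.
    """
    if time_based:
        energy = 2                      # one pulse pair per word
        latency = 1 + word              # inter-pulse delay encodes the value
    else:
        energy = bin(word).count("1")   # proxy for switching activity
        latency = 1
    return energy, latency


class AdaptiveEncoder:
    """Enable time-based codes only when the phase is not memory intensive."""

    def __init__(self, miss_rate_threshold=0.05):
        self.threshold = miss_rate_threshold   # assumed tuning knob
        self.time_based = True

    def end_of_interval(self, llc_misses, instructions):
        miss_rate = llc_misses / max(instructions, 1)
        # Memory-bound burst: the extra serialization delay would hurt,
        # so fall back to plain binary signaling for the next interval.
        self.time_based = miss_rate < self.threshold
        return self.time_based
```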
{"title":"Adaptive Time-based Encoding for Energy-Efficient Large Cache Architectures","authors":"Payman Behnam, N. Sedaghati, M. N. Bojnordi","doi":"10.1145/3149412.3149417","DOIUrl":"https://doi.org/10.1145/3149412.3149417","url":null,"abstract":"Demanding larger memory footprint and relying heavily on data locality has made last-level cache (LLC) a major contributor to overall energy consumption in modern computer systems. As a result, numerous techniques have been proposed to reduce power dissipation in LLCs via low power interconnects, energy-efficient signaling, and power-aware data encoding. One such technique that has proven successful at lowering dynamic power in cache interconnects is time-based data encoding that represents data with the time elapsed between subsequent pulses on a wire. Regrettably, a time-based data representation induces excessive transmission delay per every block transfer, thereby degrading the energy efficiency of memory intensive applications. This paper presents a novel adaptive mechanism that monitors characteristics of every application at runtime and intelligently uses time-based codes for LLC interconnects, thereby alleviating the diverse impact of longer transmission delay in time-based codes while still saving significant energy. Two adaptation approaches are realized for the proposed mechanism to monitor 1) application phases and 2) memory bursts. Experimental results on a set of 12 memory intensive parallel applications on a quad-core system indicate that the proposed encoding mechanism can improve system performance by an average of 9%, which results in improving the system energy-efficiency by 7% on average. Moreover, the proposed hardware controller consumes less than 1% area of a 4MB LLC.","PeriodicalId":102033,"journal":{"name":"Proceedings of the 5th International Workshop on Energy Efficient Supercomputing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121658821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Performance and Power Characteristics and Optimizations of Hybrid MPI/OpenMP LULESH Miniapps under Various Workloads
Xingfu Wu, V. Taylor, Jeanine E. Cook, Tanner Juedeman
Energy-efficient execution of scientific applications requires insight into how HPC system features affect the performance and power of the applications. In this paper, we analyze and model the performance and power characteristics of hybrid MPI/OpenMP LULESH (Livermore Unstructured Lagrange Explicit Shock Hydrodynamics) miniapps under various workloads using MuMMI (Multiple Metrics Modeling Infrastructure). Output from these models is then used to guide code optimizations for performance and power. Our optimization methods result in performance improvements and energy savings of up to approximately 10%. Further, based on the insight gained from our models and measurements under various workloads, applying DCT (Dynamic Concurrency Throttling) to the optimized codes results in energy savings of 43.12% to 58.30% for different problem sizes, compared with baseline results on 27 nodes with 32 threads per node on Shepard, a 36-node Intel Haswell testbed cluster.
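As a rough illustration of the DCT idea mentioned above, the sketch below picks a per-region thread count from an already-measured profile, trading a bounded slowdown for lower energy. The profile numbers, the 10% slowdown budget, and the function name are assumptions for the example, not the paper's MuMMI-based models or measurements.

```python
# Minimal sketch of a concurrency-throttling decision, assuming a profile of
# (threads -> runtime, energy) measurements has already been collected.

def pick_thread_count(profile, max_slowdown=1.10):
    """Return the thread count that minimizes energy while keeping runtime
    within `max_slowdown` of the fastest configuration."""
    best_time = min(t for t, _ in profile.values())
    feasible = {n: (t, e) for n, (t, e) in profile.items()
                if t <= max_slowdown * best_time}
    return min(feasible, key=lambda n: feasible[n][1])

# Hypothetical per-region measurements: threads -> (seconds, joules)
profile = {8: (12.0, 1500.0), 16: (7.5, 1300.0), 32: (7.0, 1450.0)}
print(pick_thread_count(profile))   # -> 16: slightly slower, lower energy
```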
{"title":"Performance and Power Characteristics and Optimizations of Hybrid MPI/OpenMP LULESH Miniapps under Various Workloads","authors":"Xingfu Wu, V. Taylor, Jeanine E. Cook, Tanner Juedeman","doi":"10.1145/3149412.3149416","DOIUrl":"https://doi.org/10.1145/3149412.3149416","url":null,"abstract":"Energy efficient execution of scientific applications requires insight into how HPC system features affect the performance and power of the applications. In this paper, we analyze and model performance and power characteristics of hybrid MPI/OpenMP LULESH (Livermore Unstructured Lagrange Explicit Shock Hydrodynamics) miniapps under various workloads using MuMMI (Multiple Metrics Modeling Infrastructure). Output from these models is then used to guide code optimizations of performance and power. Our optimization methods result in performance improvement and energy savings of up to approximately 10%. Further, based on the insight learned from our models and measurements under various workloads, applying DCT (Dynamic Concurrency Throttling) to the optimized codes results in the energy savings by 43.12% to 58.30% for different problem sizes compared with the baseline results on 27 nodes with 32 threads per node on a 36-node Intel Haswell testbed cluster Shepard.","PeriodicalId":102033,"journal":{"name":"Proceedings of the 5th International Workshop on Energy Efficient Supercomputing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115976823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
An empirical survey of performance and energy efficiency variation on Intel processors
Aniruddha Marathe, Yijia Zhang, Grayson Blanks, Nirmal Kumbhare, G. Abdulla, B. Rountree
Traditional HPC performance and energy characterization approaches assume homogeneity and predictability in the performance of the target processor platform. Consequently, processor performance variation has been considered a secondary issue in the broader problem of performance characterization. In this work, we present an empirical survey of the variation in processor performance and energy efficiency on several generations of HPC-grade Intel processors. Our study shows that, compared to previous generations of Intel processors, the problem of performance variation has become worse on more recent generations. Specifically, the performance variation across processors on a large-scale production HPC cluster at LLNL has increased to 20%, and the run-to-run variation in the performance of individual processors has increased to 15%. We show that this variation is further magnified under a hardware-enforced power constraint, potentially due to the increase in the number of cores, inconsistencies in the chip manufacturing process, and their combined impact on the processor's energy management functionality. Our experimentation with a hardware-enforced processor power constraint shows that the variation in processor performance and energy efficiency has increased by up to 4x on the latest Intel processors.
{"title":"An empirical survey of performance and energy efficiency variation on Intel processors","authors":"Aniruddha Marathe, Yijia Zhang, Grayson Blanks, Nirmal Kumbhare, G. Abdulla, B. Rountree","doi":"10.1145/3149412.3149421","DOIUrl":"https://doi.org/10.1145/3149412.3149421","url":null,"abstract":"Traditional HPC performance and energy characterization approaches assume homogeneity and predictability in the performance of the target processor platform. Consequently, processor performance variation has been considered to be a secondary issue in the broader problem of performance characterization. In this work, we present an empirical survey of the variation in processor performance and energy efficiency on several generations of HPC-grade Intel processors. Our study shows that, compared to the previous generation of Intel processors, the problem of performance variation has become worse on more recent generation of Intel processors. Specifically, the performance variation across processors on a large-scale production HPC cluster at LLNL has increased to 20% and the run-to-run variation in the performance of individual processors has increased to 15%. We show that this variation is further magnified under a hardware-enforced power constraint, potentially due to the increase in number of cores, inconsistencies in the chip manufacturing process and their combined impact on processor's energy management functionality. Our experimentation with a hardware-enforced processor power constraint shows that the variation in processor performance and energy efficiency has increased by up to 4x on the latest Intel processors.","PeriodicalId":102033,"journal":{"name":"Proceedings of the 5th International Workshop on Energy Efficient Supercomputing","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122299515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 32
Improving Energy Efficiency in Memory-constrained Applications Using Core-specific Power Control
Sridutt Bhalachandra, Allan Porterfield, Stephen L. Olivier, J. Prins, R. Fowler
Power is increasingly the limiting factor in High Performance Computing (HPC) at exascale and will continue to influence future advancements in supercomputing. Recent processors equipped with on-board hardware counters allow real-time monitoring of operating conditions such as energy and temperature, in addition to performance measures such as instructions retired and memory accesses. An experimental memory study on modern CPU architectures, Intel Sandy Bridge and Haswell, identifies a metric, TORo_core, that detects bandwidth saturation and increased latency. TORo_core is used to construct a dynamic policy, applied at coarse and fine-grained levels, to modulate per-core power controls on Haswell machines. The coarse- and fine-grained applications of the dynamic policy show best-case energy savings of 32.1% and 19.5%, respectively, with a 2% slowdown in both cases. On average for six MPI applications, the fine-grained dynamic policy speeds execution by 1% while the coarse-grained application results in a 3% slowdown. Energy savings through frequency reduction not only provide cost advantages, they also reduce resource contention and create additional thermal headroom for non-throttled cores, improving performance.
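A minimal sketch of the kind of saturation-driven, per-core policy the abstract describes is shown below. The threshold values, the frequency steps, and the sampler that would supply a TORo_core-like metric are assumptions made for illustration; they are not the paper's tuned policy.

```python
# Illustrative per-core frequency policy, assuming a hypothetical sampler that
# returns a bandwidth-saturation metric per core each control interval.

FREQ_STEPS = [2.3e9, 2.0e9, 1.8e9]        # Hz, highest first (assumed steps)

def adjust_core_frequencies(metrics, current, saturated=0.75, relaxed=0.40):
    """Step a core down when its memory traffic indicates bandwidth
    saturation (extra core frequency buys little throughput), and step it
    back up once the pressure subsides."""
    new = {}
    for core, m in metrics.items():
        idx = FREQ_STEPS.index(current[core])
        if m > saturated and idx < len(FREQ_STEPS) - 1:
            idx += 1            # memory-bound: shed frequency, save power
        elif m < relaxed and idx > 0:
            idx -= 1            # compute-bound again: restore frequency
        new[core] = FREQ_STEPS[idx]
    return new

current = {0: 2.3e9, 1: 2.3e9}
metrics = {0: 0.85, 1: 0.20}              # hypothetical saturation readings
print(adjust_core_frequencies(metrics, current))  # core 0 steps down to 2.0 GHz
```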
{"title":"Improving Energy Efficiency in Memory-constrained Applications Using Core-specific Power Control","authors":"Sridutt Bhalachandra, Allan Porterfield, Stephen L. Olivier, J. Prins, R. Fowler","doi":"10.1145/3149412.3149418","DOIUrl":"https://doi.org/10.1145/3149412.3149418","url":null,"abstract":"Power is increasingly the limiting factor in High Performance Computing (HPC) at Exascale and will continue to influence future advancements in supercomputing. Recent processors equipped with on-board hardware counters allow real time monitoring of operating conditions such as energy and temperature, in addition to performance measures such as instructions retired and memory accesses. An experimental memory study presented on modern CPU architectures, Intel Sandybridge and Haswell, identifies a metric, TORo_core, that detects bandwidth saturation and increased latency. TORo-Core is used to construct a dynamic policy applied at coarse and fine-grained levels to modulate per-core power controls on Haswell machines. The coarse and fine-grained application of dynamic policy shows best energy savings of 32.1% and 19.5% with a 2% slowdown in both cases. On average for six MPI applications, the fine-grained dynamic policy speeds execution by 1% while the coarse-grained application results in a 3% slowdown. Energy savings through frequency reduction not only provide cost advantages, they also reduce resource contention and create additional thermal headroom for non-throttled cores improving performance.","PeriodicalId":102033,"journal":{"name":"Proceedings of the 5th International Workshop on Energy Efficient Supercomputing","volume":"680 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116108565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Simulating Power Scheduling at Scale
D. Ellsworth, Tapasya Patki, M. Schulz, B. Rountree, A. Malony
Comparison of power scheduling strategies at scale is challenging due to the limited availability of high performance computing (HPC) systems that expose power control to researchers. In this paper we describe PowSim, a simulator for comparing different power management strategies at large scale for HPC systems. PowSim enables light-weight simulation of dynamically changing, hardware-enforced processor power caps at the scale of an HPC cluster, supporting power scheduling research. PowSim's architecture supports easily swapping the power scheduler, job scheduler, and application models to enable comparison studies. Preliminary results comparing generalized power scheduling strategies are also presented.
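The pluggable structure described above can be illustrated with a toy discrete-time loop in which the power scheduler and the application models are swappable objects (a job scheduler could be plugged in the same way). The class names, the linear power-to-progress model, and the equal-share policy below are invented for the sketch and are not PowSim's actual interfaces.

```python
# Toy simulation loop with swappable power scheduler and application models.

class Job:
    def __init__(self, work, power_sensitivity):
        self.remaining = work
        self.k = power_sensitivity
    def rate_at(self, power_cap):
        return self.k * power_cap      # toy linear power-to-progress model


class EqualShare:
    """Power scheduler: split the cluster cap evenly over running jobs."""
    def allocate(self, jobs, cluster_cap):
        return {j: cluster_cap / len(jobs) for j in jobs} if jobs else {}


def simulate(jobs, power_scheduler, cluster_cap, dt=1.0):
    """Advance every running job by dt seconds of simulated time; each job
    model turns its power cap into a progress rate."""
    t = 0.0
    while jobs:
        caps = power_scheduler.allocate(jobs, cluster_cap)
        for job in list(jobs):
            job.remaining -= dt * job.rate_at(caps[job])
            if job.remaining <= 0:
                jobs.remove(job)
        t += dt
    return t

jobs = [Job(work=100.0, power_sensitivity=0.02),
        Job(work=50.0, power_sensitivity=0.05)]
print(simulate(jobs, EqualShare(), cluster_cap=200.0))  # simulated makespan (s)
```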
{"title":"Simulating Power Scheduling at Scale","authors":"D. Ellsworth, Tapasya Patki, M. Schulz, B. Rountree, A. Malony","doi":"10.1145/3149412.3149414","DOIUrl":"https://doi.org/10.1145/3149412.3149414","url":null,"abstract":"Comparison of power scheduling strategies at scale is challenging due to the limited availability of high performance computing (HPC) systems exposing power control to researchers. In this paper we describe PowSim, a simulator for comparing different power management strategies at large-scale for HPC systems. PowSim enables light-weight simulation of dynamically-changing hardware-enforced processor power caps at the scale of an HPC cluster, supporting power scheduling research. PowSim's architecture supports easily changing power scheduler, job scheduler, and application models to enable comparison studies. Preliminary results comparing generalized power scheduling strategies are also presented.","PeriodicalId":102033,"journal":{"name":"Proceedings of the 5th International Workshop on Energy Efficient Supercomputing","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127006700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
PoLiMEr: An Energy Monitoring and Power Limiting Interface for HPC Applications
I. Marincic, V. Vishwanath, H. Hoffmann
Power and energy consumption are now key design concerns in HPC. To develop software that meets power and energy constraints, scientific application developers must have a reliable way to measure these values and relate them to application-specific events. Scientists face two challenges when measuring and controlling power: (1) diversity---power and energy measurement interfaces differ between vendors---and (2) distribution---power measurements of MPI simulations should be unaffected by the mapping of MPI processes to physical hardware nodes. While some prior work defines standardized software interfaces for power management, these efforts do not support distributed environments. The result is that the current state of the art requires scientists interested in power optimization to write tedious, error-prone application- and system-specific code. To make power measurement and management easier for scientists, we propose PoLiMEr, a user-space library that supports fine-grained application-level power monitoring and capping. We evaluate PoLiMEr by deploying it on Argonne National Laboratory's Theta system and using it to measure and cap power, scaling the performance and power of several applications on up to 1024 nodes. We find that PoLiMEr requires only a few additional lines of code, yet easily allows users to detect energy anomalies, apply power caps, and evaluate Theta's unique architectural features.
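To make the idea of region-scoped monitoring and capping concrete, here is a hypothetical user-space sketch. The class, its methods, and the RAPL sysfs path it reads are assumptions for illustration only; they are not PoLiMEr's actual API, which the paper implements for MPI applications on Theta.

```python
# Hypothetical region-scoped power monitor, assuming the Linux intel_rapl
# powercap driver is present. Not PoLiMEr's real interface.

import time

class PowerMonitor:
    def __init__(self, node_cap_watts=None):
        self.cap = node_cap_watts          # assumed hook for a hardware cap
        self.marks = {}

    def start(self, tag):
        self.marks[tag] = (time.time(), self._read_energy())

    def stop(self, tag):
        t0, e0 = self.marks.pop(tag)
        dt = time.time() - t0
        de = self._read_energy() - e0
        return {"seconds": dt, "joules": de, "avg_watts": de / dt}

    def _read_energy(self):
        # Package-0 energy counter in microjoules under the intel_rapl driver.
        with open("/sys/class/powercap/intel-rapl:0/energy_uj") as f:
            return int(f.read()) / 1e6     # convert to joules

# Intended usage around an application phase:
#   mon = PowerMonitor(node_cap_watts=200)
#   mon.start("solver"); run_solver(); print(mon.stop("solver"))
```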
{"title":"PoLiMEr: An Energy Monitoring and Power Limiting Interface for HPC Applications","authors":"I. Marincic, V. Vishwanath, H. Hoffmann","doi":"10.1145/3149412.3149419","DOIUrl":"https://doi.org/10.1145/3149412.3149419","url":null,"abstract":"Power and energy consumption are now key design concerns in HPC. To develop software that meets power and energy constraints, scientific application developers must have a reliable way to measure these values and relate them to application-specific events. Scientists face two challenges when measuring and controlling power: (1) diversity---power and energy measurement interfaces differ between vendors---and (2) distribution---power measurements of MPI simulations should be unaffected by the mapping of MPI processes to physical hardware nodes. While some prior work defines standardized software interfaces for power management, these efforts do not support distributed environments. The result is that the current state-of-the-art requires scientists interested in power optimization to write tedious, error-prone application-and system-specific code. To make power measurement and management easier for scientists, we propose PoLiMEr, a user-space library that supports fine-grained application-level power monitoring and capping. We evaluate PoLiMEr by deploying it on Argonne National Laboratory's Theta system and using it to measure and cap power, scaling the performance and power of several applications on up to 1024 nodes. We find that PoLiMEr requires only a few additional lines of code, but easily allows users to detect energy anomalies, apply power caps, and evaluate Theta's unique architectural features.","PeriodicalId":102033,"journal":{"name":"Proceedings of the 5th International Workshop on Energy Efficient Supercomputing","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131187615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Execution Phase Prediction Based on Phase Precursors and Locality
Saman Khoshbakht, N. Dimopoulos
This paper focuses on different methods developed to detect the upcoming execution phase of a workload with regard to its power demand. By controlling the state of the processor in power-demanding phases, the operating system can maintain a relatively steady power pattern in the workload, leading to higher power headroom in the system. We compared two main approaches to phase prediction. First, we show that by detecting the precursors leading to an upcoming phase, the system can predict the next phase with high accuracy. We compared this method with another approach that relies on the assumption of phase locality, expecting the current dominant phase to continue in the near future. Our results show that by detecting the precursors we can detect 81% of the upcoming phases with lower processor frequency-switching overhead compared to most of the proposed locality-based methods.
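The precursor idea, and the locality fallback it is compared against, can be illustrated in a few lines. The window length, the phase labels, and the frequency-voting table below are assumptions for the sketch, not the paper's detection machinery.

```python
# Toy precursor-based phase predictor: remember which short sequence of
# phases preceded each phase and vote by frequency; fall back to locality.

from collections import Counter, defaultdict

class PrecursorPredictor:
    def __init__(self, window=3):
        self.window = window
        self.table = defaultdict(Counter)   # precursor tuple -> next-phase counts
        self.history = []

    def observe(self, phase):
        if len(self.history) >= self.window:
            key = tuple(self.history[-self.window:])
            self.table[key][phase] += 1     # this precursor preceded `phase`
        self.history.append(phase)

    def predict_next(self):
        key = tuple(self.history[-self.window:])
        if key in self.table:
            return self.table[key].most_common(1)[0][0]
        return self.history[-1]             # fall back to phase locality

trace = ["cpu", "cpu", "mem", "cpu", "cpu", "mem", "cpu", "cpu"]
p = PrecursorPredictor(window=2)
for ph in trace:
    p.observe(ph)
print(p.predict_next())   # after ("cpu", "cpu") the table suggests "mem"
```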
{"title":"Execution Phase Prediction Based on Phase Precursors and Locality","authors":"Saman Khoshbakht, N. Dimopoulos","doi":"10.1145/3149412.3149415","DOIUrl":"https://doi.org/10.1145/3149412.3149415","url":null,"abstract":"This paper focuses on different methods developed to detect the upcoming execution phase of a workload with regards to power demands. By controlling the state of the processor in power demanding phases, the operating system can maintain a relatively steady power pattern in the workload, leading to higher power headroom in the system. We compared two main approaches in phase prediction. Firstly, we show that by detecting the precursors leading to an upcoming phase, the system can speculate the next phase with high accuracy. Additionally, we compared this method with another approach which relies on the assumption of phase locality, expecting the current dominant phase to continue in the near future. Our results show that by detecting the precursors we can detect 81% of the upcoming phases with lower processor frequency switching overhead compared to most of the proposed locality-based methods.","PeriodicalId":102033,"journal":{"name":"Proceedings of the 5th International Workshop on Energy Efficient Supercomputing","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116740145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Scalable performance bounding under multiple constrained renewable resources
R. Medhat, S. Funk, B. Rountree
In the age of exascale computing, it is crucial to provide the best possible performance under power constraints. A major part of this optimization is managing power and bandwidth intelligently in a cluster to maximize performance. There have been significant improvements in the power efficiency of HPC runtimes, yet little work has explored our ability to determine the theoretically optimal performance under a given power and bandwidth bound. In this paper, we present a scalable model to identify the optimal power and bandwidth distribution such that the makespan of a program is minimized. We utilize a network flow formulation to construct a linear program that is efficient to solve. We demonstrate the applicability of the model to MPI programs and provide synthetic benchmarks on the performance of the model.
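The flavor of such a formulation can be conveyed with a small linear program: allocate shared power and bandwidth budgets across nodes so that aggregate progress is maximized. The coefficients and budgets below are invented, and maximizing total rate is only a simplification of the paper's makespan-minimizing, network-flow-based model.

```python
# Simplified LP sketch: per-node rate r_i is limited by allocated power and
# bandwidth (r_i <= a_i*p_i, r_i <= c_i*bw_i); budgets are shared.

import numpy as np
from scipy.optimize import linprog

a = np.array([1.0, 0.8, 1.2])    # progress per watt on each node (assumed)
c = np.array([0.5, 0.9, 0.6])    # progress per GB/s on each node (assumed)
P_total, B_total = 150.0, 120.0  # cluster-wide budgets (assumed)
n = len(a)

# Variables x = [r_1..r_n, p_1..p_n, bw_1..bw_n]; maximize sum(r).
obj = np.concatenate([-np.ones(n), np.zeros(2 * n)])

A_ub, b_ub = [], []
for i in range(n):
    row = np.zeros(3 * n); row[i] = 1.0; row[n + i] = -a[i]
    A_ub.append(row); b_ub.append(0.0)        # r_i <= a_i * p_i
    row = np.zeros(3 * n); row[i] = 1.0; row[2 * n + i] = -c[i]
    A_ub.append(row); b_ub.append(0.0)        # r_i <= c_i * bw_i
A_ub.append(np.concatenate([np.zeros(n), np.ones(n), np.zeros(n)]))
b_ub.append(P_total)                          # sum(p)  <= P_total
A_ub.append(np.concatenate([np.zeros(2 * n), np.ones(n)]))
b_ub.append(B_total)                          # sum(bw) <= B_total

res = linprog(obj, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(0, None)] * (3 * n))
rates, power, bw = np.split(res.x, 3)
print("rates:", rates, "power:", power, "bandwidth:", bw)
```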
{"title":"Scalable performance bounding under multiple constrained renewable resources","authors":"R. Medhat, S. Funk, B. Rountree","doi":"10.1145/3149412.3149422","DOIUrl":"https://doi.org/10.1145/3149412.3149422","url":null,"abstract":"In the age of exascale computing, it is crucial to provide the best possible performance under power constraints. A major part of this optimization is managing power and bandwidth intelligently in a cluster to maximize performance. There are significant improvements in the power efficiency of HPC runtimes, yet little work has explored our ability to determine the theoretical optimal performance under a give power and bandwidth bound. In this paper, we present a scalable model to identify the optimal power and bandwidth distribution such that the makespan of a program is minimized. We utilize the network flow formulation in constructing a linear program that is efficient to solve. We demonstrate the applicability of the model to MPI programs and provide synthetic benchmarks on the performance of the model.","PeriodicalId":102033,"journal":{"name":"Proceedings of the 5th International Workshop on Energy Efficient Supercomputing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130420884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Dynamic Application-aware Power Capping
Bo Wang, Dirk Schmidl, C. Terboven, Matthias S. Müller
A future large-scale high-performance computing (HPC) cluster will likely be power-capped, since the surrounding infrastructure, such as power supply and cooling, is constrained. For such a cluster, it may be impossible to supply thermal design power (TDP) to all components, and the default provisioning of current systems, which guarantees TDP to each computing node, will become infeasible. Power capping was introduced to limit power consumption to a value below TDP, at the cost of potential performance limitations. We developed an alternative dynamic application-aware power scheduling (DAPS) strategy to enforce a predetermined power limit while improving cluster-wide performance. The power scheduling decision is guided by the cap value, the hardware usage, and the application-specific performance sensitivity to power. Applying DAPS on a test platform comprising 12 computing nodes and three representative applications, we obtained a performance improvement of up to 17% compared to a strategy that distributes power equally and statically across nodes.
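One step of a sensitivity-guided reallocation under a fixed cluster cap might look like the sketch below. The per-node sensitivities, the power floor, and the step size are assumptions for illustration; DAPS itself derives its decisions from the cap value, hardware usage, and measured application sensitivity.

```python
# Sketch of one rebalancing step: the least power-sensitive node donates a
# few watts to the most power-sensitive one, keeping the cluster cap fixed.

def rebalance(caps, sensitivity, floor=60.0, step=5.0):
    """Move `step` watts from the least to the most power-sensitive node,
    keeping every node above `floor` and the total allocation unchanged."""
    donor = min(caps, key=lambda n: sensitivity[n])
    taker = max(caps, key=lambda n: sensitivity[n])
    if donor != taker and caps[donor] - step >= floor:
        caps[donor] -= step
        caps[taker] += step
    return caps

caps = {"node0": 100.0, "node1": 100.0, "node2": 100.0}   # watts per node
sensitivity = {"node0": 0.2, "node1": 0.9, "node2": 0.5}  # speedup per watt
print(rebalance(caps, sensitivity))   # node0 donates 5 W to node1
```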
{"title":"Dynamic Application-aware Power Capping","authors":"Bo Wang, Dirk Schmidl, C. Terboven, Matthias S. Müller","doi":"10.1145/3149412.3149413","DOIUrl":"https://doi.org/10.1145/3149412.3149413","url":null,"abstract":"A future large-scale high-performance computing (HPC) cluster will likely be power capped since the surrounding infrastructure like power supply and cooling is constrained. For such a cluster, it may be impossible to supply thermal design power (TDP) to all components. The default power supply of current system guarantees TDP to each computing node will become unfeasible. Power capping was introduced to limit power consumption to a value below TDP, with the drawback of resulting performance limitations. We developed an alternative dynamic application-aware power scheduling (DAPS) strategy to enforce a predetermined power limit and at the same time improve the cluster-wide performance. The power scheduling decision is guided by the cap value, the hardware usage, and the application-specific performance sensitivity to power. Applying DAPS on a test platform comprising 12 computing nodes with three representative applications, we obtained a performance improvement up to 17% compared to a strategy that distributes power equally and statically across nodes.","PeriodicalId":102033,"journal":{"name":"Proceedings of the 5th International Workshop on Energy Efficient Supercomputing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128555795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
PANN: Power Allocation via Neural Networks Dynamic Bounded-Power Allocation in High Performance Computing
William E. Whiteside, S. Funk, Aniruddha Marathe, B. Rountree
Exascale-architecture computers will be limited not only by hardware but also by power consumption. In these bounded-power situations, a system can deliver better results by overprovisioning, that is, having more hardware than can be fully powered. Overprovisioned systems require power to be an integral part of any scheduling algorithm. This paper introduces a system called PANN that uses neural networks to dynamically allocate power in overprovisioned systems. Traces of applications are used to train a neural network power controller, which is then used as an online power allocation system. Simulation results were obtained on traces of ParaDiS, and work is continuing on more applications. We found in simulations that PANN completes jobs up to 24% faster than static allocation. For tightly constrained systems, PANN performs 6% to 11% better than Conductor. A runtime system has been constructed, but it is not yet performing as expected; reasons for this are explored.
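As a toy analogue of the approach, the following lines train a tiny neural network (one hidden layer, plain gradient descent) on synthetic "traces" that map application features to a preferred fraction of TDP. Every detail (the features, the network shape, the training data, and the TDP value) is an assumption for illustration, not PANN's model or its ParaDiS traces.

```python
# Tiny NumPy MLP trained on synthetic trace data to suggest a power cap.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic traces: features = [memory intensity, parallel efficiency],
# target = fraction of TDP that worked best for that run (toy ground truth).
X = rng.uniform(0.0, 1.0, size=(256, 2))
y = (0.5 + 0.4 * X[:, 1] - 0.3 * X[:, 0]).reshape(-1, 1)

W1 = rng.normal(0.0, 0.5, (2, 16)); b1 = np.zeros((1, 16))
W2 = rng.normal(0.0, 0.5, (16, 1)); b2 = np.zeros((1, 1))
lr = 0.05

for _ in range(3000):                         # plain batch gradient descent
    h = np.tanh(X @ W1 + b1)                  # hidden layer
    pred = h @ W2 + b2                        # predicted TDP fraction
    err = pred - y
    gW2 = h.T @ err / len(X); gb2 = err.mean(axis=0, keepdims=True)
    dh = (err @ W2.T) * (1.0 - h ** 2)        # backprop through tanh
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(axis=0, keepdims=True)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

TDP = 140.0                                   # assumed per-node TDP in watts
job = np.array([[0.7, 0.4]])                  # hypothetical new job's features
frac = np.tanh(job @ W1 + b1) @ W2 + b2
print(f"suggested cap: {frac[0, 0] * TDP:.1f} W")
```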
{"title":"PANN: Power Allocation via Neural Networks Dynamic Bounded-Power Allocation in High Performance Computing","authors":"William E. Whiteside, S. Funk, Aniruddha Marathe, B. Rountree","doi":"10.1145/3149412.3149420","DOIUrl":"https://doi.org/10.1145/3149412.3149420","url":null,"abstract":"Exascale architecture computers will be limited not only by hardware but also by power consumption. In these bounded power situations, a system can deliver better results by overprovisioning -having more hardware than can be fully powered. Overprovisioned systems require power to be an integral part of any scheduling algorithm. This paper introduces a system called PANN that uses neural networks to dynamically allocate power in overprovisioned systems. Traces of applications are used to train a neural network power controller, which is then used as an online power allocation system. Simulation results were obtained on traces of ParaDiS and work is continuing on more applications. We found in simulations PANN completes jobs up to 24% faster than static allocation. For tightly constrained systems PANN performs 6% to 11% better than Conductor. A runtime system has been constructed, but it is not yet performing as expected, reasons for this are explored.","PeriodicalId":102033,"journal":{"name":"Proceedings of the 5th International Workshop on Energy Efficient Supercomputing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121676992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4