Demand-Aware Power Management for Power-Constrained HPC Systems

Cao Thang, Yuan He, Masaaki Kondo
{"title":"Demand-Aware Power Management for Power-Constrained HPC Systems","authors":"Cao Thang, Yuan He, Masaaki Kondo","doi":"10.1109/CCGrid.2016.25","DOIUrl":null,"url":null,"abstract":"As limited power budget is becoming one of the most crucialchallenges in developing supercomputer systems, hardware overprovisioning which installs larger number of nodes beyond the limitations of the power constraint determinedby Thermal Design Power is an attractive way to design extreme-scale supercomputers. In this design, power consumption of each node should be controlled by power-knobs equipped in the hardware such as dynamic voltage and frequency scaling (DVFS) or power capping mechanisms. Traditionally, in supercomputer systems, schedulers determine when and where to allocate jobs. In overprovisioned systems, the schedulers also need to care about power allocation to each job. An easy way is to set a fixed power cap for each job so that the total power consumption is within the power constraint of the system. This fixed power capping does not necessarily provide good performance since the effective power usage of jobs changes throughout their execution. Moreover, because each job has its own performance requirement, fixed power cap may not work well for all the jobs. In this paper, we propose a demand-aware power management framework for overprovisioned and power-constrained high-performance computing (HPC) systems. The job scheduler selects a job to run based on available hardware and power resources. The power manager continuously monitors power usage, predicts performance of executing jobs and optimizes power cap of each CPU so that the required performance level of each job is satisfied while improving system throughput by making good use of available powerbudget. Experiments on a real HPC system and with simulation for a large scale system show that the power manager can successfully control power consumption of executing jobs while achieving 1.17x improvement in system throughput.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGrid.2016.25","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 25

Abstract

As limited power budget is becoming one of the most crucialchallenges in developing supercomputer systems, hardware overprovisioning which installs larger number of nodes beyond the limitations of the power constraint determinedby Thermal Design Power is an attractive way to design extreme-scale supercomputers. In this design, power consumption of each node should be controlled by power-knobs equipped in the hardware such as dynamic voltage and frequency scaling (DVFS) or power capping mechanisms. Traditionally, in supercomputer systems, schedulers determine when and where to allocate jobs. In overprovisioned systems, the schedulers also need to care about power allocation to each job. An easy way is to set a fixed power cap for each job so that the total power consumption is within the power constraint of the system. This fixed power capping does not necessarily provide good performance since the effective power usage of jobs changes throughout their execution. Moreover, because each job has its own performance requirement, fixed power cap may not work well for all the jobs. In this paper, we propose a demand-aware power management framework for overprovisioned and power-constrained high-performance computing (HPC) systems. The job scheduler selects a job to run based on available hardware and power resources. The power manager continuously monitors power usage, predicts performance of executing jobs and optimizes power cap of each CPU so that the required performance level of each job is satisfied while improving system throughput by making good use of available powerbudget. Experiments on a real HPC system and with simulation for a large scale system show that the power manager can successfully control power consumption of executing jobs while achieving 1.17x improvement in system throughput.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
功率受限HPC系统的需求感知电源管理
由于有限的功耗预算成为开发超级计算机系统中最关键的挑战之一,在热设计功率限制的限制下安装更多节点的硬件过量配置是设计超大规模超级计算机的一种有吸引力的方法。在本设计中,每个节点的功耗应通过硬件中配备的功率旋钮进行控制,如动态电压频率缩放(DVFS)或功率封顶机制。传统上,在超级计算机系统中,调度程序决定何时何地分配作业。在供应过剩的系统中,调度器还需要关心每个作业的功率分配。一种简单的方法是为每个作业设置一个固定的功率上限,使总功耗在系统的功率约束范围内。这种固定的功率上限不一定提供良好的性能,因为作业的有效功率使用在整个执行过程中都会发生变化。此外,由于每个工作都有自己的性能要求,固定的功率上限可能不适用于所有工作。在本文中,我们提出了一个需求感知电源管理框架,用于供应过剩和功率受限的高性能计算(HPC)系统。作业调度器根据可用的硬件和电源资源选择要运行的作业。电源管理器持续监控电源使用情况,预测执行作业的性能,并优化每个CPU的功率上限,以便满足每个作业所需的性能水平,同时通过充分利用可用的功率预算来提高系统吞吐量。在实际高性能计算系统上的实验和大型系统的仿真表明,电源管理器可以成功地控制作业执行的功耗,同时使系统吞吐量提高1.17倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Increasing the Performance of Data Centers by Combining Remote GPU Virtualization with Slurm DiBA: Distributed Power Budget Allocation for Large-Scale Computing Clusters Spatial Support Vector Regression to Detect Silent Errors in the Exascale Era DTStorage: Dynamic Tape-Based Storage for Cost-Effective and Highly-Available Streaming Service Facilitating the Execution of HPC Workloads in Colombia through the Integration of a Private IaaS and a Scientific PaaS/SaaS Marketplace
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1