Contention Aware Workload and Resource Co-Scheduling on Power-Bounded Systems

Pengfei Zou, Xizhou Feng, Rong Ge
2019 IEEE International Conference on Networking, Architecture and Storage (NAS), August 2019. DOI: 10.1109/NAS.2019.8834721

Abstract

As power becomes a top challenge in HPC systems and data centers, sustaining system performance growth under limited available or permissible power has become an important research topic. Traditionally, researchers have explored collocating non-interfering jobs on the same nodes to improve system performance. However, power limits reduce the capacity of components, nodes, and systems, and can induce or aggravate contention between jobs. As a result, applying prior power-oblivious job collocation strategies to power-limited systems can degrade system throughput. In this paper, we quantitatively estimate the contention induced by power limits and propose Contention-Aware Power-bounded Scheduling (CAPS) for systems with finite power budgets. When power is limited, CAPS collocates jobs that are complementary and distributes the available power among nodes and components to minimize interference between them. Experimental results on a 192-core, 8-node cluster running hybrid MPI/OpenMP benchmarks show that, depending on the available power, CAPS improves system throughput and power efficiency by 10% or more over power-oblivious job collocation strategies.
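The general idea of contention-aware collocation under a power cap can be illustrated with a small, hypothetical sketch. It is not the CAPS algorithm: the job profiles (`JOBS`), the power-to-capacity model in `capacity()`, the excess-demand contention score, and the greedy pairing in `pair_jobs()` are all illustrative assumptions. The sketch only shows how lowering a node's power cap can turn a previously non-interfering pair into a contending one, and how preferring low-contention pairs tends to collocate complementary (compute-bound with memory-bound) jobs.

```python
from itertools import combinations

# Hypothetical per-job demand profiles: the fraction of a node's full-power
# capacity each job would use on each resource. Illustrative numbers only.
JOBS = {
    "cpu_a": {"compute": 0.7, "membw": 0.2},
    "cpu_b": {"compute": 0.6, "membw": 0.3},
    "mem_a": {"compute": 0.2, "membw": 0.7},
    "mem_b": {"compute": 0.3, "membw": 0.6},
}


def capacity(power_fraction):
    """Assumed model: compute capacity scales linearly with the node power
    cap (as a fraction of uncapped power); memory bandwidth is assumed to
    stay at full capacity. Real hardware behaves less simply."""
    return {"compute": power_fraction, "membw": 1.0}


def contention(job_x, job_y, power_fraction):
    """Hypothetical contention score: total combined demand that exceeds
    the capped capacity when the two jobs share one node."""
    cap = capacity(power_fraction)
    return sum(
        max(0.0, JOBS[job_x][r] + JOBS[job_y][r] - cap[r]) for r in cap
    )


def pair_jobs(job_names, power_fraction):
    """Greedy heuristic: repeatedly collocate the remaining pair with the
    lowest estimated contention until no disjoint pair is left."""
    pairs = sorted(
        combinations(job_names, 2),
        key=lambda p: contention(p[0], p[1], power_fraction),
    )
    placed, schedule = set(), []
    for x, y in pairs:
        if x not in placed and y not in placed:
            schedule.append((x, y))
            placed.update((x, y))
    return schedule


if __name__ == "__main__":
    for frac in (1.0, 0.8):
        print(frac, pair_jobs(list(JOBS), frac))
```

Under these assumed numbers, `cpu_a` and `mem_a` do not contend at full power but do once the node is capped to 80%, and at that cap the greedy pass pairs each compute-bound job with a memory-bound one. Greedy pairing is chosen only for brevity; it is not guaranteed to find the minimum-total-contention assignment, which in general calls for a matching algorithm.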