使用SLURM的感知通信的作业调度

P. Mishra, Tushar Agrawal, Preeti Malakar
{"title":"使用SLURM的感知通信的作业调度","authors":"P. Mishra, Tushar Agrawal, Preeti Malakar","doi":"10.1145/3409390.3409410","DOIUrl":null,"url":null,"abstract":"Job schedulers play an important role in selecting optimal resources for the submitted jobs. However, most of the current job schedulers do not consider job-specific characteristics such as communication patterns during resource allocation. This often leads to sub-optimal node allocations. We propose three node allocation algorithms that consider the job’s communication behavior to improve the performance of communication-intensive jobs. We develop our algorithms for tree-based network topologies. The proposed algorithms aim at minimizing network contention by allocating nodes on the least contended switches. We also show that allocating nodes in powers of two leads to a decrease in inter-switch communication for MPI communications, which further improves performance. We implement and evaluate our algorithms using SLURM, a widely-used and well-known job scheduler. We show that the proposed algorithms can reduce the execution times of communication-intensive jobs by 9% (326 hours) on average. The average wait time of jobs is reduced by 31% across three supercomputer job logs.","PeriodicalId":350506,"journal":{"name":"Workshop Proceedings of the 49th International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Communication-aware Job Scheduling using SLURM\",\"authors\":\"P. Mishra, Tushar Agrawal, Preeti Malakar\",\"doi\":\"10.1145/3409390.3409410\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Job schedulers play an important role in selecting optimal resources for the submitted jobs. However, most of the current job schedulers do not consider job-specific characteristics such as communication patterns during resource allocation. This often leads to sub-optimal node allocations. We propose three node allocation algorithms that consider the job’s communication behavior to improve the performance of communication-intensive jobs. We develop our algorithms for tree-based network topologies. The proposed algorithms aim at minimizing network contention by allocating nodes on the least contended switches. We also show that allocating nodes in powers of two leads to a decrease in inter-switch communication for MPI communications, which further improves performance. We implement and evaluate our algorithms using SLURM, a widely-used and well-known job scheduler. We show that the proposed algorithms can reduce the execution times of communication-intensive jobs by 9% (326 hours) on average. The average wait time of jobs is reduced by 31% across three supercomputer job logs.\",\"PeriodicalId\":350506,\"journal\":{\"name\":\"Workshop Proceedings of the 49th International Conference on Parallel Processing\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-08-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Workshop Proceedings of the 49th International Conference on Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3409390.3409410\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop Proceedings of the 49th International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3409390.3409410","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

作业调度器在为提交的作业选择最佳资源方面发挥着重要作用。但是,当前大多数作业调度器都不考虑特定于作业的特征,例如资源分配期间的通信模式。这通常会导致次优节点分配。为了提高通信密集型作业的性能,我们提出了三种考虑作业通信行为的节点分配算法。我们开发了基于树的网络拓扑算法。提出的算法旨在通过在竞争最少的交换机上分配节点来减少网络竞争。我们还表明,以2的幂分配节点会导致MPI通信的交换机间通信减少,从而进一步提高性能。我们使用SLURM来实现和评估我们的算法,SLURM是一个广泛使用和知名的作业调度器。我们的研究表明,所提出的算法可以将通信密集型作业的执行时间平均减少9%(326小时)。在三个超级计算机作业日志中,作业的平均等待时间减少了31%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Communication-aware Job Scheduling using SLURM
Job schedulers play an important role in selecting optimal resources for the submitted jobs. However, most of the current job schedulers do not consider job-specific characteristics such as communication patterns during resource allocation. This often leads to sub-optimal node allocations. We propose three node allocation algorithms that consider the job’s communication behavior to improve the performance of communication-intensive jobs. We develop our algorithms for tree-based network topologies. The proposed algorithms aim at minimizing network contention by allocating nodes on the least contended switches. We also show that allocating nodes in powers of two leads to a decrease in inter-switch communication for MPI communications, which further improves performance. We implement and evaluate our algorithms using SLURM, a widely-used and well-known job scheduler. We show that the proposed algorithms can reduce the execution times of communication-intensive jobs by 9% (326 hours) on average. The average wait time of jobs is reduced by 31% across three supercomputer job logs.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Characterizing the Cost-Accuracy Performance of Cloud Applications Symmetric Tokens based Group Mutual Exclusion Fast Modeling of Network Contention in Batch Point-to-point Communications by Packet-level Simulation with Dynamic Time-stepping Exploiting Dynamism in HPC Applications to Optimize Energy-Efficiency Developing Checkpointing and Recovery Procedures with the Storage Services of Amazon Web Services
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1