节点共享策略在HPC批处理系统中的作用和收益

2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2019-05-20 DOI:10.1109/IPDPS.2019.00016

Alvaro Frank, Tim Süß, A. Brinkmann

{"title":"节点共享策略在HPC批处理系统中的作用和收益","authors":"Alvaro Frank, Tim Süß, A. Brinkmann","doi":"10.1109/IPDPS.2019.00016","DOIUrl":null,"url":null,"abstract":"Processor manufacturers today scale performance by increasing the number of cores on each CPU. Unfortunately, not all HPC applications can efficiently saturate all cores of a single node, even if they successfully scale to thousands of nodes. For these applications, sharing nodes with other applications can help to stress different resources on the nodes to more efficiently use them. Previous work has shown that the performance impact of node sharing is very application dependent but very little work has studied its effects within batch systems and for complex parallel application mixes. Administrators therefore typically fear the complexity of running a batch system supporting node sharing and also fear that interference between co-allocated jobs in practice leads to worse performance. This paper focuses on sharing nodes by oversubscribing cores through hyper-threading. We introduce new node sharing strategies for batch systems by deriving extensions to the well-known backfill and first fit algorithms. These strategies have been implemented in the SLURM workload manager and the evaluation is based on NERSC Trinity scientific mini applications. The evaluation of our node sharing strategies shows no overhead when using co-allocation, but an increased computational efficiency of 19% and an increased scheduling efficiency of 25.2% compared to standard node allocation.","PeriodicalId":403406,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Effects and Benefits of Node Sharing Strategies in HPC Batch Systems\",\"authors\":\"Alvaro Frank, Tim Süß, A. Brinkmann\",\"doi\":\"10.1109/IPDPS.2019.00016\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Processor manufacturers today scale performance by increasing the number of cores on each CPU. Unfortunately, not all HPC applications can efficiently saturate all cores of a single node, even if they successfully scale to thousands of nodes. For these applications, sharing nodes with other applications can help to stress different resources on the nodes to more efficiently use them. Previous work has shown that the performance impact of node sharing is very application dependent but very little work has studied its effects within batch systems and for complex parallel application mixes. Administrators therefore typically fear the complexity of running a batch system supporting node sharing and also fear that interference between co-allocated jobs in practice leads to worse performance. This paper focuses on sharing nodes by oversubscribing cores through hyper-threading. We introduce new node sharing strategies for batch systems by deriving extensions to the well-known backfill and first fit algorithms. These strategies have been implemented in the SLURM workload manager and the evaluation is based on NERSC Trinity scientific mini applications. The evaluation of our node sharing strategies shows no overhead when using co-allocation, but an increased computational efficiency of 19% and an increased scheduling efficiency of 25.2% compared to standard node allocation.\",\"PeriodicalId\":403406,\"journal\":{\"name\":\"2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"volume\":\"19 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-05-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPS.2019.00016\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS.2019.00016","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 5

摘要

目前，处理器制造商通过增加每个CPU上的核心数量来扩展性能。不幸的是，并不是所有的HPC应用程序都能有效地饱和单个节点的所有核心，即使它们成功地扩展到数千个节点。对于这些应用程序，与其他应用程序共享节点可以帮助在节点上强调不同的资源，从而更有效地使用它们。以前的研究表明，节点共享对性能的影响非常依赖于应用程序，但很少有工作研究它在批处理系统和复杂并行应用程序混合中的影响。因此，管理员通常担心运行支持节点共享的批处理系统的复杂性，也担心实际中共同分配的作业之间的干扰会导致性能下降。本文的重点是通过超线程超额订阅内核来共享节点。通过对已知的回填和首次拟合算法的扩展，我们引入了新的节点共享策略。这些策略已在SLURM工作负载管理器中实现，并基于NERSC Trinity科学迷你应用程序进行评估。对我们的节点共享策略的评估表明，与标准节点分配相比，使用共同分配时没有任何开销，但计算效率提高了19%，调度效率提高了25.2%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Effects and Benefits of Node Sharing Strategies in HPC Batch Systems

Processor manufacturers today scale performance by increasing the number of cores on each CPU. Unfortunately, not all HPC applications can efficiently saturate all cores of a single node, even if they successfully scale to thousands of nodes. For these applications, sharing nodes with other applications can help to stress different resources on the nodes to more efficiently use them. Previous work has shown that the performance impact of node sharing is very application dependent but very little work has studied its effects within batch systems and for complex parallel application mixes. Administrators therefore typically fear the complexity of running a batch system supporting node sharing and also fear that interference between co-allocated jobs in practice leads to worse performance. This paper focuses on sharing nodes by oversubscribing cores through hyper-threading. We introduce new node sharing strategies for batch systems by deriving extensions to the well-known backfill and first fit algorithms. These strategies have been implemented in the SLURM workload manager and the evaluation is based on NERSC Trinity scientific mini applications. The evaluation of our node sharing strategies shows no overhead when using co-allocation, but an increased computational efficiency of 19% and an increased scheduling efficiency of 25.2% compared to standard node allocation.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

自引率

0.00%

发文量