Characterizing the Performance of Executing Many-tasks on Summit

2019 IEEE/ACM Third Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware (IPDRM)
Pub Date: 2019-09-08
DOI: 10.1109/IPDRM49579.2019.00007

M. Turilli, André Merzky, T. Naughton, W. Elwasif, S. Jha


Citations: 7

Abstract


Many scientific workloads consist of many tasks, where each task is an independent simulation or analysis of data. Executing millions of tasks on heterogeneous HPC platforms requires scalable dynamic resource management and multi-level scheduling. RADICAL-Pilot (RP), an implementation of the Pilot abstraction, addresses these challenges and serves as an effective runtime system for executing such workloads. In this paper, we characterize the performance of executing many tasks using RP when interfaced with JSM and PRRTE on Summit: RP is responsible for resource management and task scheduling on the acquired resources; JSM or PRRTE enact the placement and launching of scheduled tasks. Our experiments provide lower bounds on the performance of RP when integrated with JSM and PRRTE. Specifically, for workloads consisting of homogeneous, single-core, 15-minute-long tasks we find that: PRRTE scales better than JSM for more than O(1000) tasks; PRRTE overheads are negligible; and PRRTE supports optimizations that lower the impact of overheads and enable a resource utilization of 63% when executing O(16K) 1-core tasks over 404 compute nodes.

