Multi-Level Load Balancing with an Integrated Runtime Approach

2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) Pub Date : 2018-05-01 DOI:10.1109/CCGRID.2018.00018

Seonmyeong Bak, Harshitha Menon, Sam White, M. Diener, L. Kalé


Citations: 17

Abstract

The recent trend of increasing numbers of cores per chip has resulted in vast amounts of on-node parallelism. These high core counts result in hardware variability that introduces imbalance. Applications are also becoming more complex, resulting in dynamic load imbalance. Load imbalance of any kind can result in loss of performance and system utilization. We address the challenge of handling both transient and persistent load imbalances while maintaining locality with low overhead. In this paper, we propose an integrated runtime system that combines the Charm++ distributed programming model with concurrent tasks to mitigate load imbalances within and across shared memory address spaces. It utilizes a periodic assignment of work to cores based on load measurement, in combination with user-created tasks, to handle load imbalance. We integrate OpenMP with Charm++ to enable creation of potential tasks via OpenMP's parallel loop construct. This is also available to MPI applications through the Adaptive MPI implementation. We demonstrate the benefits of our work on three applications. We show improvements for Lassen of 29.6% on Cori and 46.5% on Theta. We also demonstrate benefits for a Charm++ application, ChaNGa, of 25.7% on Theta, as well as for an MPI proxy application, Kripke, using Adaptive MPI.
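To make the two levels described in the abstract concrete, the following is a minimal toy sketch in Python, not the authors' implementation and not Charm++ or OpenMP code. It assumes a simple greedy heuristic for the periodic, measurement-based assignment (persistent imbalance) and models loop chunks picked up by whichever core is free first (transient imbalance). All function names, the load values, and the chunk count are hypothetical, and the sketch ignores locality and runtime overhead.

```python
import heapq

def greedy_rebalance(loads, num_cores):
    """Periodic level (assumed heuristic): place work units, heaviest first,
    on the currently least-loaded core, using measured loads."""
    heap = [(0.0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment = {}
    for unit, load in sorted(loads.items(), key=lambda kv: -kv[1]):
        total, core = heapq.heappop(heap)
        assignment[unit] = core
        heapq.heappush(heap, (total + load, core))
    return assignment

def core_totals(loads, assignment, num_cores):
    """Sum the load assigned to each core."""
    totals = [0.0] * num_cores
    for unit, core in assignment.items():
        totals[core] += loads[unit]
    return totals

def step_time_with_chunks(fixed, chunks):
    """Task level (toy model): each loop chunk runs on whichever core becomes
    free first; returns the time at which the last core finishes."""
    heap = [(t, core) for core, t in enumerate(fixed)]
    heapq.heapify(heap)
    for cost in chunks:
        t, core = heapq.heappop(heap)
        heapq.heappush(heap, (t + cost, core))
    return max(t for t, _ in heap)

# Hypothetical measured loads for four work units on two cores.
measured = {"a": 8.0, "b": 3.0, "c": 3.0, "d": 2.0}
assignment = greedy_rebalance(measured, 2)
balanced = core_totals(measured, assignment, 2)

# Transient spike: unit "a" costs 12.0 in one step instead of the measured 8.0.
# Without chunking, the core that owns "a" finishes at 12.0.
no_chunks = max(12.0, balanced[1])
# With "a" expressed as six loop chunks of 2.0, the other core can help
# after finishing its own fixed work.
with_chunks = step_time_with_chunks([0.0, balanced[1]], [2.0] * 6)
```

In this toy setting the periodic assignment balances the measured loads, while chunking the loop of the temporarily heavier unit lets the otherwise idle core absorb part of the spike that the last measurement could not anticipate.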


