Compiler Assisted Load Balancing on Large Clusters

2015 International Conference on Parallel Architecture and Compilation (PACT) Pub Date : 2015-10-18 DOI:10.1109/PACT.2015.40

Vinita V. Deodhar, Hrushit Parikh, Ada Gavrilovska, S. Pande

{"title":"Compiler Assisted Load Balancing on Large Clusters","authors":"Vinita V. Deodhar, Hrushit Parikh, Ada Gavrilovska, S. Pande","doi":"10.1109/PACT.2015.40","DOIUrl":null,"url":null,"abstract":"Load balancing of tasks across processing nodes is critical for achieving speed up on large scale clusters. Load balancing schemes typically detect the imbalance and then migrate the load from an overloaded processing node to an idle or lightly loaded processing node and thus, estimation of load critically affects the performance of load balancing schemes. On large scale clusters, the latency of load migration between processing nodes (and the energy) is also a significant overhead and any missteps in load estimation can cause significant migrations and performance losses. Currently, the load estimation is done either by profile or feedback driven approaches in sophisticated systems such as Charm++, but such approaches must be re-thought in light of some workloads such as Adaptive Mesh Refinement (AMR) and multiscale physics where the load variations could be quite dynamic and rapid. In this work we propose a compiler based framework which performs precise prediction of the forthcoming workload. The compiler driven load prediction technique performs static analysis of a task and derives an expression to predict load of a task and hoists it as early as possible in the control flow of execution. The compiler also inserts corrector expressions at strategic program points which refine the reachability probability of the load as well as its estimation. The predictor and the corrector expressions are evaluated at runtime and the predicted load information is refined as the execution proceeds and is eventually used by load balancer to take efficient migration decisions. We present an implementation of the above in the Rose compiler and the Charm++ parallel programming framework. We demonstrate the effectiveness of the framework on some key benchmarks that exhibit dynamic variations and show how the compiler framework assists load balancing schemes in Charm++ to provide significant gains.","PeriodicalId":385398,"journal":{"name":"2015 International Conference on Parallel Architecture and Compilation (PACT)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Conference on Parallel Architecture and Compilation (PACT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PACT.2015.40","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 5

Abstract

Load balancing of tasks across processing nodes is critical for achieving speed up on large scale clusters. Load balancing schemes typically detect the imbalance and then migrate the load from an overloaded processing node to an idle or lightly loaded processing node and thus, estimation of load critically affects the performance of load balancing schemes. On large scale clusters, the latency of load migration between processing nodes (and the energy) is also a significant overhead and any missteps in load estimation can cause significant migrations and performance losses. Currently, the load estimation is done either by profile or feedback driven approaches in sophisticated systems such as Charm++, but such approaches must be re-thought in light of some workloads such as Adaptive Mesh Refinement (AMR) and multiscale physics where the load variations could be quite dynamic and rapid. In this work we propose a compiler based framework which performs precise prediction of the forthcoming workload. The compiler driven load prediction technique performs static analysis of a task and derives an expression to predict load of a task and hoists it as early as possible in the control flow of execution. The compiler also inserts corrector expressions at strategic program points which refine the reachability probability of the load as well as its estimation. The predictor and the corrector expressions are evaluated at runtime and the predicted load information is refined as the execution proceeds and is eventually used by load balancer to take efficient migration decisions. We present an implementation of the above in the Rose compiler and the Charm++ parallel programming framework. We demonstrate the effectiveness of the framework on some key benchmarks that exhibit dynamic variations and show how the compiler framework assists load balancing schemes in Charm++ to provide significant gains.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

大型集群上的编译器辅助负载平衡

跨处理节点的任务负载平衡对于实现大规模集群的速度至关重要。负载均衡方案通常检测不平衡，然后将负载从过载的处理节点迁移到空闲或轻负载的处理节点，因此，负载估计对负载均衡方案的性能有重要影响。在大规模集群中，处理节点之间负载迁移的延迟(和能量)也是一项重大开销，负载估计中的任何失误都可能导致重大迁移和性能损失。目前，负载估计是通过轮廓或反馈驱动的方法在复杂的系统，如Charm++中完成的，但这些方法必须重新考虑一些工作负载，如自适应网格细化(AMR)和多尺度物理，其中负载变化可能非常动态和快速。在这项工作中，我们提出了一个基于编译器的框架，它可以精确预测即将到来的工作量。编译器驱动的负载预测技术对任务进行静态分析，推导出一个表达式来预测任务的负载，并在执行控制流中尽早将其提升。编译器还在策略程序点插入校正表达式，以改进负载的可达性概率及其估计。预测器和校正器表达式在运行时计算，预测的负载信息在执行过程中得到细化，并最终被负载平衡器用于做出有效的迁移决策。我们在Rose编译器和charm++并行编程框架中实现了上述功能。我们在展示动态变化的一些关键基准上展示了框架的有效性，并展示了编译器框架如何帮助Charm++中的负载平衡方案提供显著的收益。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2015 International Conference on Parallel Architecture and Compilation (PACT)

自引率

0.00%

发文量

期刊最新文献

Storage Consolidation on SSDs: Not Always a Panacea, but Can We Ease the Pain? AREP: Adaptive Resource Efficient Prefetching for Maximizing Multicore Performance NVMMU: A Non-volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures Scalable Task Scheduling and Synchronization Using Hierarchical Effects Scalable SIMD-Efficient Graph Processing on GPUs