Transparently Space Sharing a Multicore Among Multiple Processes

Pub Date: 2016-12-26  DOI: 10.1145/3001910

T. Creech, R. Barua

Citations: 2

Abstract

As hardware becomes increasingly parallel and scalable parallel software becomes more widely available, the problem of managing multiple multithreaded applications (processes) grows in importance. Malleable processes, which can vary the number of threads they use as they run, enable sophisticated and flexible resource management. Although many existing applications parallelized for SMPs with parallel runtimes are in fact already malleable, deployed runtime environments provide neither an interface nor any strategy for intelligently allocating hardware threads, or even for preventing oversubscription. Prior research methods either depend on profiling applications ahead of time to make good allocation decisions or do not account for process efficiency at all, leading to poor performance. None of these prior methods has been widely adopted in practice. This article presents the Scheduling and Allocation with Feedback (SCAF) system: a drop-in runtime solution that helps existing malleable applications make intelligent allocation decisions based on observed efficiency, without requiring changes to program semantics, source modification, offline profiling, or even recompilation. Our existing implementation can control most unmodified OpenMP applications. Other malleable threading libraries can also easily be supported with small modifications, without requiring application modification or recompilation.

In this work, we present the SCAF daemon and a SCAF-aware port of the GNU OpenMP runtime. We present a new technique for estimating process efficiency purely at runtime using available hardware counters, and we demonstrate its effectiveness in aiding allocation decisions. We evaluated SCAF using the NAS Parallel Benchmarks (NPB) on five commodity parallel platforms, enumerating architectural features and their effects on our scheme.
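The abstract does not detail SCAF's efficiency estimator or its allocation policy, so the following is only a minimal Python sketch under stated assumptions. It assumes that a process's rate of retired instructions, a quantity hardware counters can report, is a usable proxy for its rate of progress, and that a serial baseline rate is available for comparison; speedup is then approximated as the ratio of the two rates, and parallel efficiency as speedup divided by thread count. The proportional split is a hypothetical policy chosen for illustration, not SCAF's algorithm, and `Sample`, `estimate_efficiency`, and `allocate` are illustrative names. What it does share with the text is the constraint that the total allocation never exceeds the hardware thread count, which rules out oversubscription. One caveat on the proxy: threads spinning at a barrier also retire instructions, so an instruction-count estimate can overstate useful progress.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical per-process counter readings over one sampling interval."""
    name: str
    threads: int          # threads the process used during the interval
    parallel_ips: float   # instructions retired per second, all threads combined
    serial_ips: float     # assumed-known instructions per second of a serial run


def estimate_efficiency(s: Sample) -> float:
    """Parallel efficiency E = speedup / threads, where speedup is approximated
    by the ratio of parallel to serial instruction rates (an assumption)."""
    speedup = s.parallel_ips / s.serial_ips
    return min(speedup / s.threads, 1.0)  # clamp: efficiency above 1 is treated as 1


def allocate(samples: list[Sample], hw_threads: int) -> dict[str, int]:
    """Hypothetical policy: split hw_threads in proportion to estimated efficiency.

    Every process gets at least one thread and the total equals hw_threads,
    so the machine is never oversubscribed.
    """
    if hw_threads < len(samples):
        raise ValueError("need at least one hardware thread per process")
    effs = {s.name: estimate_efficiency(s) for s in samples}
    total = sum(effs.values())
    if total == 0:  # no usable estimates: fall back to an even split
        effs = {name: 1.0 for name in effs}
        total = float(len(effs))
    spare = hw_threads - len(samples)  # threads left after the one-per-process floor
    shares = {name: spare * e / total for name, e in effs.items()}
    alloc = {name: 1 + int(share) for name, share in shares.items()}
    # Largest-remainder rounding: hand leftover threads to the biggest fractions.
    leftover = hw_threads - sum(alloc.values())
    by_fraction = sorted(shares, key=lambda n: shares[n] - int(shares[n]), reverse=True)
    for name in by_fraction[:leftover]:
        alloc[name] += 1
    return alloc


# Example with made-up numbers: 16 hardware threads, two co-running processes.
fast = Sample("fast", threads=8, parallel_ips=7.0e9, serial_ips=1.0e9)  # E = 0.875
slow = Sample("slow", threads=8, parallel_ips=2.0e9, serial_ips=1.0e9)  # E = 0.25
print(allocate([fast, slow], hw_threads=16))  # {'fast': 12, 'slow': 4}
```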
We measured the benefit of SCAF as the improvement in sum of speedups (a common metric for multiprogrammed environments) when running every pair of benchmarks concurrently, compared to equipartitioning, the best existing competing scheme in the literature. We found that SCAF improves on equipartitioning on four of the five machines, with a mean improvement factor in sum of speedups for benchmark pairs of 1.04x to 1.11x depending on the machine, and 1.09x on average.

Since we are not aware of any widely available tool for equipartitioning, we also compare SCAF against multiprogramming with unmodified OpenMP, which is the only environment available to end users today. SCAF improves on the unmodified OpenMP runtimes on all five machines, with a mean improvement of 1.08x to 2.07x depending on the machine, and 1.59x on average.
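To make the evaluation metric concrete, here is a minimal Python sketch of equipartitioning and of the sum-of-speedups comparison. It assumes that each process's speedup is its serial run time divided by its run time while co-running, and that the improvement factor for a pair is the ratio of the two schemes' sums of speedups. The abstract does not say which mean is taken across pairs, so the geometric mean helper is an assumption. All run times in the example are made up and are not results from the paper; the function names are illustrative.

```python
from __future__ import annotations

from math import prod


def equipartition(num_procs: int, hw_threads: int) -> list[int]:
    """Split hardware threads as evenly as possible across processes; the
    first hw_threads % num_procs processes receive one extra thread."""
    base, extra = divmod(hw_threads, num_procs)
    return [base + (1 if i < extra else 0) for i in range(num_procs)]


def sum_of_speedups(serial_times: list[float], corun_times: list[float]) -> float:
    """Sum over co-running processes of (serial time / co-running time)."""
    if len(serial_times) != len(corun_times):
        raise ValueError("need one co-running time per serial time")
    return sum(ts / tc for ts, tc in zip(serial_times, corun_times))


def improvement_factor(serial: list[float], baseline: list[float],
                       candidate: list[float]) -> float:
    """Candidate's sum of speedups relative to the baseline's, for one pair."""
    return sum_of_speedups(serial, candidate) / sum_of_speedups(serial, baseline)


def geometric_mean(xs: list[float]) -> float:
    """Assumed way of averaging per-pair improvement factors."""
    return prod(xs) ** (1.0 / len(xs))


# Made-up pair: serial times of 100 s and 80 s on a 16-thread machine.
print(equipartition(2, 16))                     # [8, 8]
serial = [100.0, 80.0]
equi = [20.0, 16.0]    # hypothetical co-running times under equipartitioning
other = [16.0, 20.0]   # hypothetical co-running times under another allocation
print(sum_of_speedups(serial, equi))            # 10.0
print(improvement_factor(serial, equi, other))  # 1.025
```

A factor above 1 means the candidate allocation achieved a larger sum of speedups than equipartitioning for that pair, even though, as in this example, one process may individually run slower.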
