Friendly barriers: efficient work-stealing with return barriers

International Conference on Virtual Execution Environments Pub Date : 2014-03-01 DOI:10.1145/2576195.2576207

Vivek Kumar, S. Blackburn, D. Grove


Citations: 6

Abstract

This paper addresses the problem of efficiently supporting parallelism within a managed runtime. A popular approach for exploiting software parallelism on parallel hardware is task parallelism, where the programmer explicitly identifies potential parallelism and the runtime then schedules the work. Work-stealing is a promising scheduling strategy that a runtime may use to keep otherwise idle hardware busy while relieving overloaded hardware of its burden. However, work-stealing comes with substantial overheads. Recent work identified sequential overheads of work-stealing, those that occur even when no stealing takes place, as a significant source of overhead. That work was able to reduce sequential overheads to just 15%.

In this work, we turn to dynamic overheads, those that occur each time a steal takes place. We show that the dynamic overhead is dominated by introspection of the victim's stack when a steal takes place. We exploit the idea of a low-overhead return barrier to reduce the dynamic overhead by approximately half, resulting in total performance improvements of as much as 20%. Because, unlike prior work, we attack the overheads directly due to stealing and therefore attack the overheads that grow as parallelism grows, we improve the scalability of work-stealing applications. This result is complementary to recent work addressing the sequential overheads of work-stealing. This work therefore substantially relieves work-stealing of the increasing pressure due to increasing intra-node hardware parallelism.
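To make the scheduling idea concrete, the following is a minimal, single-threaded sketch of work-stealing in general. It is not the paper's runtime and does not model return barriers or stack introspection. The names `Worker`, `steal_from`, and `run` are hypothetical, chosen only for illustration. Each worker owns a double-ended queue: the owner pushes and pops at the tail, while an idle worker steals from the head of a victim's queue. The `steals` counter marks the events at which a real runtime would pay the dynamic overhead the abstract describes.

```python
from collections import deque


class Worker:
    """Hypothetical worker that owns a double-ended queue of tasks."""

    def __init__(self, name):
        self.name = name
        self.tasks = deque()
        self.done = []    # tasks this worker has executed, in order
        self.steals = 0   # number of successful steals by this worker

    def push(self, task):
        # The owner adds new work at the tail of its own queue.
        self.tasks.append(task)

    def pop(self):
        # The owner takes the most recently pushed task (tail).
        return self.tasks.pop() if self.tasks else None

    def steal_from(self, victim):
        # A thief takes the oldest task (head) of the victim's queue.
        if victim.tasks:
            self.steals += 1
            return victim.tasks.popleft()
        return None


def run(workers):
    """Round-robin simulation: in each round every worker executes at most
    one task, stealing from another worker if its own queue is empty."""
    while any(w.tasks for w in workers):
        for w in workers:
            task = w.pop()
            if task is None:
                for v in workers:
                    if v is not w:
                        task = w.steal_from(v)
                        if task is not None:
                            break
            if task is not None:
                w.done.append(task)
    return workers


busy, idle = Worker("busy"), Worker("idle")
for t in range(6):
    busy.push(t)          # all work starts on one worker
run([busy, idle])
print(busy.done, idle.done, idle.steals)
```

In this toy run the idle worker obtains all of its work through steals, so the steal count grows with the amount of work that has to be redistributed. This is the kind of cost that, according to the abstract, grows as parallelism grows.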


