Runtime Adaptive Task Inlining on Asynchronous Multitasking Runtime Systems

Proceedings of the 48th International Conference on Parallel Processing. Pub Date: 2019-08-05. DOI: 10.1145/3337821.3337915

Bibek Wagle, Mohammad Alaul Haque Monil, K. Huck, A. Malony, Adrian Serio, Hartmut Kaiser


Citations: 7

Abstract

As the era of high-frequency, single-core processors has come to a close, the new paradigm of many-core processors has come to dominate. In response, asynchronous multitasking runtime systems have been developed as a promising way to utilize this newly available hardware efficiently. Asynchronous multitasking runtime systems work by dividing a problem into a large number of fine-grained tasks. However, as the number of tasks created increases, the overheads associated with task creation and management cannot be ignored. Task inlining, a method in which the parent thread consumes a child thread, enables the runtime system to balance parallelism against its overhead. Because it is strongly affected by the processor architecture, the task inlining decision is dynamic in nature. In this research, we present adaptive techniques for deciding, at runtime, whether a particular task should be inlined. We present two policies: a baseline policy that makes inlining decisions based on a fixed threshold, and an adaptive policy that determines the threshold dynamically at runtime. We also evaluate and justify the performance of these policies on different processor architectures. To the best of our knowledge, this is the first study of the impact of an adaptive runtime policy for task inlining in an asynchronous multitasking runtime system across different processor architectures. From experimentation, we find that the baseline policy improves execution time by 7.61% to 54.09%. Furthermore, the adaptive policy improves over the baseline policy by up to 74%.
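The difference between the two policies can be illustrated with a small sketch. The abstract does not say which quantity the threshold is compared against or how the adaptive policy updates it, so everything here is an assumption made for illustration: the threshold is taken to be an estimated task cost in microseconds, the adaptive policy is assumed to follow the measured spawn overhead with an exponential moving average, and all names (`FixedThresholdPolicy`, `AdaptiveThresholdPolicy`, `run`, `smoothing`) are hypothetical. It is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class FixedThresholdPolicy:
    """Baseline sketch: inline any task whose estimated cost is below a fixed cutoff."""
    threshold_us: float  # assumed unit: estimated task cost in microseconds

    def should_inline(self, estimated_cost_us: float) -> bool:
        return estimated_cost_us < self.threshold_us

    def observe(self, spawn_overhead_us: float) -> None:
        # The fixed policy ignores runtime measurements.
        pass


@dataclass
class AdaptiveThresholdPolicy:
    """Adaptive sketch: move the cutoff toward the measured spawn overhead."""
    threshold_us: float
    smoothing: float = 0.5  # hypothetical weight of the newest measurement

    def should_inline(self, estimated_cost_us: float) -> bool:
        return estimated_cost_us < self.threshold_us

    def observe(self, spawn_overhead_us: float) -> None:
        # Assumed update rule (not from the paper): exponential moving average
        # of the observed overhead of creating a separate task.
        self.threshold_us = (
            (1.0 - self.smoothing) * self.threshold_us
            + self.smoothing * spawn_overhead_us
        )


def run(policy, tasks):
    """Decide 'inline' or 'spawn' for each (estimated_cost_us, spawn_overhead_us) pair."""
    decisions = []
    for estimated_cost_us, spawn_overhead_us in tasks:
        decisions.append("inline" if policy.should_inline(estimated_cost_us) else "spawn")
        policy.observe(spawn_overhead_us)
    return decisions


if __name__ == "__main__":
    # Made-up workload: task costs of 5, 12, 12 us with a 20 us spawn overhead.
    tasks = [(5.0, 20.0), (12.0, 20.0), (12.0, 20.0)]
    print(run(FixedThresholdPolicy(10.0), tasks))
    print(run(AdaptiveThresholdPolicy(10.0), tasks))
```

On a platform where spawning is expensive relative to the task, the adaptive sketch raises its cutoff and starts inlining tasks that the fixed cutoff would still spawn, which is the kind of architecture dependence the abstract describes.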


