Approximation Proofs of a Fast and Efficient List Scheduling Algorithm for Task-Based Runtime Systems on Multicores and GPUs

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) · Pub Date: 2017-05-29 · DOI: 10.1109/IPDPS.2017.71

Olivier Beaumont, Lionel Eyraud-Dubois, Suraj Kumar


Citations: 21

Abstract

In High Performance Computing, heterogeneity is now the norm, with specialized accelerators like GPUs providing efficient computational power. The added complexity has led to the development of task-based runtime systems, which allow complex computations to be expressed as task graphs and rely on scheduling algorithms to perform load balancing between all resources of the platform. Developing good scheduling algorithms, even on a single node, and analyzing them can thus have a very high impact on the performance of current HPC systems. The special case of two types of resources (namely CPUs and GPUs) is of practical interest. HeteroPrio is such an algorithm: it was proposed in the context of fast multipole computations and then extended to general task graphs with very interesting results. In this paper, we provide theoretical insight into the performance of HeteroPrio by proving approximation bounds compared to the optimal schedule in the case where all tasks are independent, for different platform sizes. Interestingly, this shows that spoliation makes it possible to prove approximation ratios for a list scheduling algorithm on two unrelated resources, which is not possible otherwise. We also establish that almost all our bounds are tight. Additionally, we provide an experimental evaluation of HeteroPrio on real task graphs from dense linear algebra computations, which highlights the reasons for its good practical performance.
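The abstract names HeteroPrio and spoliation without spelling out the mechanics. The Python sketch below simulates a HeteroPrio-style list schedule for independent tasks on CPUs and GPUs. It assumes the rules as they are commonly described in the HeteroPrio literature, not as stated on this page: ready tasks are ordered by acceleration factor (CPU time divided by GPU time), an idle GPU takes the most accelerated task and an idle CPU the least accelerated one, and once the queue is empty an idle worker may spoliate (abort and restart) a task running on the other resource type if it would finish strictly earlier there. The `Task` class, the `heteroprio` function, the GPU-first tie-breaking, and the order in which running tasks are considered for spoliation are illustrative assumptions, not the authors' implementation.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    cpu_time: float  # processing time on one CPU core (assumed > 0)
    gpu_time: float  # processing time on one GPU (assumed > 0)

    @property
    def accel(self) -> float:
        # Acceleration factor: how much faster the task runs on a GPU.
        return self.cpu_time / self.gpu_time


def heteroprio(tasks, n_cpus, n_gpus):
    """Simulate a HeteroPrio-style schedule of independent tasks (sketch).

    Assumes at least one worker. Returns (makespan, schedule), where
    schedule maps a task name to (resource kind, start, end) of the run
    that actually completed.
    """
    # Ready queue sorted by non-increasing acceleration factor.
    queue = deque(sorted(tasks, key=lambda t: t.accel, reverse=True))
    # Hypothetical tie-breaking: GPUs are offered work before CPUs.
    workers = [("gpu", i) for i in range(n_gpus)] + [("cpu", i) for i in range(n_cpus)]
    running = {}   # worker -> (task, start, end)
    schedule = {}  # task name -> (kind, start, end)
    now = 0.0

    def duration(task, kind):
        return task.gpu_time if kind == "gpu" else task.cpu_time

    while queue or running:
        changed = True
        while changed:  # serve idle workers until nothing changes at `now`
            changed = False
            for w in workers:
                if w in running:
                    continue
                kind = w[0]
                if queue:
                    # GPUs take the most accelerated task, CPUs the least.
                    task = queue.popleft() if kind == "gpu" else queue.pop()
                    running[w] = (task, now, now + duration(task, kind))
                    changed = True
                    continue
                # Spoliation (assumed rule): scan tasks on the other resource
                # type by decreasing expected completion time and restart the
                # first one that would finish strictly earlier on this worker.
                victims = sorted(
                    ((v, r) for v, r in running.items() if v[0] != kind),
                    key=lambda item: item[1][2],
                    reverse=True,
                )
                for v, (task, _start, end) in victims:
                    new_end = now + duration(task, kind)
                    if new_end < end:
                        del running[v]  # partial work on v is discarded
                        running[w] = (task, now, new_end)
                        changed = True
                        break
        # Advance simulated time to the next completion.
        now = min(end for _task, _start, end in running.values())
        for w, (task, start, end) in list(running.items()):
            if end == now:
                schedule[task.name] = (w[0], start, end)
                del running[w]

    makespan = max((end for _k, _s, end in schedule.values()), default=0.0)
    return makespan, schedule


if __name__ == "__main__":
    m, s = heteroprio([Task("A", 10, 2), Task("B", 6, 3)], n_cpus=1, n_gpus=1)
    print(m, s)
```

With one CPU and one GPU, tasks A (CPU 10, GPU 2) and B (CPU 6, GPU 3) yield a makespan of 5.0 rather than 6.0: the GPU finishes A at time 2, finds the queue empty, and restarts B, which would otherwise complete on the CPU at time 6.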


