Automatic Parallelization to Asynchronous Task-Based Runtimes Through a Generic Runtime Layer

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI:10.1109/HPEC.2019.8916294

Charles Jin, M. Baskaran, Benoît Meister, J. Springer

{"title":"Automatic Parallelization to Asynchronous Task-Based Runtimes Through a Generic Runtime Layer","authors":"Charles Jin, M. Baskaran, Benoît Meister, J. Springer","doi":"10.1109/HPEC.2019.8916294","DOIUrl":null,"url":null,"abstract":"With the end of Moore’s law, asynchronous task-based parallelism has seen growing support as a parallel programming paradigm, with the runtime system offering such advantages as dynamic load balancing, locality, and scalability. However, there has been a proliferation of such programming systems in recent years, each of which presents different performance tradeoffs and runtime semantics. Developing applications on top of these systems thus requires not only application expertise but also deep familiarity with the runtime, exacerbating the perennial problems of programmability and portability.This work makes three main contributions to this growing landscape. First, we extend a polyhedral optimizing compiler with techniques to extract task-based parallelism and data management for a broad class of asynchronous task-based runtimes. Second, we introduce a generic runtime layer for asynchronous task-based systems with representations of data and tasks that are sparse and tiled by default, which serves as an abstract target for the compiler backend. Finally, we implement this generic layer using OpenMP and Legion, demonstrating the flexibility and viability of the generic layer and delivering an end-to-end path for automatic parallelization to asynchronous task-based runtimes. Using a wide range of applications from deep learning to scientific kernels, we obtain geometric mean speedups of 23.0* (OpenMP) and 9.5* (Legion) using 64 threads.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC.2019.8916294","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

Abstract

With the end of Moore’s law, asynchronous task-based parallelism has seen growing support as a parallel programming paradigm, with the runtime system offering such advantages as dynamic load balancing, locality, and scalability. However, there has been a proliferation of such programming systems in recent years, each of which presents different performance tradeoffs and runtime semantics. Developing applications on top of these systems thus requires not only application expertise but also deep familiarity with the runtime, exacerbating the perennial problems of programmability and portability.This work makes three main contributions to this growing landscape. First, we extend a polyhedral optimizing compiler with techniques to extract task-based parallelism and data management for a broad class of asynchronous task-based runtimes. Second, we introduce a generic runtime layer for asynchronous task-based systems with representations of data and tasks that are sparse and tiled by default, which serves as an abstract target for the compiler backend. Finally, we implement this generic layer using OpenMP and Legion, demonstrating the flexibility and viability of the generic layer and delivering an end-to-end path for automatic parallelization to asynchronous task-based runtimes. Using a wide range of applications from deep learning to scientific kernels, we obtain geometric mean speedups of 23.0* (OpenMP) and 9.5* (Legion) using 64 threads.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

通过通用运行时层实现异步任务运行时的自动并行化

随着摩尔定律的终结，基于异步任务的并行性作为一种并行编程范式得到了越来越多的支持，运行时系统提供了动态负载平衡、局部性和可伸缩性等优势。然而，近年来这类编程系统的数量激增，每个系统都有不同的性能权衡和运行时语义。因此，在这些系统之上开发应用程序不仅需要应用程序专业知识，还需要对运行时非常熟悉，这加剧了可编程性和可移植性的长期问题。这项工作为这一不断发展的景观做出了三个主要贡献。首先，我们扩展了一个多面体优化编译器，使用技术提取基于任务的并行性和数据管理，用于广泛的基于异步任务的运行时。其次，我们为基于异步任务的系统引入了一个通用的运行时层，该层具有默认情况下稀疏和平铺的数据和任务表示，可作为编译器后端的抽象目标。最后，我们使用OpenMP和Legion实现了这个通用层，展示了通用层的灵活性和可行性，并为基于异步任务的运行时的自动并行化提供了端到端路径。使用从深度学习到科学内核的广泛应用程序，我们使用64个线程获得23.0* (OpenMP)和9.5* (Legion)的几何平均加速。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2019 IEEE High Performance Extreme Computing Conference (HPEC)

自引率

0.00%

发文量