On the Marriage of Asynchronous Many Task Runtimes and Big Data: A Glance
Joshua D. Suetterlein, J. Manzano, A. Márquez, G. Gao
2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC), December 2020
DOI: 10.1109/HiPC50609.2020.00037
Cited by: 1
Abstract
The rise of accelerator-based architectures and reconfigurable computing has exposed the weakness of software toolchains that still maintain a static view of the hardware. Such toolchains do not rely on a symbiotic relationship between static tools (e.g., compilers) and dynamic tools (e.g., runtimes). Over the past decades, the need for such a relationship has given rise to adaptive runtimes with increasingly fine-grained computational tasks. Fine-grained tasks make better use of the hardware because they are switched out when a long-latency operation is encountered, thus trading idle time for unrelated work. Such operations stem from deeper memory hierarchies and from new memory technologies that may target streaming rather than random access. Asynchronous Many Task (AMT) runtimes are examples of such fine-grained task runtimes: highly efficient computational graphs run on them across a variety of hardware. Because these runtimes are inherently latency tolerant, latency-sensitive applications such as graph analytics and Big Data can use them effectively. This paper presents an example of how a carefully designed AMT runtime can exploit the hardware substrate when faced with high-latency applications such as those found in the Big Data domain. Moreover, we aim to show how the introspection and adaptive capabilities of these runtimes help them cope with the changing requirements of application workloads. We use the Performance Open Community Runtime (P-OCR) as our vehicle to demonstrate the concepts presented here.
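The latency-hiding idea described above is that a fine-grained task is switched out on a long-latency operation so the core can run unrelated work. The toy simulation below sketches this idea under simplifying assumptions. It is not P-OCR or the OCR API. The names `task`, `run` and `serial_time` are hypothetical. Each resume of a task is charged one unit of compute, each load has a fixed latency, and there is a single core. Tasks are modeled as Python generators that yield the latency of the load they issue instead of blocking on it.

```python
import heapq
from collections import deque


def task(n_loads, latency):
    """Hypothetical fine-grained task. Each resume is charged one unit of compute
    by the scheduler; the task then issues a long-latency load and yields its
    latency instead of blocking on it."""
    for _ in range(n_loads):
        yield latency  # switch out: let the scheduler run unrelated work
    # the final resume consumes the last load and finishes the task


def run(tasks):
    """Toy single-core scheduler in simulated time. Returns (makespan, trace)."""
    ready = deque(tasks)  # (name, generator) pairs that can run now
    waiting = []          # min-heap of (wake_time, seq, name, generator)
    clock, seq, trace = 0, 0, []
    while ready or waiting:
        if not ready:
            # No runnable work: the core idles until the earliest load completes.
            wake, _, name, gen = heapq.heappop(waiting)
            clock = max(clock, wake)
            ready.append((name, gen))
        # Wake every task whose outstanding load has completed by now.
        while waiting and waiting[0][0] <= clock:
            _, _, name, gen = heapq.heappop(waiting)
            ready.append((name, gen))
        name, gen = ready.popleft()
        trace.append((clock, name))
        clock += 1  # one unit of compute per resume
        try:
            latency = next(gen)
        except StopIteration:
            continue  # task finished
        # seq breaks ties so generators are never compared by the heap
        heapq.heappush(waiting, (clock + latency, seq, name, gen))
        seq += 1
    return clock, trace


def serial_time(specs):
    """Makespan if every load blocks the core (no task switching)."""
    return sum((n + 1) + n * lat for n, lat in specs)


specs = [(2, 3), (2, 3)]  # two tasks, each issuing two loads of latency 3
makespan, trace = run([(f"T{i}", task(n, lat)) for i, (n, lat) in enumerate(specs)])
print(makespan, serial_time(specs))  # 10 18
print(trace)  # [(0, 'T0'), (1, 'T1'), (4, 'T0'), (5, 'T1'), (8, 'T0'), (9, 'T1')]
```

In this sketch, a task that would otherwise stall yields, and its waiting time overlaps with the other task's compute. This shrinks the makespan from 18 to 10 simulated time units. With a single task there is nothing to overlap, so the two schedules take the same time. A real AMT runtime adds dependence tracking, multiple workers and memory management on top of this basic switching mechanism.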