Task-DAG Support in Single-Source PHAST Library: Enabling Flexible Assignment of Tasks to CPUs and GPUs in Heterogeneous Architectures

Biagio Peccerillo, S. Bartolini
{"title":"Task-DAG Support in Single-Source PHAST Library: Enabling Flexible Assignment of Tasks to CPUs and GPUs in Heterogeneous Architectures","authors":"Biagio Peccerillo, S. Bartolini","doi":"10.1145/3303084.3309496","DOIUrl":null,"url":null,"abstract":"Nowadays, the majority of desktop, mobile, and embedded devices in the consumer and industrial markets are heterogeneous, as they contain at least multi-core CPU and GPU resources in the same system. However, exploiting the performance and energy-efficiency of these diverse processing elements does not come for free from a software point of view: programmers need to a) code each activity through the specific approaches, libraries, and frameworks suitable for their target architecture (e.g., CPUs and GPUs) along with the orchestration of such heterogeneous execution, and b) decide the distribution of sequential and parallel activities towards the different parallel hardware resources available. Current frameworks typically provide either low-abstraction-level target-specific and/or generic but not high-performance interfaces, which complicate the exploration of different task assignments, with DAG1 precedence relationship, to the available heterogeneous resources. To enable this, tasks would typically need to be coded one time for each target architecture due to the profound differences in their programming. In this work, we include the support of tasks and DAGs of data-parallel tasks within the single-source PHAST library, which currently supports both multi-core CPUs and NVIDIA GPUs, so that tasks are coded in a target-agnostic fashion and their targeting to multi-core or GPU architectures is automatic and efficient. The integration of this coding approach with tasks can help to postpone the choice of the execution platform for each task up to the testing, or even to the runtime, phase. Finally, we demonstrate the effects of this approach in the case of a sample image pipeline benchmark from the computer vision domain. We compare our implementation to a SYCL implementation from a productivity point of view. Also, we show that various task assignments can be seamlessly explored by implementing both the PEFT2 mapping technique along with an exhaustive search in the mapping space.","PeriodicalId":408167,"journal":{"name":"Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3303084.3309496","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Nowadays, the majority of desktop, mobile, and embedded devices in the consumer and industrial markets are heterogeneous, as they contain at least multi-core CPU and GPU resources in the same system. However, exploiting the performance and energy-efficiency of these diverse processing elements does not come for free from a software point of view: programmers need to a) code each activity through the specific approaches, libraries, and frameworks suitable for their target architecture (e.g., CPUs and GPUs) along with the orchestration of such heterogeneous execution, and b) decide the distribution of sequential and parallel activities towards the different parallel hardware resources available. Current frameworks typically provide either low-abstraction-level target-specific and/or generic but not high-performance interfaces, which complicate the exploration of different task assignments, with DAG1 precedence relationship, to the available heterogeneous resources. To enable this, tasks would typically need to be coded one time for each target architecture due to the profound differences in their programming. In this work, we include the support of tasks and DAGs of data-parallel tasks within the single-source PHAST library, which currently supports both multi-core CPUs and NVIDIA GPUs, so that tasks are coded in a target-agnostic fashion and their targeting to multi-core or GPU architectures is automatic and efficient. The integration of this coding approach with tasks can help to postpone the choice of the execution platform for each task up to the testing, or even to the runtime, phase. Finally, we demonstrate the effects of this approach in the case of a sample image pipeline benchmark from the computer vision domain. We compare our implementation to a SYCL implementation from a productivity point of view. Also, we show that various task assignments can be seamlessly explored by implementing both the PEFT2 mapping technique along with an exhaustive search in the mapping space.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
单源PHAST库中的Task-DAG支持:在异构架构中为cpu和gpu灵活分配任务
如今,消费和工业市场中的大多数桌面、移动和嵌入式设备都是异构的,因为它们在同一系统中至少包含多核CPU和GPU资源。然而,从软件的角度来看,利用这些不同处理元素的性能和能源效率并不是免费的:程序员需要a)通过适合其目标体系结构(例如,cpu和gpu)的特定方法、库和框架对每个活动进行编码,并对此类异构执行进行编排;b)决定顺序和并行活动的分布,以适应不同的可用并行硬件资源。当前的框架通常提供低抽象级别的特定于目标和/或通用但不高性能的接口,这会使探索不同的任务分配(具有DAG1优先关系)到可用的异构资源变得复杂。为了实现这一点,通常需要为每个目标体系结构编写一次任务,因为它们的编程存在很大的差异。在这项工作中,我们在单源PHAST库中包括对数据并行任务的任务和dag的支持,该库目前支持多核cpu和NVIDIA GPU,因此任务以目标不可知的方式编码,并且它们针对多核或GPU架构的目标是自动且高效的。这种编码方法与任务的集成可以帮助将每个任务的执行平台的选择推迟到测试阶段,甚至推迟到运行时阶段。最后,我们在计算机视觉领域的样本图像管道基准测试中演示了该方法的效果。我们从生产力的角度将我们的实现与SYCL实现进行比较。此外,我们还展示了通过实现PEFT2映射技术以及映射空间中的穷举搜索,可以无缝地探索各种任务分配。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Formal Verification through Combinatorial Topology: the CAS-Extended Model Wait-free Dynamic Transactions for Linked Data Structures Deciphering Predictive Schedulers for Heterogeneous-ISA Multicore Architectures LiTM: A Lightweight Deterministic Software Transactional Memory System Process Barrier for Predictable and Repeatable Concurrent Execution
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1