基于任务的数据流编程模型的通用任务依赖管理硬件

Xubin Tan, Jaume Bosch, Miquel Vidal Piñol, C. Álvarez, Daniel Jiménez-González, E. Ayguadé, M. Valero
{"title":"基于任务的数据流编程模型的通用任务依赖管理硬件","authors":"Xubin Tan, Jaume Bosch, Miquel Vidal Piñol, C. Álvarez, Daniel Jiménez-González, E. Ayguadé, M. Valero","doi":"10.1109/IPDPS.2017.48","DOIUrl":null,"url":null,"abstract":"Task-based programming models such as OpenMP, IntelTBB and OmpSs offer the possibility of expressing dependences among tasks to drive their execution at runtime. Managing these dependences introduces noticeable overheads when targeting fine-grained tasks, diminishing the potential speedups or even introducing performance losses. To overcome this drawback, we present a general purpose hardware accelerator, Picos++, to manage the inter-task dependences efficiently in both time and energy. Our design also includes a novel nested task support. To this end, a new hardware/software co-design is presented to overcome the fact that nested tasks with dependences could result in system deadlocks due to the limited amount of resources in hardware task dependence managers. In this paper we describe a detailed implementation of this design and evaluate a parallel task-based programming model using Picos++ in a Linux embedded system with two ARM Cortex-A9 and a FPGA. The scalability and energy consumption of the real system implemented have been studied and compared against a software runtime. Even in a system limited to 2 threads, using Picos++ results in more than 1.8x speedup and 40% of energy savings in the most demanding parallelizations of real benchmarks. As a matter of fact, a hardware task dependence manager should be able to achieve much higher speedup and provide more energy savings with more threads.","PeriodicalId":209524,"journal":{"name":"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2017-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"General Purpose Task-Dependence Management Hardware for Task-Based Dataflow Programming Models\",\"authors\":\"Xubin Tan, Jaume Bosch, Miquel Vidal Piñol, C. Álvarez, Daniel Jiménez-González, E. Ayguadé, M. Valero\",\"doi\":\"10.1109/IPDPS.2017.48\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Task-based programming models such as OpenMP, IntelTBB and OmpSs offer the possibility of expressing dependences among tasks to drive their execution at runtime. Managing these dependences introduces noticeable overheads when targeting fine-grained tasks, diminishing the potential speedups or even introducing performance losses. To overcome this drawback, we present a general purpose hardware accelerator, Picos++, to manage the inter-task dependences efficiently in both time and energy. Our design also includes a novel nested task support. To this end, a new hardware/software co-design is presented to overcome the fact that nested tasks with dependences could result in system deadlocks due to the limited amount of resources in hardware task dependence managers. In this paper we describe a detailed implementation of this design and evaluate a parallel task-based programming model using Picos++ in a Linux embedded system with two ARM Cortex-A9 and a FPGA. The scalability and energy consumption of the real system implemented have been studied and compared against a software runtime. Even in a system limited to 2 threads, using Picos++ results in more than 1.8x speedup and 40% of energy savings in the most demanding parallelizations of real benchmarks. As a matter of fact, a hardware task dependence manager should be able to achieve much higher speedup and provide more energy savings with more threads.\",\"PeriodicalId\":209524,\"journal\":{\"name\":\"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPS.2017.48\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS.2017.48","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12

摘要

基于任务的编程模型(如OpenMP、IntelTBB和omps)提供了表达任务间依赖关系的可能性,从而在运行时驱动它们的执行。当以细粒度任务为目标时,管理这些依赖项会带来明显的开销,降低潜在的速度,甚至会带来性能损失。为了克服这个缺点,我们提出了一个通用的硬件加速器pico++,在时间和精力上有效地管理任务间的依赖关系。我们的设计还包括一个新颖的嵌套任务支持。为此,提出了一种新的硬件/软件协同设计,以克服由于硬件任务依赖管理器中的资源有限而导致具有依赖关系的嵌套任务可能导致系统死锁的事实。在本文中,我们描述了该设计的详细实现,并在带有两个ARM Cortex-A9和一个FPGA的Linux嵌入式系统中使用Picos++评估了基于并行任务的编程模型。研究了实际系统的可扩展性和能耗,并与软件运行时进行了比较。即使在一个只有2个线程的系统中,使用pico++在实际基准测试中最苛刻的并行性下也能带来1.8倍以上的加速提升和40%的能耗节约。事实上,硬件任务依赖管理器应该能够实现更高的加速,并通过更多的线程提供更多的能源节约。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
General Purpose Task-Dependence Management Hardware for Task-Based Dataflow Programming Models
Task-based programming models such as OpenMP, IntelTBB and OmpSs offer the possibility of expressing dependences among tasks to drive their execution at runtime. Managing these dependences introduces noticeable overheads when targeting fine-grained tasks, diminishing the potential speedups or even introducing performance losses. To overcome this drawback, we present a general purpose hardware accelerator, Picos++, to manage the inter-task dependences efficiently in both time and energy. Our design also includes a novel nested task support. To this end, a new hardware/software co-design is presented to overcome the fact that nested tasks with dependences could result in system deadlocks due to the limited amount of resources in hardware task dependence managers. In this paper we describe a detailed implementation of this design and evaluate a parallel task-based programming model using Picos++ in a Linux embedded system with two ARM Cortex-A9 and a FPGA. The scalability and energy consumption of the real system implemented have been studied and compared against a software runtime. Even in a system limited to 2 threads, using Picos++ results in more than 1.8x speedup and 40% of energy savings in the most demanding parallelizations of real benchmarks. As a matter of fact, a hardware task dependence manager should be able to achieve much higher speedup and provide more energy savings with more threads.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL Toucan — A Translator for Communication Tolerant MPI Applications Production Hardware Overprovisioning: Real-World Performance Optimization Using an Extensible Power-Aware Resource Management Framework Approximation Proofs of a Fast and Efficient List Scheduling Algorithm for Task-Based Runtime Systems on Multicores and GPUs Dynamic Memory-Aware Task-Tree Scheduling
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1