General Purpose Task-Dependence Management Hardware for Task-Based Dataflow Programming Models

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Pub Date: 2017-05-01, DOI: 10.1109/IPDPS.2017.48

Xubin Tan, Jaume Bosch, Miquel Vidal Piñol, C. Álvarez, Daniel Jiménez-González, E. Ayguadé, M. Valero


Citations: 12

Abstract




Task-based programming models such as OpenMP, Intel TBB and OmpSs let programmers express dependences among tasks, which then drive task execution at runtime. Managing these dependences introduces noticeable overhead for fine-grained tasks, reducing the potential speedup or even causing performance losses. To overcome this drawback, we present Picos++, a general-purpose hardware accelerator that manages inter-task dependences efficiently in both time and energy. Our design also includes novel support for nested tasks. To this end, we present a new hardware/software co-design that addresses the system deadlocks that nested tasks with dependences can cause, given the limited resources of hardware task dependence managers. In this paper we describe a detailed implementation of this design and evaluate a parallel task-based programming model using Picos++ on an embedded Linux system with two ARM Cortex-A9 cores and an FPGA. We study the scalability and energy consumption of the real implemented system and compare them against a software runtime. Even in a system limited to 2 threads, Picos++ achieves more than 1.8x speedup and 40% energy savings in the most demanding parallelizations of real benchmarks. A hardware task dependence manager should be able to achieve much higher speedups and greater energy savings with more threads.
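To make concrete what "managing inter-task dependences" involves, the following is a minimal software sketch of the bookkeeping a task dependence manager has to perform. It is a toy model under stated assumptions, not the Picos++ hardware design or any runtime's actual implementation: each task is assumed to declare the addresses it reads (`ins`) and writes (`outs`), and the class name `DependenceTracker` and its methods `submit` and `finish` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class DependenceTracker:
    """Toy software model of inter-task dependence tracking.

    Hypothetical illustration only; this is not the Picos++ design.
    """

    def __init__(self):
        self.last_writer = {}               # address -> last task that writes it
        self.readers = defaultdict(list)    # address -> readers since that write
        self.pending = {}                   # task -> count of unfinished predecessors
        self.successors = defaultdict(set)  # task -> tasks waiting on it
        self.finished = set()

    def submit(self, task, ins=(), outs=()):
        """Register a task; return True if it can run immediately."""
        preds = set()
        for addr in ins:                    # read-after-write
            if addr in self.last_writer:
                preds.add(self.last_writer[addr])
        for addr in outs:                   # write-after-write, write-after-read
            if addr in self.last_writer:
                preds.add(self.last_writer[addr])
            preds.update(self.readers[addr])
        preds -= self.finished              # finished tasks no longer block anyone
        for addr in ins:
            self.readers[addr].append(task)
        for addr in outs:                   # this task becomes the newest writer
            self.last_writer[addr] = task
            self.readers[addr] = []
        self.pending[task] = len(preds)
        for pred in preds:
            self.successors[pred].add(task)
        return not preds

    def finish(self, task):
        """Mark a task finished; return the tasks that became ready, sorted."""
        self.finished.add(task)
        ready = []
        for succ in sorted(self.successors.pop(task, ())):
            self.pending[succ] -= 1
            if self.pending[succ] == 0:
                ready.append(succ)
        return ready


if __name__ == "__main__":
    tracker = DependenceTracker()
    tracker.submit("A", outs=["x"])
    tracker.submit("B", ins=["x"], outs=["y"])
    tracker.submit("C", ins=["x"], outs=["z"])
    tracker.submit("D", ins=["y", "z"], outs=["x"])
    print(tracker.finish("A"))  # B and C become ready; D still waits on them
```

Every task submission and completion walks these tables, which is the per-task cost that grows in relative terms as tasks become finer-grained. The sketch uses unbounded Python dictionaries; a hardware manager has a fixed amount of table space, which is the resource limit the abstract identifies as the source of potential deadlocks with nested tasks.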

