Runtime Pipeline Scheduling System for Heterogeneous Architectures

Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.). Pub Date: 2014-07-13. DOI: 10.1145/2616498.2616547

Julio C. Olaya, R. Romero

Pages: 45:1-45:7


Abstract

Heterogeneous architectures can improve the performance of applications with computationally intensive, data-parallel operations. Although these architectures can reduce application execution time, there is room for further improvement because the memory hierarchies of the central processor cores and the graphics processor cores are separate. Applications executing on heterogeneous architectures must allocate space in GPU global memory, copy input data, invoke kernels, and copy results back to CPU memory. This scheme does not overlap inter-memory data transfers with GPU computations, which increases application execution time. This research presents a software architecture with a runtime pipeline system for GPU input/output scheduling that acts as a bidirectional interface between the GPU computing application and the physical device. The main aim of this system is to reduce the impact of the processor-memory performance gap by overlapping device I/O with computation. Evaluation using application benchmarks shows speedups above 2x with respect to baseline, non-streamed GPU execution.
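
The benefit of overlapping transfers with computation can be illustrated with a small timing model. The sketch below is not the authors' runtime system or their evaluation method. It is an idealized model that assumes the input is split into equal chunks, that copy-in, kernel execution, and copy-out each run on an independent engine, and that there is no launch or synchronization overhead. The function names `serial_time` and `pipelined_time` and the stage durations are hypothetical.

```python
from typing import Sequence


def serial_time(n_chunks: int, stage_times: Sequence[float]) -> float:
    """Baseline: each chunk runs copy-in, kernel, copy-out back to back."""
    return n_chunks * sum(stage_times)


def pipelined_time(n_chunks: int, stage_times: Sequence[float]) -> float:
    """Makespan when every stage is its own engine and chunks flow in order.

    Assumption (not from the paper): stages never contend with each other,
    so a chunk starts a stage as soon as both the chunk and the stage are free.
    """
    stage_free = [0.0] * len(stage_times)  # time at which each stage is next free
    for _ in range(n_chunks):
        chunk_ready = 0.0  # time at which this chunk left the previous stage
        for s, duration in enumerate(stage_times):
            start = max(chunk_ready, stage_free[s])
            stage_free[s] = start + duration
            chunk_ready = stage_free[s]
    return stage_free[-1]


if __name__ == "__main__":
    # Hypothetical per-chunk durations: copy-in, kernel, copy-out.
    stages = (1.0, 1.0, 1.0)
    for n in (1, 4, 8, 16):
        s = serial_time(n, stages)
        p = pipelined_time(n, stages)
        print(f"chunks={n:2d} serial={s:5.1f} pipelined={p:5.1f} speedup={s / p:.2f}x")
```

Under these assumptions, three equal stages give a speedup of 3n/(n+2) for n chunks, which exceeds 2 once n is greater than 4 and approaches 3 as n grows. Real devices add per-transfer overhead and may have fewer copy engines, so measured speedups will differ.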

