Towards Composable GPU Programming: Programming GPUs with Eager Actions and Lazy Views

Michael Haidl, Michel Steuwer, H. Dirks, Tim Humernbrum, S. Gorlatch
{"title":"Towards Composable GPU Programming: Programming GPUs with Eager Actions and Lazy Views","authors":"Michael Haidl, Michel Steuwer, H. Dirks, Tim Humernbrum, S. Gorlatch","doi":"10.1145/3026937.3026942","DOIUrl":null,"url":null,"abstract":"In this paper, we advocate a composable approach to programming systems with Graphics Processing Units (GPU): programs are developed as compositions of generic, reusable patterns. Current GPU programming approaches either rely on low-level, monolithic code without patterns (CUDA and OpenCL), which achieves high performance at the cost of cumbersome and error-prone programming, or they improve the programmability by using pattern-based abstractions (e.g., Thrust) but pay a performance penalty due to inefficient implementations of pattern composition. We develop an API for GPUs based programming on C++ with STL-style patterns and its compiler-based implementation. Our API gives the application developers the native C++ means (views and actions) to specify precisely which pattern compositions should be automatically fused during code generation into a single efficient GPU kernel, thereby ensuring a high target performance. We implement our approach by extending the range-v3 library which is currently being developed for the forthcoming C++ standards. The composable programming in our approach is done exclusively in the standard C++14, with STL algorithms used as patterns which we re-implemented in parallel for GPU. Our compiler implementation is based on the LLVM and Clang frameworks, and we use advanced multi-stage programming techniques for aggressive runtime optimizations. We experimentally evaluate our approach using a set of benchmark applications and a real-world case study from the area of image processing. Our codes achieve performance competitive with CUDA monolithic implementations, and we outperform pattern-based codes written using Nvidia's Thrust.","PeriodicalId":161677,"journal":{"name":"Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores","volume":"269 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3026937.3026942","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

In this paper, we advocate a composable approach to programming systems with Graphics Processing Units (GPU): programs are developed as compositions of generic, reusable patterns. Current GPU programming approaches either rely on low-level, monolithic code without patterns (CUDA and OpenCL), which achieves high performance at the cost of cumbersome and error-prone programming, or they improve the programmability by using pattern-based abstractions (e.g., Thrust) but pay a performance penalty due to inefficient implementations of pattern composition. We develop an API for GPUs based programming on C++ with STL-style patterns and its compiler-based implementation. Our API gives the application developers the native C++ means (views and actions) to specify precisely which pattern compositions should be automatically fused during code generation into a single efficient GPU kernel, thereby ensuring a high target performance. We implement our approach by extending the range-v3 library which is currently being developed for the forthcoming C++ standards. The composable programming in our approach is done exclusively in the standard C++14, with STL algorithms used as patterns which we re-implemented in parallel for GPU. Our compiler implementation is based on the LLVM and Clang frameworks, and we use advanced multi-stage programming techniques for aggressive runtime optimizations. We experimentally evaluate our approach using a set of benchmark applications and a real-world case study from the area of image processing. Our codes achieve performance competitive with CUDA monolithic implementations, and we outperform pattern-based codes written using Nvidia's Thrust.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
面向可组合GPU编程:用动态动作和惰性视图编程GPU
在本文中,我们提倡使用图形处理单元(GPU)编程系统的可组合方法:将程序开发为通用的可重用模式的组合。当前的GPU编程方法要么依赖于低级的、没有模式的单片代码(CUDA和OpenCL),以繁琐和易出错的编程为代价来实现高性能,要么通过使用基于模式的抽象(例如,Thrust)来提高可编程性,但由于模式组合的低效实现而付出性能损失。我们开发了一个基于stl风格的c++编程的gpu API及其基于编译器的实现。我们的API为应用程序开发人员提供了本地c++方法(视图和操作),以精确地指定在代码生成过程中应该自动将哪些模式组合融合到单个高效的GPU内核中,从而确保高目标性能。我们通过扩展range-v3库来实现我们的方法,该库目前正在为即将到来的c++标准开发。我们的方法中的可组合编程完全在标准c++ 14中完成,使用STL算法作为模式,我们为GPU并行重新实现。我们的编译器实现基于LLVM和Clang框架,我们使用先进的多阶段编程技术进行积极的运行时优化。我们使用一组基准应用程序和来自图像处理领域的实际案例研究对我们的方法进行了实验评估。我们的代码实现了与CUDA单片实现竞争的性能,并且我们优于使用Nvidia的Thrust编写的基于模式的代码。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
High Performance Detection of Strongly Connected Components in Sparse Graphs on GPUs PETRAS: Performance, Energy and Thermal Aware Resource Allocation and Scheduling for Heterogeneous Systems TaskInsight: Understanding Task Schedules Effects on Memory and Performance Towards Composable GPU Programming: Programming GPUs with Eager Actions and Lazy Views A high-performance portable abstract interface for explicit SIMD vectorization
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1