PHAST Library — Enabling Single-Source and High Performance Code for GPUs and Multi-cores

Biagio Peccerillo, S. Bartolini
DOI: 10.1109/HPCS.2017.109
Published in: 2017 International Conference on High Performance Computing & Simulation (HPCS), July 2017
Citations: 6

Abstract

The simulation of parallel heterogeneous architectures such as multi-cores and GPUs poses new challenges in the domain of programming languages and frameworks. Applications for simulators need to be expressed in a way that can be easily adapted to each specific architecture and effectively tuned on each of them, without introducing biases due to non-uniform hand-made optimizations. The most common heterogeneous programming frameworks are too low-level, so we propose PHAST, a high-level heterogeneous C++ library targeting multi-cores and Nvidia GPUs. It allows code to be written at a high level of abstraction and to reach good performance, while still permitting fine parameter tuning and not shielding the code from low-level optimizations. We evaluate PHAST on the DCT8x8 kernel on both supported architectures. On multi-cores, the PHAST implementation is around ten times faster than the OpenCL implementation (AMD vendor), but up to about 4x slower than the OpenCL implementation (Intel vendor), which effectively leverages auto-vectorization. On Nvidia GPUs, PHAST code performs up to 55.14% better than the CUDA SDK reference version.
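The abstract names DCT8x8 as the evaluation kernel but does not show its code. For reference, here is a minimal sequential C++ sketch of the 8x8 two-dimensional type-II DCT. It is not PHAST code and does not use PHAST's API; the names `Block`, `dct8x8` and `dct_image` are hypothetical. It assumes the orthonormal scaling F(u,v) = (1/4) C(u) C(v) Σx Σy f(x,y) cos((2x+1)uπ/16) cos((2y+1)vπ/16), with C(0) = 1/√2 and C(k) = 1 otherwise, computed separably (rows, then columns). Benchmark versions may differ in scaling, precision (e.g. `float`) and data layout.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical block type: one 8x8 tile, indexed as [row][column].
using Block = std::array<std::array<double, 8>, 8>;

// Orthonormal DCT-II basis (assumed scaling): c[k][n] = alpha(k) * cos(pi * (2n + 1) * k / 16),
// with alpha(0) = sqrt(1/8) and alpha(k) = sqrt(2/8) for k > 0.
static Block make_basis() {
    const double pi = std::acos(-1.0);
    Block c{};
    for (int k = 0; k < 8; ++k) {
        const double alpha = (k == 0) ? std::sqrt(1.0 / 8.0) : std::sqrt(2.0 / 8.0);
        for (int n = 0; n < 8; ++n)
            c[k][n] = alpha * std::cos(pi * (2 * n + 1) * k / 16.0);
    }
    return c;
}

// 2D DCT of one tile, computed separably as out = C * in * C^T.
Block dct8x8(const Block& in) {
    static const Block c = make_basis();
    Block tmp{}, out{};
    // 1D DCT along each row: tmp[y][u] = sum_x c[u][x] * in[y][x]
    for (int y = 0; y < 8; ++y)
        for (int u = 0; u < 8; ++u) {
            double s = 0.0;
            for (int x = 0; x < 8; ++x) s += c[u][x] * in[y][x];
            tmp[y][u] = s;
        }
    // 1D DCT along each column: out[v][u] = sum_y c[v][y] * tmp[y][u]
    for (int v = 0; v < 8; ++v)
        for (int u = 0; u < 8; ++u) {
            double s = 0.0;
            for (int y = 0; y < 8; ++y) s += c[v][y] * tmp[y][u];
            out[v][u] = s;
        }
    return out;
}

// Transform every 8x8 tile of a row-major image in place.
// Width and height must be multiples of 8.
void dct_image(std::vector<double>& img, std::size_t width, std::size_t height) {
    assert(width % 8 == 0 && height % 8 == 0 && img.size() == width * height);
    for (std::size_t by = 0; by < height; by += 8)
        for (std::size_t bx = 0; bx < width; bx += 8) {
            Block b{};
            for (int y = 0; y < 8; ++y)
                for (int x = 0; x < 8; ++x)
                    b[y][x] = img[(by + y) * width + bx + x];
            const Block r = dct8x8(b);
            for (int y = 0; y < 8; ++y)
                for (int x = 0; x < 8; ++x)
                    img[(by + y) * width + bx + x] = r[y][x];
        }
}
```

Every tile is transformed independently. The outer loop in `dct_image` is therefore the data-parallel part that a single-source library could map onto CPU threads or GPU thread blocks. This sketch keeps it sequential. With the orthonormal scaling, a tile of all ones yields a DC coefficient of 8 and zero AC coefficients.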