Experiences with OpenMP, PGI, HMPP and OpenACC Directives on ISO/TTI Kernels

Sayan Ghosh, Terrence Liao, H. Calandra, B. Chapman
Published in: 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, pp. 691-700 (2012-11-10)
DOI: 10.1109/SC.Companion.2012.95 (https://doi.org/10.1109/SC.Companion.2012.95)
Citations: 17

Abstract

GPUs are gradually becoming ubiquitous in High Performance Computing because they can deliver better performance per watt than multicore CPUs on compute-intensive algorithms. The primary shortcoming of a GPU is usability: vendor-specific APIs differ considerably from existing programming languages, and optimizing applications requires substantial knowledge of the device and its programming interface. To alleviate this problem, a growing number of higher-level programming models now target GPUs. The ultimate goal of a high-level model is to expose an easy-to-use interface that lets the user offload compute-intensive portions of code (kernels) to the GPU and tune the code for the target accelerator, maximizing overall performance with reduced development effort. In this paper, we share our experiences with three notable high-level, directive-based GPU programming models, PGI, CAPS and OpenACC (as implemented by CAPS and PGI), on an Nvidia M2090 GPU. We analyze their performance and programmability on Isotropic (ISO) and Tilted Transversely Isotropic (TTI) finite difference kernels, which are primary components of the Reverse Time Migration (RTM) application used in oil and gas exploration for seismic imaging of the subsurface. With the kernels ported to a single GPU using these directives, we observe an average 1.5-1.8x performance improvement for both ISO and TTI kernels over optimized multi-threaded CPU implementations using OpenMP.
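To make the idea of directive-based offloading concrete, the following is a minimal sketch of a single time step of an isotropic finite difference kernel, annotated with either an OpenACC or an OpenMP directive. It is not the kernel studied in the paper: the function name `iso_step`, the 2D grid, the second-order stencil, the constant coefficient `c2` and the row-major layout are all assumptions chosen for brevity, and a production RTM kernel would be considerably more involved.

```c
#include <stddef.h>

/*
 * Hypothetical, simplified isotropic wave-equation update (illustration only):
 *   next = 2*cur - prev + c2 * laplacian(cur)
 * on an nx-by-ny grid stored row-major. c2 stands for (velocity*dt/h)^2 and is
 * assumed constant here. Only interior points are updated; the boundary of
 * `next` is left untouched.
 */
void iso_step(int nx, int ny, float c2,
              const float *prev, const float *cur, float *next)
{
    int n = nx * ny;   /* total number of grid points, used in data clauses */
    (void)n;           /* silence "unused" warnings when OpenACC is disabled */

    /* The same loop nest is offloaded to a GPU with OpenACC, or run on
     * multiple CPU threads with OpenMP, depending on how it is compiled.
     * Without either, the directives are skipped and the loop runs serially. */
#if defined(_OPENACC)
    #pragma acc parallel loop collapse(2) \
        copyin(prev[0:n], cur[0:n]) copy(next[0:n])
#elif defined(_OPENMP)
    #pragma omp parallel for collapse(2)
#endif
    for (int j = 1; j < ny - 1; ++j) {
        for (int i = 1; i < nx - 1; ++i) {
            int k = j * nx + i;
            /* 5-point, second-order Laplacian of the current wavefield */
            float lap = cur[k - 1] + cur[k + 1]
                      + cur[k - nx] + cur[k + nx]
                      - 4.0f * cur[k];
            next[k] = 2.0f * cur[k] - prev[k] + c2 * lap;
        }
    }
}
```

A caller would invoke `iso_step` once per time step and rotate the three buffers. In this sketch the data clauses move the arrays between host and device on every call, which is simple but costly; keeping the arrays resident on the device across time steps (for example with an enclosing OpenACC data region) is the kind of tuning a real port would likely need.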