Low Overhead Instruction Latency Characterization for NVIDIA GPGPUs

2019 IEEE High Performance Extreme Computing Conference (HPEC), Pub Date: 2019-05-21, DOI: 10.1109/HPEC.2019.8916466

Yehia Arafa, Abdel-Hameed A. Badawy, Gopinath Chennupati, N. Santhi, S. Eidenbenz

Citations: 20

Abstract

Over the last decade, heterogeneous computing has become prevalent across the computer systems industry. Graphics Processing Units (GPUs) are now present in everything from supercomputers to mobile phones and tablets. GPUs are used for graphics operations as well as general-purpose computing (GPGPUs) to boost the performance of compute-intensive applications. However, a substantial share of their characteristics remains undisclosed beyond what vendors provide. In this paper, we introduce a very low overhead and portable analysis that exposes, at the micro-architecture level, the latency of each instruction executing in the GPU pipeline(s) and the access overhead of the various memory hierarchies found in GPUs. Furthermore, we show the impact that the various optimizations performed by the CUDA compiler have on these latencies. We perform our evaluation on seven different high-end NVIDIA GPUs from five different generations/architectures: Kepler, Maxwell, Pascal, Volta, and Turing. The results in this paper give architects an accurate characterization of the latencies of these GPUs, which helps in modeling the hardware accurately. They also allow software developers to make informed optimizations to their applications.
