Tools for GPU Computing - Debugging and Performance Analysis of Heterogenous HPC Applications

Michael Knobloch, B. Mohr
Journal: Supercomput. Front. Innov. (journal article), published 2020-03-01
DOI: 10.14529/jsfi200105 (https://doi.org/10.14529/jsfi200105)
Citations: 9

Abstract

General-purpose GPUs are now ubiquitous in high-end supercomputing. All but one of the announced (pre-)exascale systems (the exception is the Japanese Fugaku system, which is based on ARM processors) contain vast numbers of GPUs that deliver the majority of their performance. Thus, GPU programming will be a necessity for application developers using high-end HPC systems. However, programming GPUs efficiently is an even more daunting task than traditional HPC application development. This becomes even more apparent for large-scale systems containing thousands of GPUs, where orchestrating all the resources poses a tremendous challenge for developers. Luckily, a rich ecosystem of tools exists to assist developers in every development step of a GPU application at all scales. In this paper we present an overview of these tools and discuss their capabilities. We start with an overview of different GPU programming models, from low-level models such as CUDA, through pragma-based models such as OpenACC, to high-level approaches such as Kokkos. We discuss their respective tool interfaces, which are the main means by which tools obtain information on the execution of a kernel on the GPU. The main focus of this paper is on two classes of tools: debuggers and performance analysis tools. Debuggers help the developer identify problems on the CPU side, on the GPU side, and in the interplay between the two. Once the application runs correctly, performance analysis tools can be used to pinpoint bottlenecks in the execution of the code and help increase overall performance.