Tools for GPU Computing - Debugging and Performance Analysis of Heterogenous HPC Applications

Supercomput. Front. Innov. Pub Date : 2020-03-01 DOI:10.14529/jsfi200105

Michael Knobloch, B. Mohr


Citations: 9

Abstract

General-purpose GPUs are now ubiquitous in high-end supercomputing. All but one of the announced (pre-)exascale systems (the exception being the Japanese Fugaku system, which is based on ARM processors) contain large numbers of GPUs that deliver the majority of the performance of these systems. Thus, GPU programming will be a necessity for application developers using high-end HPC systems. However, programming GPUs efficiently is an even more daunting task than traditional HPC application development, and this becomes more apparent still for large-scale systems containing thousands of GPUs. Orchestrating all the resources of such a system poses a tremendous challenge for developers. Fortunately, a rich ecosystem of tools exists to assist developers in every development step of a GPU application at all scales. In this paper we present an overview of these tools and discuss their capabilities. We start with an overview of different GPU programming models, from low-level models such as CUDA, through pragma-based models such as OpenACC, to high-level approaches such as Kokkos. We discuss their respective tool interfaces, which are the main means by which tools obtain information on the execution of a kernel on the GPU. The main focus of this paper is on two classes of tools: debuggers and performance analysis tools. Debuggers help the developer identify problems on the CPU side, on the GPU side, and in the interplay between the two. Once the application runs correctly, performance analysis tools can be used to pinpoint bottlenecks in the execution of the code and help to increase overall performance.


