Exploring AMD GPU scheduling details by experimenting with “worst practices”

Real-Time Systems · Impact Factor: 1.4 · CAS Region 4 (Computer Science) · JCR Q3 (COMPUTER SCIENCE, THEORY & METHODS) · Pub Date: 2022-03-23 · DOI: 10.1007/s11241-022-09381-y

Nathan Otterness, James H. Anderson

Citation count: 0

Abstract

Graphics processing units (GPUs) have been the target of a significant body of recent real-time research, but research is often hampered by the “black box” nature of GPU hardware and software. Now that one GPU manufacturer, AMD, has embraced an open-source software stack, one may expect an increased amount of real-time research to use AMD GPUs. Reality, however, is more complicated. Without understanding where internal details may differ, researchers have no basis for assuming that observations made using NVIDIA GPUs will continue to hold for AMD GPUs. Additionally, the openness of AMD’s software does not mean that their scheduling behavior is obvious, especially due to sparse, scattered documentation. In this paper, we gather the disparate pieces of documentation into a single coherent source that provides an end-to-end description of how compute work is scheduled on AMD GPUs. In doing so, we start with a concrete demonstration of how incorrect management triggers extreme worst-case behavior in shared AMD GPUs. Subsequently, we explain the internal scheduling rules for AMD GPUs, how they led to the “worst practices,” and how to correctly manage some of the most performance-critical factors in AMD GPU sharing.

Source journal

Real-Time Systems (Engineering & Technology: Computer Science, Theory & Methods)

CiteScore

2.90

Self-citation rate

7.70%

Articles published

Review time

6 months

About the journal: Papers published in Real-Time Systems cover, among others, the following topics: requirements engineering, specification and verification techniques, design methods and tools, programming languages, operating systems, scheduling algorithms, architecture, hardware and interfacing, dependability and safety, distributed and other novel architectures, wired and wireless communications, wireless sensor systems, distributed databases, artificial intelligence techniques, expert systems, and application case studies. Applications are found in command and control systems, process control, automated manufacturing, flight control, avionics, space avionics and defense systems, shipborne systems, vision and robotics, pervasive and ubiquitous computing, and in an abundance of embedded systems.