Deadline-Based Scheduling for GPU with Preemption Support

2018 IEEE Real-Time Systems Symposium (RTSS) Pub Date : 2018-12-01 DOI:10.1109/RTSS.2018.00021

Nicola Capodieci, R. Cavicchioli, M. Bertogna, Aingara Paramakuru


Citations: 59

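Abstract

Modern automotive-grade embedded computing platforms feature high-performance Graphics Processing Units (GPUs) to support the massively parallel processing power needed for next-generation autonomous driving applications (e.g., Deep Neural Network (DNN) inference, sensor fusion, and path planning). As these workload-intensive activities are pushed to higher criticality levels, there is a stronger need for scheduling algorithms that can guarantee predictability without overly sacrificing GPU utilization. Unfortunately, the real-time literature on GPU scheduling has mostly considered limited (or null) preemption capabilities, while previous efforts in broader domains were often based on programming models and APIs that were not designed to support the real-time requirements of recurring workloads. In this paper, we present the design of a prototype real-time scheduler for GPU activities on an embedded System on a Chip (SoC) featuring a cutting-edge GPU architecture by NVIDIA adopted in the autonomous driving domain. The scheduler runs as a software partition on top of the NVIDIA hypervisor, and it leverages latest-generation architectural features, such as pixel-level preemption and thread-level preemption. Such a design allowed us to implement and test a preemptive Earliest Deadline First (EDF) scheduler for GPU tasks that provides bandwidth isolation by means of a Constant Bandwidth Server (CBS). Our work involved investigating alternative programming models for compute APIs, allowing us to characterize CPU-to-GPU command submission with more detailed scheduling information. A detailed experimental characterization is presented to show the significant schedulability improvement of recurring real-time GPU tasks.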
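To make the EDF-plus-CBS combination named in the abstract more concrete, the following is a minimal sketch of the textbook Constant Bandwidth Server rules driving an Earliest Deadline First pick, simulated in discrete time ticks. It is not the authors' hypervisor-level implementation: the class `CbsServer`, the functions `pick_edf` and `simulate`, the unit-tick time model, and the example workload are all assumptions made for illustration, and nothing here models GPU preemption granularity or command submission.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class CbsServer:
    """Hypothetical CBS: may consume `budget` ticks in every `period` ticks."""
    name: str
    budget: int         # Q: budget granted at each replenishment
    period: int         # P: server period, so the bandwidth is Q / P
    remaining: int = 0  # c: budget left before the next replenishment
    deadline: int = 0   # d: current absolute scheduling deadline
    backlog: int = 0    # pending work, in ticks

    def on_arrival(self, now: int, work: int) -> None:
        """Queue new work and apply the CBS wake-up rule if the server was idle."""
        was_idle = self.backlog == 0
        self.backlog += work
        # Textbook rule: if c >= (d - now) * Q / P, reusing the old (c, d) pair
        # could exceed the bandwidth, so assign a fresh deadline and full budget.
        # Cross-multiplied to stay in integer arithmetic.
        if was_idle and self.remaining * self.period >= (self.deadline - now) * self.budget:
            self.deadline = now + self.period
            self.remaining = self.budget

    def run_one_tick(self) -> None:
        """Consume one tick; on budget exhaustion, replenish and postpone the deadline."""
        self.backlog -= 1
        self.remaining -= 1
        if self.remaining == 0:
            self.remaining = self.budget
            self.deadline += self.period


def pick_edf(servers: List[CbsServer]) -> Optional[CbsServer]:
    """Return the backlogged server with the earliest deadline (list order breaks ties)."""
    ready = [s for s in servers if s.backlog > 0]
    return min(ready, key=lambda s: s.deadline, default=None)


def simulate(servers: List[CbsServer],
             arrivals: Dict[int, List[Tuple[str, int]]],
             horizon: int) -> List[str]:
    """Run a preemptive EDF schedule for `horizon` ticks; return who ran in each tick."""
    by_name = {s.name: s for s in servers}
    trace = []
    for now in range(horizon):
        for name, work in arrivals.get(now, []):
            by_name[name].on_arrival(now, work)
        chosen = pick_edf(servers)  # re-evaluated every tick, so preemption is implicit
        if chosen is None:
            trace.append("-")
        else:
            trace.append(chosen.name)
            chosen.run_one_tick()
    return trace


if __name__ == "__main__":
    # "A" submits far more work than its 50% share; "B" is a well-behaved 25% task.
    servers = [CbsServer("A", budget=2, period=4), CbsServer("B", budget=1, period=4)]
    arrivals = {0: [("A", 8), ("B", 1)], 4: [("B", 1)]}
    print(simulate(servers, arrivals, horizon=8))
```

In this toy workload the overrunning server "A" has its deadline postponed each time it exhausts its budget, so "B" still receives its one tick inside each of its periods. That is the bandwidth-isolation property the abstract attributes to the CBS, shown here only for an idealized, instantly preemptible resource.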



