用于灵活拆分和调度渲染任务的 GPU 例行程序

IF 7.8 1区计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING ACM Transactions on Graphics Pub Date : 2024-11-19 DOI:10.1145/3687766

Shaokun Zheng, Xin Chen, Zhong Shi, Ling-Qi Yan, Kun Xu

{"title":"用于灵活拆分和调度渲染任务的 GPU 例行程序","authors":"Shaokun Zheng, Xin Chen, Zhong Shi, Ling-Qi Yan, Kun Xu","doi":"10.1145/3687766","DOIUrl":null,"url":null,"abstract":"We introduce coroutines into GPU kernel programming, providing an automated solution for flexible splitting and scheduling of rendering tasks. This approach addresses a prevalent challenge in harnessing the power of modern GPUs for complex, imbalanced graphics workloads like path tracing. Usually, to accommodate the SIMT execution model and latency-hiding architecture, developers have to decompose a monolithic mega-kernel into smaller sub-tasks for improved thread coherence and reduced register pressure. However, involving the handling of intricate nested control flows and numerous interdependent program states, this process can be exceedingly tedious and error-prone when performed manually. Coroutines, a building block for asynchronous programming in many high-level CPU languages, exhibit untapped potential for restructuring GPU kernels due to their versatility in control representation. By extending Luisa [Zheng et al. 2022], we implement an asymmetric, stackless coroutine model with programming language support and multiple built-in schedulers for modern GPUs. To showcase the effectiveness of our model and implementation, we examine them in different application scenarios, including path tracing, SDF rendering, and incorporation with custom passes.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"22 1","pages":""},"PeriodicalIF":7.8000,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"GPU Coroutines for Flexible Splitting and Scheduling of Rendering Tasks\",\"authors\":\"Shaokun Zheng, Xin Chen, Zhong Shi, Ling-Qi Yan, Kun Xu\",\"doi\":\"10.1145/3687766\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We introduce coroutines into GPU kernel programming, providing an automated solution for flexible splitting and scheduling of rendering tasks. This approach addresses a prevalent challenge in harnessing the power of modern GPUs for complex, imbalanced graphics workloads like path tracing. Usually, to accommodate the SIMT execution model and latency-hiding architecture, developers have to decompose a monolithic mega-kernel into smaller sub-tasks for improved thread coherence and reduced register pressure. However, involving the handling of intricate nested control flows and numerous interdependent program states, this process can be exceedingly tedious and error-prone when performed manually. Coroutines, a building block for asynchronous programming in many high-level CPU languages, exhibit untapped potential for restructuring GPU kernels due to their versatility in control representation. By extending Luisa [Zheng et al. 2022], we implement an asymmetric, stackless coroutine model with programming language support and multiple built-in schedulers for modern GPUs. To showcase the effectiveness of our model and implementation, we examine them in different application scenarios, including path tracing, SDF rendering, and incorporation with custom passes.\",\"PeriodicalId\":50913,\"journal\":{\"name\":\"ACM Transactions on Graphics\",\"volume\":\"22 1\",\"pages\":\"\"},\"PeriodicalIF\":7.8000,\"publicationDate\":\"2024-11-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Graphics\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3687766\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Graphics","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3687766","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

摘要

我们在 GPU 内核编程中引入了 coroutines，为灵活拆分和调度渲染任务提供了自动化解决方案。这种方法解决了在利用现代 GPU 的强大功能处理复杂、不平衡的图形工作负载（如路径跟踪）时面临的普遍挑战。通常，为了适应 SIMT 执行模型和延迟隐藏架构，开发人员必须将单片巨型内核分解成较小的子任务，以提高线程一致性并减少寄存器压力。然而，由于需要处理错综复杂的嵌套控制流和大量相互依赖的程序状态，这一过程非常繁琐，而且手动执行时容易出错。Coroutines是许多高级CPU语言中异步编程的构件，由于其在控制表示方面的多样性，它在重构GPU内核方面展现出了尚未开发的潜力。通过扩展 Luisa [Zheng 等人，2022 年]，我们为现代 GPU 实现了一个非对称、无堆栈的 CORUTINE 模型，该模型支持编程语言和多个内置调度程序。为了展示我们的模型和实现的有效性，我们在不同的应用场景中对其进行了检验，包括路径追踪、SDF渲染以及与自定义通行证的结合。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

GPU Coroutines for Flexible Splitting and Scheduling of Rendering Tasks

We introduce coroutines into GPU kernel programming, providing an automated solution for flexible splitting and scheduling of rendering tasks. This approach addresses a prevalent challenge in harnessing the power of modern GPUs for complex, imbalanced graphics workloads like path tracing. Usually, to accommodate the SIMT execution model and latency-hiding architecture, developers have to decompose a monolithic mega-kernel into smaller sub-tasks for improved thread coherence and reduced register pressure. However, involving the handling of intricate nested control flows and numerous interdependent program states, this process can be exceedingly tedious and error-prone when performed manually. Coroutines, a building block for asynchronous programming in many high-level CPU languages, exhibit untapped potential for restructuring GPU kernels due to their versatility in control representation. By extending Luisa [Zheng et al. 2022], we implement an asymmetric, stackless coroutine model with programming language support and multiple built-in schedulers for modern GPUs. To showcase the effectiveness of our model and implementation, we examine them in different application scenarios, including path tracing, SDF rendering, and incorporation with custom passes.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助