Multi-Stream Scheduling of Inference Pipelines on Edge Devices - a DRL Approach

Impact Factor 2.2 | Tier 4 (Computer Science) | JCR Q3 (Computer Science, Hardware & Architecture) | ACM Transactions on Design Automation of Electronic Systems | Publication Date: 2024-07-11 | DOI: 10.1145/3677378
Danny Pereira, Sumana Ghosh, Soumyajit Dey
{"title":"边缘设备推理流水线的多流调度--一种 DRL 方法","authors":"Danny Pereira, Sumana Ghosh, Soumyajit Dey","doi":"10.1145/3677378","DOIUrl":null,"url":null,"abstract":"\n Low-power edge devices equipped with Graphics Processing Units (GPUs) are a popular target platform for real-time scheduling of inference pipelines. Such application-architecture combinations are popular in Advanced Driver-Assistance Systems (ADAS) for aiding in the real-time decision-making of automotive controllers. However, the real-time throughput sustainable by such inference pipelines is limited by resource constraints of the target edge devices. Modern GPUs, both in edge devices and workstation variants, support the facility of concurrent execution of computation kernels and data transfers using the primitive of\n streams\n , also allowing for the assignment of priority to these streams. This opens up the possibility of executing computation layers of inference pipelines within a multi-priority, multi-stream environment on the GPU. However, manually co-scheduling such applications while satisfying their throughput requirement and platform memory budget may require an unmanageable number of profiling runs. In this work, we propose a Deep Reinforcement Learning (DRL) based method for deciding the start time of various operations in each pipeline layer while optimizing the latency of execution of inference pipelines as well as memory consumption. Experimental results demonstrate the promising efficacy of the proposed DRL approach in comparison with the baseline methods, particularly in terms of real-time performance enhancements, schedulability ratio, and memory savings. We have additionally assessed the effectiveness of the proposed DRL approach using a real-time traffic simulation tool IPG CarMaker.\n","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":2.2000,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-Stream Scheduling of Inference Pipelines on Edge Devices - a DRL Approach\",\"authors\":\"Danny Pereira, Sumana Ghosh, Soumyajit Dey\",\"doi\":\"10.1145/3677378\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"\\n Low-power edge devices equipped with Graphics Processing Units (GPUs) are a popular target platform for real-time scheduling of inference pipelines. Such application-architecture combinations are popular in Advanced Driver-Assistance Systems (ADAS) for aiding in the real-time decision-making of automotive controllers. However, the real-time throughput sustainable by such inference pipelines is limited by resource constraints of the target edge devices. Modern GPUs, both in edge devices and workstation variants, support the facility of concurrent execution of computation kernels and data transfers using the primitive of\\n streams\\n , also allowing for the assignment of priority to these streams. This opens up the possibility of executing computation layers of inference pipelines within a multi-priority, multi-stream environment on the GPU. However, manually co-scheduling such applications while satisfying their throughput requirement and platform memory budget may require an unmanageable number of profiling runs. 
In this work, we propose a Deep Reinforcement Learning (DRL) based method for deciding the start time of various operations in each pipeline layer while optimizing the latency of execution of inference pipelines as well as memory consumption. Experimental results demonstrate the promising efficacy of the proposed DRL approach in comparison with the baseline methods, particularly in terms of real-time performance enhancements, schedulability ratio, and memory savings. We have additionally assessed the effectiveness of the proposed DRL approach using a real-time traffic simulation tool IPG CarMaker.\\n\",\"PeriodicalId\":50944,\"journal\":{\"name\":\"ACM Transactions on Design Automation of Electronic Systems\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2024-07-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Design Automation of Electronic Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3677378\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Design Automation of Electronic Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3677378","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

Low-power edge devices equipped with Graphics Processing Units (GPUs) are a popular target platform for real-time scheduling of inference pipelines. Such application-architecture combinations are popular in Advanced Driver-Assistance Systems (ADAS) for aiding in the real-time decision-making of automotive controllers. However, the real-time throughput sustainable by such inference pipelines is limited by resource constraints of the target edge devices. Modern GPUs, both in edge devices and workstation variants, support the concurrent execution of computation kernels and data transfers using the primitive of streams, and also allow priorities to be assigned to these streams. This opens up the possibility of executing computation layers of inference pipelines within a multi-priority, multi-stream environment on the GPU. However, manually co-scheduling such applications while satisfying their throughput requirement and platform memory budget may require an unmanageable number of profiling runs. In this work, we propose a Deep Reinforcement Learning (DRL) based method for deciding the start time of various operations in each pipeline layer while optimizing the execution latency of inference pipelines as well as memory consumption. Experimental results demonstrate the promising efficacy of the proposed DRL approach in comparison with the baseline methods, particularly in terms of real-time performance enhancements, schedulability ratio, and memory savings. We have additionally assessed the effectiveness of the proposed DRL approach using the real-time traffic simulation tool IPG CarMaker.
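The enabling mechanism referenced in the abstract is the GPU stream primitive with per-stream priorities. The CUDA sketch below is not taken from the paper; it is a minimal, self-contained illustration of that mechanism: two streams are created at different priorities, and each is issued an asynchronous host-to-device copy followed by a kernel launch so the runtime may overlap them. The kernel, data sizes, and priority assignment are illustrative placeholders.

// Minimal sketch (not from the paper): two CUDA streams with different
// priorities executing independent "layers" concurrently. The kernel body,
// buffer sizes, and priority choices are illustrative placeholders.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void layer_kernel(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];   // stand-in for a real inference layer
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_in, *d_in0, *d_out0, *d_in1, *d_out1;
    cudaMallocHost((void **)&h_in, bytes);        // pinned host memory enables async copies
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;
    cudaMalloc((void **)&d_in0, bytes); cudaMalloc((void **)&d_out0, bytes);
    cudaMalloc((void **)&d_in1, bytes); cudaMalloc((void **)&d_out1, bytes);

    // Query the supported priority range; a lower numeric value means higher priority.
    int least, greatest;
    cudaDeviceGetStreamPriorityRange(&least, &greatest);

    cudaStream_t hi, lo;
    cudaStreamCreateWithPriority(&hi, cudaStreamNonBlocking, greatest);
    cudaStreamCreateWithPriority(&lo, cudaStreamNonBlocking, least);

    // Issue copy + compute on each stream; the runtime may overlap the two streams.
    dim3 block(256), grid((n + 255) / 256);
    cudaMemcpyAsync(d_in0, h_in, bytes, cudaMemcpyHostToDevice, hi);
    layer_kernel<<<grid, block, 0, hi>>>(d_in0, d_out0, n);

    cudaMemcpyAsync(d_in1, h_in, bytes, cudaMemcpyHostToDevice, lo);
    layer_kernel<<<grid, block, 0, lo>>>(d_in1, d_out1, n);

    cudaStreamSynchronize(hi);
    cudaStreamSynchronize(lo);
    printf("both streams finished\n");

    cudaStreamDestroy(hi); cudaStreamDestroy(lo);
    cudaFree(d_in0); cudaFree(d_out0); cudaFree(d_in1); cudaFree(d_out1);
    cudaFreeHost(h_in);
    return 0;
}

Compile with nvcc (for example, nvcc -o streams streams.cu). Whether the copies and kernels actually overlap depends on the device's copy-engine count and SM occupancy, which is part of why manually tuning such co-schedules requires the large number of profiling runs the abstract mentions.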
Source Journal
ACM Transactions on Design Automation of Electronic Systems
Category: Engineering & Technology - Computer Science: Software Engineering
CiteScore: 3.20
Self-citation rate: 7.10%
Articles published: 105
Review time: 3 months
About the journal: TODAES is a premier ACM journal in design and automation of electronic systems. It publishes innovative work documenting significant research and development advances on the specification, design, analysis, simulation, testing, and evaluation of electronic systems, emphasizing a computer science/engineering orientation. Both theoretical analysis and practical solutions are welcome.
Latest articles in this journal
- Efficient Attacks on Strong PUFs via Covariance and Boolean Modeling
- PriorMSM: An Efficient Acceleration Architecture for Multi-Scalar Multiplication
- Multi-Stream Scheduling of Inference Pipelines on Edge Devices - a DRL Approach
- A Power Optimization Approach for Large-scale RM-TB Dual Logic Circuits Based on an Adaptive Multi-Task Intelligent Algorithm
- MAB-BMC: A Formal Verification Enhancer by Harnessing Multiple BMC Engines Together