Cluster-aware scheduling in multitasking GPUs

Real-Time Systems · Impact factor 1.4 · CAS Zone 4 (Computer Science) · JCR Q3 (Computer Science, Theory & Methods) · Published 2023-11-22 · DOI: 10.1007/s11241-023-09409-x

Xia Zhao, Huiquan Wang, Anwen Huang, Dongsheng Wang, Guangda Zhang


Citations: 0

Abstract

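The number of streaming multiprocessors (SMs) in GPUs keeps growing to deliver more computing power. To build a scalable crossbar network that connects the SMs to the last-level cache (LLC) slices and memory controllers, GPUs adopt a cluster structure in which a group of SMs shares one network port. However, current GPU spatial multitasking is unaware of this underlying network-on-chip infrastructure, which poses both performance challenges and opportunities. In this paper, we observe that, compared with cluster-unaware multitasking, accounting for the cluster structure, specifically how the SMs within a cluster are partitioned and which injection policy governs the shared network port, can bring significant performance improvements. We then propose a low-cost online profiling and scheduling policy that consists of two steps: cluster-aware scheduling first determines the best SM partition within a cluster, and then finds a suitable injection policy for the two co-executing applications. Both steps are performed through online profiling, which incurs only limited runtime overhead. Evaluation results show that, across all workloads, cluster-aware multitasking increases system throughput by 12.9% on average (and by up to 76.5%).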
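The abstract describes the two-step search only at a high level. The Python sketch below shows one way such a search could be structured, under assumptions that are not taken from the paper: system throughput is measured as STP (each application's co-run IPC normalized to its standalone IPC, then summed); the candidate injection policies are the hypothetical `round_robin`, `prioritize_a` and `prioritize_b`; step 1 compares SM splits under a fixed default policy; and `toy_profile` is a synthetic stand-in for online profiling whose numbers are made up for illustration.

```python
from typing import Callable, Tuple

# Hypothetical injection policies for the cluster's shared network port
# (the paper's actual policy set is not given in the abstract).
INJECT_POLICIES = ("round_robin", "prioritize_a", "prioritize_b")


def stp(shared_ipc: Tuple[float, float], alone_ipc: Tuple[float, float]) -> float:
    """Assumed STP metric: sum of each app's co-run IPC divided by its standalone IPC."""
    return sum(s / a for s, a in zip(shared_ipc, alone_ipc))


def cluster_aware_schedule(
    sms_per_cluster: int,
    alone_ipc: Tuple[float, float],
    profile: Callable[[int, str], Tuple[float, float]],
) -> Tuple[int, str, float]:
    """Sketch of a two-step search; profile(n_a, policy) returns (IPC_A, IPC_B)."""
    default = INJECT_POLICIES[0]
    # Step 1: with a fixed default policy, choose how many SMs of the cluster
    # go to app A (the rest go to app B); each app keeps at least one SM.
    best_split = max(
        range(1, sms_per_cluster),
        key=lambda n_a: stp(profile(n_a, default), alone_ipc),
    )
    # Step 2: keep that split and choose the injection policy with the best STP.
    best_policy = max(
        INJECT_POLICIES,
        key=lambda p: stp(profile(best_split, p), alone_ipc),
    )
    return best_split, best_policy, stp(profile(best_split, best_policy), alone_ipc)


# Hypothetical per-policy scaling of (app A, app B) traffic through the port.
_PORT_FACTOR = {
    "round_robin": (0.95, 0.8),
    "prioritize_a": (1.0, 0.6),
    "prioritize_b": (0.9, 1.0),
}


def toy_profile(n_a: int, policy: str, sms_per_cluster: int = 4) -> Tuple[float, float]:
    """Synthetic stand-in for online profiling (made-up model, not measured data)."""
    f_a, f_b = _PORT_FACTOR[policy]
    n_b = sms_per_cluster - n_a
    ipc_a = 1.0 * n_a * f_a          # app A: compute-bound, scales with its SM count
    ipc_b = min(n_b, 2) * 1.0 * f_b  # app B: memory-bound, saturates at 2 SMs
    return ipc_a, ipc_b


if __name__ == "__main__":
    split, policy, best = cluster_aware_schedule(4, (4.0, 2.0), toy_profile)
    print(split, policy, round(best, 3))  # prints: 2 prioritize_b 1.45
```

With these made-up numbers, step 1 picks a 2/2 split of the cluster and step 2 then switches to `prioritize_b`, giving an STP of 1.45. Searching the split and the policy one after the other needs on the order of (k - 1) + |P| profiling runs for a cluster of k SMs and |P| policies, rather than (k - 1) × |P| for an exhaustive search, which keeps online profiling cheap at the risk of missing cases where the best split depends on the policy.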



Source journal

Real-Time Systems (Engineering & Technology: Computer Science, Theory & Methods)

CiteScore: 2.90
Self-citation rate: 7.70%
Review time: 6 months

About the journal: Papers published in Real-Time Systems cover, among others, the following topics: requirements engineering, specification and verification techniques, design methods and tools, programming languages, operating systems, scheduling algorithms, architecture, hardware and interfacing, dependability and safety, distributed and other novel architectures, wired and wireless communications, wireless sensor systems, distributed databases, artificial intelligence techniques, expert systems, and application case studies. Applications are found in command and control systems, process control, automated manufacturing, flight control, avionics, space avionics and defense systems, shipborne systems, vision and robotics, pervasive and ubiquitous computing, and in an abundance of embedded systems.