Cluster-aware scheduling in multitasking GPUs

IF 1.4, Zone 4 (Computer Science), JCR Q3 (COMPUTER SCIENCE, THEORY & METHODS). Real-Time Systems. Pub Date: 2023-11-22. DOI: 10.1007/s11241-023-09409-x
Xia Zhao, Huiquan Wang, Anwen Huang, Dongsheng Wang, Guangda Zhang
Citations: 0

Cluster-aware scheduling in multitasking GPUs

The streaming multiprocessor (SM) count in GPUs continues to increase to provide high computing power. To build a scalable crossbar network that connects the SMs to the last-level cache (LLC) slices and memory controllers, GPUs adopt a cluster structure in which a group of SMs shares a network port. Unfortunately, current GPU spatial multitasking is unaware of this underlying network-on-chip infrastructure, which poses both challenges and opportunities for performance. In this paper, we observe that, compared with cluster-unaware multitasking, taking into account the cluster structure, the SM partition within a cluster, and the injection policy for sharing the network port can bring significant performance improvements. We then propose a low-cost online profiling and scheduling policy that consists of two steps: cluster-aware scheduling first determines the best SM partition within a cluster and then finds the proper injection policy between the two co-executing applications. Both steps rely on online profiling, which incurs only limited runtime overhead. The evaluation results show that, across all workloads, our cluster-aware multitasking increases system throughput by 12.9% on average (and by up to 76.5%).
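The two-step policy described in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how partitions or policies are profiled, so the `profile` callback, the policy names, and the use of a default policy during step 1 are all hypothetical. System throughput is assumed here to be the sum of each application's IPC normalized to its isolated IPC, a common definition that the abstract itself does not spell out.

```python
from typing import Callable, Sequence, Tuple

Partition = Tuple[int, int]          # SMs in one cluster given to app A and app B
IpcPair = Tuple[float, float]        # measured IPC of app A and app B


def system_throughput(ipc_shared: IpcPair, ipc_alone: IpcPair) -> float:
    """Assumed metric: sum of per-application IPC normalized to isolated IPC."""
    return sum(shared / alone for shared, alone in zip(ipc_shared, ipc_alone))


def cluster_aware_schedule(
    sms_per_cluster: int,
    policies: Sequence[str],
    profile: Callable[[Partition, str], IpcPair],
    ipc_alone: IpcPair,
) -> Tuple[Partition, str]:
    """Pick an SM partition first, then an injection policy (hypothetical sketch).

    `profile(partition, policy)` stands in for a short online profiling run
    that returns the IPC of both co-executing applications.
    """
    def score(partition: Partition, policy: str) -> float:
        return system_throughput(profile(partition, policy), ipc_alone)

    # Step 1: profile every split of the cluster's SMs under a default policy.
    default_policy = policies[0]
    partitions = [(k, sms_per_cluster - k) for k in range(1, sms_per_cluster)]
    best_partition = max(partitions, key=lambda p: score(p, default_policy))

    # Step 2: with the partition fixed, profile each injection policy.
    best_policy = max(policies, key=lambda pol: score(best_partition, pol))
    return best_partition, best_policy
```

Searching the two dimensions one after the other keeps the number of profiling runs at (partitions + policies) rather than (partitions × policies), which is one plausible reading of why the profiling overhead stays limited.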

Source journal
Real-Time Systems (Engineering & Technology, Computer Science: Theory & Methods)
CiteScore: 2.90
Self-citation rate: 7.70%
Articles published: 15
Review time: 6 months
Journal description: Papers published in Real-Time Systems cover, among others, the following topics: requirements engineering, specification and verification techniques, design methods and tools, programming languages, operating systems, scheduling algorithms, architecture, hardware and interfacing, dependability and safety, distributed and other novel architectures, wired and wireless communications, wireless sensor systems, distributed databases, artificial intelligence techniques, expert systems, and application case studies. Applications are found in command and control systems, process control, automated manufacturing, flight control, avionics, space avionics and defense systems, shipborne systems, vision and robotics, pervasive and ubiquitous computing, and in an abundance of embedded systems.
Latest papers in this journal
Multi-core interference over-estimation reduction by static scheduling of multi-phase tasks
Connecting the physical space and cyber space of autonomous systems more closely
Mcti: mixed-criticality task-based isolation
Minimizing cache usage with fixed-priority and earliest deadline first scheduling
MemPol: polling-based microsecond-scale per-core memory bandwidth regulation