Concurrency and Interference Analysis of Kernels on GPUs

Pablo Carvalho, Lúcia M. A. Drummond, C. Bentes
{"title":"gpu上内核的并发性与干扰分析","authors":"Pablo Carvalho, Lúcia M. A. Drummond, C. Bentes","doi":"10.5753/CTD.2021.15757","DOIUrl":null,"url":null,"abstract":"Heterogeneous systems employing CPUs and GPUs are becoming increasingly popular in large-scale data centers and cloud environments. In these platforms, sharing a GPU across different applications is an important feature to improve hardware utilization and system throughput. However, under scenarios where GPUs are competitively shared, some challenges arise. The decision on the simultaneous execution of different kernels is made by the hardware and depends on the kernels resource requirements. Besides that, it is very difficult to understand all the hardware variables involved in the simultaneous execution decisions, in order to describe a formal allocation method. In this work, we studied the impact that kernel resource requirements have in concurrent execution and used machine learning (ML) techniques to infer the interference caused by the concurrent execution, and to classify the slowdown that results from this interference. The ML techniques were analyzed over the GPU benchmark suites, Rodinia, Parboil and SHOC. Our results showed that, from the features selected in the analysis, the number of blocks per grid, number of threads per block, and number of registers are the resource consumption features that most affect the performance of the concurrent execution.","PeriodicalId":236085,"journal":{"name":"Anais do XXXIV Concurso de Teses e Dissertações da SBC (CTD-SBC 2021)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Concurrency and Interference Analysis of Kernels on GPUs\",\"authors\":\"Pablo Carvalho, Lúcia M. A. Drummond, C. Bentes\",\"doi\":\"10.5753/CTD.2021.15757\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Heterogeneous systems employing CPUs and GPUs are becoming increasingly popular in large-scale data centers and cloud environments. In these platforms, sharing a GPU across different applications is an important feature to improve hardware utilization and system throughput. However, under scenarios where GPUs are competitively shared, some challenges arise. The decision on the simultaneous execution of different kernels is made by the hardware and depends on the kernels resource requirements. Besides that, it is very difficult to understand all the hardware variables involved in the simultaneous execution decisions, in order to describe a formal allocation method. In this work, we studied the impact that kernel resource requirements have in concurrent execution and used machine learning (ML) techniques to infer the interference caused by the concurrent execution, and to classify the slowdown that results from this interference. The ML techniques were analyzed over the GPU benchmark suites, Rodinia, Parboil and SHOC. 
Our results showed that, from the features selected in the analysis, the number of blocks per grid, number of threads per block, and number of registers are the resource consumption features that most affect the performance of the concurrent execution.\",\"PeriodicalId\":236085,\"journal\":{\"name\":\"Anais do XXXIV Concurso de Teses e Dissertações da SBC (CTD-SBC 2021)\",\"volume\":\"58 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Anais do XXXIV Concurso de Teses e Dissertações da SBC (CTD-SBC 2021)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5753/CTD.2021.15757\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Anais do XXXIV Concurso de Teses e Dissertações da SBC (CTD-SBC 2021)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5753/CTD.2021.15757","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Heterogeneous systems employing CPUs and GPUs are becoming increasingly popular in large-scale data centers and cloud environments. In these platforms, sharing a GPU across different applications is an important feature to improve hardware utilization and system throughput. However, scenarios where GPUs are competitively shared raise some challenges. The decision on the simultaneous execution of different kernels is made by the hardware and depends on the kernels' resource requirements. Moreover, it is very difficult to understand all the hardware variables involved in these simultaneous-execution decisions well enough to describe a formal allocation method. In this work, we studied the impact that kernel resource requirements have on concurrent execution and used machine learning (ML) techniques to infer the interference caused by concurrent execution and to classify the slowdown that results from this interference. The ML techniques were evaluated on the GPU benchmark suites Rodinia, Parboil, and SHOC. Our results showed that, among the features selected in the analysis, the number of blocks per grid, the number of threads per block, and the number of registers are the resource-consumption features that most affect the performance of concurrent execution.
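
To make the methodology concrete, the following is a minimal sketch, not the authors' actual pipeline, of how a slowdown classifier over the three resource features highlighted in the abstract could be set up. The synthetic data, the slowdown class thresholds, and the choice of a random-forest model are illustrative assumptions; in the paper the features come from profiling kernels of Rodinia, Parboil, and SHOC.

```python
# Hypothetical sketch: classify the slowdown of concurrently executed GPU kernels
# from their resource-consumption features. All data below is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
n = 200  # hypothetical number of kernel-pair samples

# Features the study found most influential:
# blocks per grid, threads per block, registers per thread.
X = np.column_stack([
    rng.integers(1, 4096, n),   # blocks per grid
    rng.integers(32, 1024, n),  # threads per block
    rng.integers(16, 255, n),   # registers per thread
])

# Slowdown = concurrent execution time / isolated execution time,
# bucketed into classes (thresholds are illustrative).
slowdown = rng.uniform(1.0, 3.0, n)           # placeholder measurements
y = np.digitize(slowdown, bins=[1.2, 1.8])    # 0: low, 1: medium, 2: high

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
print("feature importances:", clf.feature_importances_)
```

With real profiler measurements in place of the synthetic arrays, the model's feature importances would indicate which resource requirements drive interference, which is the kind of analysis the abstract describes.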