Concurrency and Interference Analysis of Kernels on GPUs

Pablo Carvalho, Lúcia M. A. Drummond, C. Bentes
{"title":"gpu上内核的并发性与干扰分析","authors":"Pablo Carvalho, Lúcia M. A. Drummond, C. Bentes","doi":"10.5753/CTD.2021.15757","DOIUrl":null,"url":null,"abstract":"Heterogeneous systems employing CPUs and GPUs are becoming increasingly popular in large-scale data centers and cloud environments. In these platforms, sharing a GPU across different applications is an important feature to improve hardware utilization and system throughput. However, under scenarios where GPUs are competitively shared, some challenges arise. The decision on the simultaneous execution of different kernels is made by the hardware and depends on the kernels resource requirements. Besides that, it is very difficult to understand all the hardware variables involved in the simultaneous execution decisions, in order to describe a formal allocation method. In this work, we studied the impact that kernel resource requirements have in concurrent execution and used machine learning (ML) techniques to infer the interference caused by the concurrent execution, and to classify the slowdown that results from this interference. The ML techniques were analyzed over the GPU benchmark suites, Rodinia, Parboil and SHOC. Our results showed that, from the features selected in the analysis, the number of blocks per grid, number of threads per block, and number of registers are the resource consumption features that most affect the performance of the concurrent execution.","PeriodicalId":236085,"journal":{"name":"Anais do XXXIV Concurso de Teses e Dissertações da SBC (CTD-SBC 2021)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Concurrency and Interference Analysis of Kernels on GPUs\",\"authors\":\"Pablo Carvalho, Lúcia M. A. Drummond, C. Bentes\",\"doi\":\"10.5753/CTD.2021.15757\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Heterogeneous systems employing CPUs and GPUs are becoming increasingly popular in large-scale data centers and cloud environments. In these platforms, sharing a GPU across different applications is an important feature to improve hardware utilization and system throughput. However, under scenarios where GPUs are competitively shared, some challenges arise. The decision on the simultaneous execution of different kernels is made by the hardware and depends on the kernels resource requirements. Besides that, it is very difficult to understand all the hardware variables involved in the simultaneous execution decisions, in order to describe a formal allocation method. In this work, we studied the impact that kernel resource requirements have in concurrent execution and used machine learning (ML) techniques to infer the interference caused by the concurrent execution, and to classify the slowdown that results from this interference. The ML techniques were analyzed over the GPU benchmark suites, Rodinia, Parboil and SHOC. 
Our results showed that, from the features selected in the analysis, the number of blocks per grid, number of threads per block, and number of registers are the resource consumption features that most affect the performance of the concurrent execution.\",\"PeriodicalId\":236085,\"journal\":{\"name\":\"Anais do XXXIV Concurso de Teses e Dissertações da SBC (CTD-SBC 2021)\",\"volume\":\"58 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Anais do XXXIV Concurso de Teses e Dissertações da SBC (CTD-SBC 2021)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5753/CTD.2021.15757\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Anais do XXXIV Concurso de Teses e Dissertações da SBC (CTD-SBC 2021)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5753/CTD.2021.15757","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Heterogeneous systems employing CPUs and GPUs are becoming increasingly popular in large-scale data centers and cloud environments. In these platforms, sharing a GPU across different applications is an important feature to improve hardware utilization and system throughput. However, scenarios where GPUs are competitively shared raise some challenges. The decision on the simultaneous execution of different kernels is made by the hardware and depends on the kernels' resource requirements. Moreover, it is very difficult to understand all the hardware variables involved in these simultaneous-execution decisions well enough to describe a formal allocation method. In this work, we studied the impact that kernel resource requirements have on concurrent execution and used machine learning (ML) techniques to infer the interference caused by concurrent execution and to classify the slowdown that results from this interference. The ML techniques were evaluated on the GPU benchmark suites Rodinia, Parboil, and SHOC. Our results showed that, among the features selected in the analysis, the number of blocks per grid, the number of threads per block, and the number of registers are the resource-consumption features that most affect the performance of concurrent execution.
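
To make the methodology concrete, the following is a minimal sketch, not the authors' actual pipeline, of how a slowdown classifier over the three resource features highlighted in the abstract could be set up. The synthetic data, the slowdown class thresholds, and the choice of a random-forest model are illustrative assumptions; in the paper the features come from profiling kernels of Rodinia, Parboil, and SHOC.

```python
# Hypothetical sketch: classify the slowdown of concurrently executed GPU kernels
# from their resource-consumption features. All data below is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
n = 200  # hypothetical number of kernel-pair samples

# Features the study found most influential:
# blocks per grid, threads per block, registers per thread.
X = np.column_stack([
    rng.integers(1, 4096, n),   # blocks per grid
    rng.integers(32, 1024, n),  # threads per block
    rng.integers(16, 255, n),   # registers per thread
])

# Slowdown = concurrent execution time / isolated execution time,
# bucketed into classes (thresholds are illustrative).
slowdown = rng.uniform(1.0, 3.0, n)           # placeholder measurements
y = np.digitize(slowdown, bins=[1.2, 1.8])    # 0: low, 1: medium, 2: high

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
print("feature importances:", clf.feature_importances_)
```

With real profiler measurements in place of the synthetic arrays, the model's feature importances would indicate which resource requirements drive interference, which is the kind of analysis the abstract describes.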