Managing GPU Concurrency in Heterogeneous Architectures

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture
Pub Date: 2014-12-13
DOI: 10.1109/MICRO.2014.62

Onur Kayiran, N. Nachiappan, Adwait Jog, Rachata Ausavarungnirun, M. Kandemir, G. Loh, O. Mutlu, C. Das

Pages: 114-126

Citations: 130

Abstract

Heterogeneous architectures consisting of general-purpose CPUs and throughput-optimized GPUs are projected to be the dominant computing platforms for many classes of applications. The design of such systems is more complex than that of homogeneous architectures because maximizing resource utilization while minimizing shared resource interference between CPU and GPU applications is difficult. We show that GPU applications tend to monopolize the shared hardware resources, such as memory and network, because of their high thread-level parallelism (TLP), and discuss the limitations of existing GPU-based concurrency management techniques when employed in heterogeneous systems. To solve this problem, we propose an integrated concurrency management strategy that modulates the TLP in GPUs to control the performance of both CPU and GPU applications. This mechanism considers both GPU core state and system-wide memory and network congestion information to dynamically decide on the level of GPU concurrency to maximize system performance. We propose and evaluate two schemes: one (CM-CPU) for boosting CPU performance in the presence of GPU interference, the other (CM-BAL) for improving both CPU and GPU performance in a balanced manner and thus overall system performance. Our evaluations show that the first scheme improves average CPU performance by 24%, while reducing average GPU performance by 11%. The second scheme provides 7% average performance improvement for both CPU and GPU applications. We also show that our solution allows the user to control performance trade-offs between CPUs and GPUs.
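
The abstract describes a controller that adjusts GPU thread-level parallelism using GPU core state together with memory and network congestion. The sketch below is one minimal way such a feedback loop could look; it is not the authors' algorithm. The metric names, the thresholds `lo` and `hi`, the one-warp step size, and the `idle_limit` switch are all assumptions made for illustration, since the abstract does not specify any of them.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One monitoring interval. All fields are hypothetical metrics in [0, 1]."""
    mem_congestion: float  # fraction of cycles the memory controllers were backed up
    net_congestion: float  # fraction of cycles the interconnect was backed up
    gpu_idle: float        # fraction of cycles GPU cores had no ready warp to issue


def next_tlp(tlp, s, lo=0.2, hi=0.6, idle_limit=None, min_tlp=1, max_tlp=48):
    """Return the active-warp limit per GPU core for the next interval.

    idle_limit=None: CPU-favouring variant, driven by congestion only.
    idle_limit=x:    balanced variant, which also raises TLP whenever GPU
                     cores idle more than x (assumed tuning knob).
    """
    congestion = max(s.mem_congestion, s.net_congestion)
    if idle_limit is not None and s.gpu_idle > idle_limit:
        step = 1    # GPU cannot hide latency: give it more warps
    elif congestion > hi:
        step = -1   # shared resources saturated: throttle the GPU
    elif congestion < lo:
        step = 1    # headroom available: allow more warps
    else:
        step = 0    # inside the dead band: hold steady
    return max(min_tlp, min(max_tlp, tlp + step))
```

Under these assumptions, calling `next_tlp(8, Sample(0.9, 0.1, 0.5))` throttles the GPU because memory is congested, whereas passing `idle_limit=0.3` for the same sample raises the limit instead, because the GPU cores are idling too often. A user-visible knob of this kind is one plausible way to expose the CPU/GPU trade-off the abstract mentions.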


