使用动态电压/频率和核心缩放提高功率受限gpu的吞吐量

2011 International Conference on Parallel Architectures and Compilation Techniques Pub Date : 2011-10-10 DOI:10.1109/PACT.2011.17

Jungseob Lee, V. Sathish, M. Schulte, Katherine Compton, N. Kim

{"title":"使用动态电压/频率和核心缩放提高功率受限gpu的吞吐量","authors":"Jungseob Lee, V. Sathish, M. Schulte, Katherine Compton, N. Kim","doi":"10.1109/PACT.2011.17","DOIUrl":null,"url":null,"abstract":"State-of-the-art graphic processing units (GPUs) can offer very high computational throughput for highly parallel applications using hundreds of integrated cores. In general, the peak throughput of a GPU is proportional to the product of the number of cores and their frequency. However, the product is often limited by a power constraint. Although the throughput can be increased with more cores for some applications, it cannot for others because parallelism of applications and/or bandwidth of on-chip interconnects/caches and off-chip memory are limited. In this paper, first, we demonstrate that adjusting the number of operating cores and the voltage/frequency of cores and/or on-chip interconnects/caches for different applications can improve the throughput of GPUs under a power constraint. Second, we show that dynamically scaling the number of operating cores and the voltages/frequencies of both cores and on-chip interconnects/caches at runtime can improve the throughput of application even further. Our experimental results show that a GPU adopting our runtime dynamic voltage/frequency and core scaling technique can provide up to 38% (and nearly 20% on average) higher throughput than the baseline GPU under the same power constraint.","PeriodicalId":106423,"journal":{"name":"2011 International Conference on Parallel Architectures and Compilation Techniques","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"73","resultStr":"{\"title\":\"Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage/Frequency and Core Scaling\",\"authors\":\"Jungseob Lee, V. Sathish, M. Schulte, Katherine Compton, N. Kim\",\"doi\":\"10.1109/PACT.2011.17\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"State-of-the-art graphic processing units (GPUs) can offer very high computational throughput for highly parallel applications using hundreds of integrated cores. In general, the peak throughput of a GPU is proportional to the product of the number of cores and their frequency. However, the product is often limited by a power constraint. Although the throughput can be increased with more cores for some applications, it cannot for others because parallelism of applications and/or bandwidth of on-chip interconnects/caches and off-chip memory are limited. In this paper, first, we demonstrate that adjusting the number of operating cores and the voltage/frequency of cores and/or on-chip interconnects/caches for different applications can improve the throughput of GPUs under a power constraint. Second, we show that dynamically scaling the number of operating cores and the voltages/frequencies of both cores and on-chip interconnects/caches at runtime can improve the throughput of application even further. Our experimental results show that a GPU adopting our runtime dynamic voltage/frequency and core scaling technique can provide up to 38% (and nearly 20% on average) higher throughput than the baseline GPU under the same power constraint.\",\"PeriodicalId\":106423,\"journal\":{\"name\":\"2011 International Conference on Parallel Architectures and Compilation Techniques\",\"volume\":\"32 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-10-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"73\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 International Conference on Parallel Architectures and Compilation Techniques\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PACT.2011.17\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 International Conference on Parallel Architectures and Compilation Techniques","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PACT.2011.17","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 73

摘要

最先进的图形处理单元(gpu)可以为使用数百个集成核心的高度并行应用程序提供非常高的计算吞吐量。一般来说，GPU的峰值吞吐量与内核数量及其频率的乘积成正比。然而，该产品经常受到功率限制。虽然对于某些应用程序来说，更多的核可以增加吞吐量，但对于其他应用程序来说却不能，因为应用程序的并行性和/或片上互连/缓存和片外内存的带宽是有限的。在本文中，首先，我们证明了调整工作核心的数量和核心的电压/频率和/或片上互连/缓存的不同应用可以提高gpu在功率限制下的吞吐量。其次，我们展示了在运行时动态缩放操作内核的数量以及内核和片上互连/缓存的电压/频率可以进一步提高应用程序的吞吐量。我们的实验结果表明，在相同功率约束下，采用我们的运行时动态电压/频率和核心缩放技术的GPU可以提供比基线GPU高38%(平均近20%)的吞吐量。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage/Frequency and Core Scaling

State-of-the-art graphic processing units (GPUs) can offer very high computational throughput for highly parallel applications using hundreds of integrated cores. In general, the peak throughput of a GPU is proportional to the product of the number of cores and their frequency. However, the product is often limited by a power constraint. Although the throughput can be increased with more cores for some applications, it cannot for others because parallelism of applications and/or bandwidth of on-chip interconnects/caches and off-chip memory are limited. In this paper, first, we demonstrate that adjusting the number of operating cores and the voltage/frequency of cores and/or on-chip interconnects/caches for different applications can improve the throughput of GPUs under a power constraint. Second, we show that dynamically scaling the number of operating cores and the voltages/frequencies of both cores and on-chip interconnects/caches at runtime can improve the throughput of application even further. Our experimental results show that a GPU adopting our runtime dynamic voltage/frequency and core scaling technique can provide up to 38% (and nearly 20% on average) higher throughput than the baseline GPU under the same power constraint.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2011 International Conference on Parallel Architectures and Compilation Techniques

自引率

0.00%

发文量