GPUWattch: enabling energy optimizations in GPGPUs

Proceedings of the 40th Annual International Symposium on Computer Architecture. Pub Date: 2013-06-23. DOI: 10.1145/2485922.2485964

Jingwen Leng, Tayler H. Hetherington, Ahmed Eltantawy, S. Gilani, N. Kim, Tor M. Aamodt, V. Reddi

Citations: 551

Abstract

General-purpose GPUs (GPGPUs) are becoming prevalent in mainstream computing, and performance per watt has emerged as a more crucial evaluation metric than peak performance. GPU architects therefore require robust tools that let them quickly explore new ways to optimize GPGPUs for energy efficiency. We propose a new GPGPU power model that is configurable, capable of cycle-level calculations, and carefully validated against real hardware measurements. To achieve configurability, we use a bottom-up methodology and abstract parameters from the microarchitectural components as the model's inputs. We developed a rigorous suite of 80 microbenchmarks that we use to bound any modeling uncertainties and inaccuracies. The power model is comprehensively validated against measurements of two commercially available GPUs, and the measured error is within 9.9% and 13.4% for the two target GPUs (GTX 480 and Quadro FX5600). The model also accurately tracks the power consumption trend over time. We integrated the power model with the cycle-level simulator GPGPU-Sim and demonstrate the energy savings achievable with dynamic voltage and frequency scaling (DVFS) and clock gating. Traditional DVFS reduces GPU energy consumption by 14.4% by leveraging within-kernel runtime variations. Finer-grained SM cluster-level DVFS improves the energy savings from 6.6% to 13.6% for those benchmarks that show clustered execution behavior. We also show that clock gating inactive lanes during divergence reduces dynamic power by 11.2%.
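As a rough illustration of what a bottom-up, activity-driven power estimate can look like, the following Python sketch sums per-component dynamic energy (access count times energy per access) and static power over one sampling interval, and applies the textbook approximation that dynamic power scales with V²·f under DVFS. It is not the GPUWattch implementation: the `Component` type, the function names, and all numeric values are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A microarchitectural block with hypothetical energy parameters."""
    name: str
    energy_per_access_nj: float  # assumed dynamic energy per access (nJ)
    static_power_w: float        # assumed leakage power (W)


def interval_power_w(components, access_counts, cycles, freq_hz):
    """Average power (W) over a sampling interval of `cycles` clock cycles.

    Dynamic power per component = accesses * energy_per_access / interval time.
    Static power is added unconditionally.
    """
    seconds = cycles / freq_hz
    total = 0.0
    for c in components:
        dynamic_j = access_counts.get(c.name, 0) * c.energy_per_access_nj * 1e-9
        total += dynamic_j / seconds + c.static_power_w
    return total


def dvfs_dynamic_scale(v_ratio, f_ratio):
    """Relative dynamic power after scaling voltage and frequency.

    Uses the classic CMOS approximation P_dyn ~ V^2 * f.
    """
    return v_ratio ** 2 * f_ratio


# Illustrative values only; not measurements from the paper.
parts = [Component("alu", 1.0, 0.5), Component("l1_cache", 2.0, 0.25)]
counts = {"alu": 1000, "l1_cache": 500}
power = interval_power_w(parts, counts, cycles=1000, freq_hz=1e9)
scale = dvfs_dynamic_scale(v_ratio=0.9, f_ratio=0.8)
```

In this sketch the access counts would come from a cycle-level simulator's activity counters, which is why a per-interval formulation can track power over time rather than only as a whole-run average.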



