Verified instruction-level energy consumption measurement for NVIDIA GPUs

Proceedings of the 17th ACM International Conference on Computing Frontiers. Pub Date: 2020-02-18. DOI: 10.1145/3387902.3392613

Yehia Arafa, Ammar Elwazir, Abdelrahman Elkanishy, Youssef Aly, Ayatelrahman Elsayed, Abdel-Hameed A. Badawy, Gopinath Chennupati, S. Eidenbenz, N. Santhi


Citations: 23

Abstract

GPUs are prevalent in modern computing systems at all scales and consume a significant fraction of the energy in these systems. However, vendors do not publish the power/energy cost of their internal microarchitecture. In this paper, we accurately measure the energy consumption of various PTX instructions found in modern NVIDIA GPUs. We provide an exhaustive comparison of more than 40 instructions for four high-end NVIDIA GPUs from four different generations (Maxwell, Pascal, Volta, and Turing). Furthermore, we show the effect of CUDA compiler optimizations on the energy consumption of each instruction. We use three different software techniques, all based on NVIDIA's NVML API, to read the GPU on-chip power sensors, and we provide an in-depth comparison of these techniques. Additionally, we verify the software measurement techniques against a custom-designed hardware power measurement. The results show that Volta GPUs have the best energy efficiency of the four generations across the different instruction categories. This work should aid in understanding NVIDIA GPUs' microarchitecture. It should also make energy measurements of any GPU kernel both efficient and accurate.
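
The abstract does not spell out how sampled power readings become a per-instruction energy figure, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes power samples in watts with timestamps in seconds (for instance, readings such as those returned by NVML's `nvmlDeviceGetPowerUsage`, which reports milliwatts, after conversion), integrates them with the trapezoidal rule, subtracts an assumed constant idle power, and divides by an assumed instruction count. The function names, the idle-power baseline, and all sample values are hypothetical.

```python
from typing import Sequence


def energy_joules(times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate sampled power (watts) over time (seconds), trapezoidal rule."""
    if len(times_s) != len(power_w):
        raise ValueError("times and power samples must have the same length")
    if len(times_s) < 2:
        raise ValueError("need at least two samples to integrate")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def energy_per_instruction(
    kernel_j: float, baseline_j: float, n_instructions: int
) -> float:
    """Hypothetical attribution: (kernel energy - baseline energy) / count."""
    if n_instructions <= 0:
        raise ValueError("instruction count must be positive")
    return (kernel_j - baseline_j) / n_instructions


# Made-up samples for illustration only (not measurements from the paper).
times = [0.0, 0.5, 1.0, 1.5]          # seconds
power = [50.0, 70.0, 70.0, 50.0]      # watts while a test kernel runs
idle_power_w = 50.0                   # assumed constant idle power

kernel_energy = energy_joules(times, power)
baseline_energy = idle_power_w * (times[-1] - times[0])
per_instruction = energy_per_instruction(
    kernel_energy, baseline_energy, n_instructions=1_000_000
)
print(f"kernel energy: {kernel_energy:.3f} J")
print(f"energy per instruction: {per_instruction * 1e6:.3f} uJ")
```

A fixed idle baseline is the simplest possible choice here; a real measurement would also have to account for the sensor's sampling period and for any overhead instructions in the test kernel.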
