Power Constrained Autotuning using Graph Neural Networks

2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Pub Date: 2023-02-22. DOI: 10.1109/IPDPS54959.2023.00060

Akashnil Dutta, JeeWhan Choi, A. Jannesari


Citations: 1

Abstract

Recent advances in multi- and many-core processors have led to significant improvements in the performance of scientific computing applications. However, the addition of a large number of complex cores has also increased overall power consumption, and power has become a first-order design constraint in modern processors. While we can limit power consumption by simply applying software-based power constraints, applying them blindly leads to non-trivial performance degradation. To address the challenge of improving the performance, power, and energy efficiency of scientific applications on modern multi-core processors, we propose a novel Graph Neural Network-based auto-tuning approach that (i) optimizes runtime performance at pre-defined power constraints, and (ii) simultaneously optimizes for runtime performance and energy efficiency by minimizing the energy-delay product. The key idea behind this approach is to model parallel code regions as flow-aware code graphs that capture both semantic and structural code features. We demonstrate the efficacy of our approach through an extensive evaluation on 30 benchmarks and proxy-/mini-applications with 68 OpenMP code regions. Our approach identifies OpenMP configurations at different power constraints that yield a geometric mean performance improvement of more than 25% and 13% over the default OpenMP configuration on a 32-core Skylake and a 16-core Haswell processor, respectively. In addition, when we optimize for the energy-delay product, the OpenMP configurations selected by our auto-tuner deliver both a performance improvement of 21% and 11% and an energy reduction of 29% and 18% over the default OpenMP configuration at Thermal Design Power on the same Skylake and Haswell processors, respectively.
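The two tuning objectives named in the abstract can be illustrated with a small sketch. It is not the authors' Graph Neural Network tuner: it only shows what "fastest configuration at a given power cap", "minimum energy-delay product", and "geometric mean improvement" mean when measurements are already available. All field names, function names, and numbers below are hypothetical and chosen for illustration only.

```python
from math import prod

# Hypothetical measurements for one OpenMP region; the numbers are made up.
measurements = [
    {"threads": 32, "schedule": "static", "power_cap_watts": 150,
     "runtime_seconds": 10.0, "energy_joules": 1400.0},
    {"threads": 16, "schedule": "dynamic", "power_cap_watts": 150,
     "runtime_seconds": 8.0, "energy_joules": 1000.0},
    {"threads": 32, "schedule": "static", "power_cap_watts": 100,
     "runtime_seconds": 14.0, "energy_joules": 1300.0},
    {"threads": 16, "schedule": "dynamic", "power_cap_watts": 100,
     "runtime_seconds": 11.0, "energy_joules": 1050.0},
]


def energy_delay_product(energy_joules, runtime_seconds):
    """Energy-delay product: energy multiplied by execution time."""
    return energy_joules * runtime_seconds


def fastest_at_power_cap(rows, power_cap_watts):
    """Objective (i): lowest runtime among rows measured at the given cap."""
    candidates = [r for r in rows if r["power_cap_watts"] == power_cap_watts]
    return min(candidates, key=lambda r: r["runtime_seconds"])


def lowest_edp(rows):
    """Objective (ii): the row that minimizes the energy-delay product."""
    return min(
        rows,
        key=lambda r: energy_delay_product(
            r["energy_joules"], r["runtime_seconds"]
        ),
    )


def geometric_mean(values):
    """Geometric mean, the usual way to average speedups across regions."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))
```

Calling `fastest_at_power_cap(measurements, 100)` or `lowest_edp(measurements)` returns the chosen row, and `geometric_mean` can then aggregate per-region speedups (default runtime divided by tuned runtime) into a single figure comparable to the percentages reported above.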



