Scalable power control for many-core architectures running multi-threaded applications

2011 38th Annual International Symposium on Computer Architecture (ISCA) Pub Date : 2011-06-04 DOI:10.1145/2000064.2000117

Kai Ma, Xue Li, Ming Chen, Xiaorui Wang


Citations: 155

Abstract

Optimizing the performance of a multi-core microprocessor within a power budget has recently received a lot of attention. However, most existing solutions are centralized and cannot scale well with the rapidly increasing level of core integration. While a few recent studies propose power control algorithms for many-core architectures, those solutions assume that the workload of every core is independent and therefore cannot effectively allocate power based on thread criticality to accelerate multi-threaded parallel applications, which are expected to be the primary workloads of many-core architectures. This paper presents a scalable power control solution for many-core microprocessors that is specifically designed to handle realistic workloads, i.e., a mixed group of single-threaded and multi-threaded applications. Our solution features a three-layer design. First, we adopt control theory to precisely control the power of the entire chip to its chip-level budget by adjusting the aggregated frequency of all the cores on the chip. Second, we dynamically group cores running the same applications and then partition the chip-level aggregated frequency quota among different groups for optimized overall microprocessor performance. Finally, we partition the group-level frequency quota among the cores in each group based on the measured thread criticality for shorter application completion time. As a result, our solution can optimize the microprocessor performance while precisely limiting the chip-level power consumption below the desired budget. Empirical results on a 12-core hardware testbed show that our control solution can provide precise power control, as well as 17% and 11% better application performance than two state-of-the-art solutions, on average, for mixed PARSEC and SPEC benchmarks. 
Furthermore, our extensive simulation results for 32, 64, and 128 cores, as well as overhead analysis for up to 4,096 cores, demonstrate that our solution is highly scalable to many-core architectures.
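
The three-layer structure described above can be illustrated with a minimal sketch. The abstract does not give the controller equations or the partitioning rules, so everything concrete here is an assumption made for illustration: the integral-style update with a hypothetical `gain`, the proportional split by weight, and the names `Core`, `chip_level_quota`, `partition`, and `allocate`. It is not the authors' algorithm.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    app: str            # application the core's thread belongs to
    criticality: float  # hypothetical metric: larger means closer to the critical path


def chip_level_quota(prev_quota, power_budget, measured_power, gain):
    """Layer 1 (assumed form): nudge the aggregated frequency quota toward the budget.

    `gain` (frequency units per watt) is a hypothetical tuning parameter.
    """
    return max(0.0, prev_quota + gain * (power_budget - measured_power))


def partition(quota, weights):
    """Split `quota` proportionally to `weights`; fall back to an even split."""
    total = sum(weights.values())
    if total <= 0:
        return {key: quota / len(weights) for key in weights}
    return {key: quota * w / total for key, w in weights.items()}


def allocate(cores, quota, group_weights):
    """Layers 2 and 3: group cores by application, then split within each group."""
    groups = {}
    for core in cores:
        groups.setdefault(core.app, []).append(core)

    # Layer 2: divide the chip-level quota among application groups.
    # Groups without an explicit weight default to 1.0 (an assumption).
    group_quota = partition(
        quota, {app: group_weights.get(app, 1.0) for app in groups}
    )

    # Layer 3: divide each group's quota among its cores by thread criticality.
    freqs = {}
    for app, members in groups.items():
        freqs.update(
            partition(group_quota[app], {c.core_id: c.criticality for c in members})
        )
    return freqs


cores = [Core(0, "app_a", 3.0), Core(1, "app_a", 1.0), Core(2, "app_b", 1.0)]
quota = chip_level_quota(prev_quota=10.0, power_budget=100.0,
                         measured_power=110.0, gain=0.2)
freqs = allocate(cores, quota, {"app_a": 1.0, "app_b": 1.0})
```

In this sketch the chip is over budget, so the aggregated quota shrinks; within `app_a` the more critical thread on core 0 then receives a larger share than core 1. A real implementation would also have to clamp each core to its supported frequency range and map the result to discrete DVFS levels, which the sketch omits.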
