Scalable power control for many-core architectures running multi-threaded applications

Kai Ma, Xue Li, Ming Chen, Xiaorui Wang
Published in: 2011 38th Annual International Symposium on Computer Architecture (ISCA), 2011-06-04
DOI: 10.1145/2000064.2000117 (https://doi.org/10.1145/2000064.2000117)
Citations: 155

Abstract

Optimizing the performance of a multi-core microprocessor within a power budget has recently received a lot of attention. However, most existing solutions are centralized and cannot scale well with the rapidly increasing level of core integration. While a few recent studies propose power control algorithms for many-core architectures, those solutions assume that the workload of every core is independent and therefore cannot effectively allocate power based on thread criticality to accelerate multi-threaded parallel applications, which are expected to be the primary workloads of many-core architectures. This paper presents a scalable power control solution for many-core microprocessors that is specifically designed to handle realistic workloads, i.e., a mixed group of single-threaded and multi-threaded applications. Our solution features a three-layer design. First, we adopt control theory to precisely control the power of the entire chip to its chip-level budget by adjusting the aggregated frequency of all the cores on the chip. Second, we dynamically group cores running the same applications and then partition the chip-level aggregated frequency quota among different groups for optimized overall microprocessor performance. Finally, we partition the group-level frequency quota among the cores in each group based on the measured thread criticality for shorter application completion time. As a result, our solution can optimize the microprocessor performance while precisely limiting the chip-level power consumption below the desired budget. Empirical results on a 12-core hardware testbed show that our control solution can provide precise power control, as well as 17% and 11% better application performance than two state-of-the-art solutions, on average, for mixed PARSEC and SPEC benchmarks. Furthermore, our extensive simulation results for 32, 64, and 128 cores, as well as overhead analysis for up to 4,096 cores, demonstrate that our solution is highly scalable to many-core architectures.
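
The three-layer structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the controller or the partitioning rules, so the integral-style feedback step, the proportional splits, and every name here (`Group`, `weight`, `criticality`, `chip_quota`, `allocate`, `gain`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Group:
    """Cores running the same application (all fields are hypothetical)."""
    name: str
    weight: float             # assumed share of the chip-level quota
    criticality: List[float]  # assumed per-thread criticality, one per core


def chip_quota(prev_quota: float, power_w: float, budget_w: float,
               gain: float = 0.5) -> float:
    """Layer 1 (assumed form): one integral-style feedback step.

    Raises the chip-wide aggregated frequency quota when measured power is
    under budget and lowers it when over budget. `gain` (GHz per watt) is
    an illustrative value, not one taken from the paper.
    """
    return max(0.0, prev_quota + gain * (budget_w - power_w))


def split(total: float, shares: List[float]) -> List[float]:
    """Divide `total` in proportion to `shares` (equal split if all zero)."""
    s = sum(shares)
    if s == 0:
        return [total / len(shares)] * len(shares)
    return [total * x / s for x in shares]


def allocate(quota: float, groups: List[Group]) -> Dict[str, List[float]]:
    """Layers 2 and 3 (assumed form): proportional partitioning.

    Layer 2 splits the chip-level quota among groups by `weight`;
    layer 3 splits each group's quota among its cores by `criticality`.
    """
    group_quotas = split(quota, [g.weight for g in groups])
    return {g.name: split(q, g.criticality)
            for g, q in zip(groups, group_quotas)}


if __name__ == "__main__":
    groups = [Group("multi_threaded_app", 1.0, [1.0, 3.0]),
              Group("single_threaded_app", 1.0, [1.0])]
    quota = chip_quota(prev_quota=24.0, power_w=110.0, budget_w=100.0)
    print(quota, allocate(quota, groups))
```

In this sketch, a core whose thread is more critical receives a larger slice of its group's frequency quota, which mirrors the stated goal of shortening application completion time. A real controller would also clamp each core to its supported frequency range and derive the weights and criticality values from measurements, neither of which is modeled here.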