Scalable thread scheduling and global power management for heterogeneous many-core architectures

2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT) Pub Date : 2010-09-11 DOI:10.1145/1854273.1854283

Jonathan A. Winter, D. Albonesi, C. Shoemaker

{"title":"Scalable thread scheduling and global power management for heterogeneous many-core architectures","authors":"Jonathan A. Winter, D. Albonesi, C. Shoemaker","doi":"10.1145/1854273.1854283","DOIUrl":null,"url":null,"abstract":"Future many-core microprocessors are likely to be heterogeneous, by design or due to variability and defects. The latter type of heterogeneity is especially challenging due to its unpredictability. To minimize the performance and power impact of these hardware imperfections, the runtime thread scheduler and global power manager must be nimble enough to handle such random heterogeneity. With hundreds of cores expected on a single die in the future, these algorithms must provide high power-performance efficiency, yet remain scalable with low runtime overhead. This paper presents a range of scheduling and power management algorithms and performs a detailed evaluation of their effectiveness and scalability on heterogeneous many-core architectures with up to 256 cores. We also conduct a limit study on the potential benefits of coordinating scheduling and power management and demonstrate that coordination yields little benefit. We highlight the scalability limitations of previously proposed thread scheduling algorithms that were designed for small-scale chip multiprocessors and propose a Hierarchical Hungarian Scheduling Algorithm that dramatically reduces the scheduling overhead without loss of accuracy. Finally, we show that the high computational requirements of prior global power management algorithms based on linear programming make them infeasible for many-core chips, and that an algorithm that we call Steepest Drop achieves orders of magnitude lower execution time without sacrificing power-performance efficiency.","PeriodicalId":422461,"journal":{"name":"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)","volume":"278 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"159","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1854273.1854283","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 159

Abstract

Future many-core microprocessors are likely to be heterogeneous, by design or due to variability and defects. The latter type of heterogeneity is especially challenging due to its unpredictability. To minimize the performance and power impact of these hardware imperfections, the runtime thread scheduler and global power manager must be nimble enough to handle such random heterogeneity. With hundreds of cores expected on a single die in the future, these algorithms must provide high power-performance efficiency, yet remain scalable with low runtime overhead. This paper presents a range of scheduling and power management algorithms and performs a detailed evaluation of their effectiveness and scalability on heterogeneous many-core architectures with up to 256 cores. We also conduct a limit study on the potential benefits of coordinating scheduling and power management and demonstrate that coordination yields little benefit. We highlight the scalability limitations of previously proposed thread scheduling algorithms that were designed for small-scale chip multiprocessors and propose a Hierarchical Hungarian Scheduling Algorithm that dramatically reduces the scheduling overhead without loss of accuracy. Finally, we show that the high computational requirements of prior global power management algorithms based on linear programming make them infeasible for many-core chips, and that an algorithm that we call Steepest Drop achieves orders of magnitude lower execution time without sacrificing power-performance efficiency.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

异构多核架构的可伸缩线程调度和全局电源管理

未来的多核微处理器可能是异构的，这是由于设计或可变性和缺陷造成的。由于其不可预测性，后一种类型的异质性尤其具有挑战性。为了尽量减少这些硬件缺陷对性能和功耗的影响，运行时线程调度器和全局电源管理器必须足够灵活，能够处理这种随机的异构性。由于未来单个芯片上预计会有数百个内核，这些算法必须提供高功率性能效率，同时保持低运行时开销的可扩展性。本文介绍了一系列调度和电源管理算法，并对其在多达256核的异构多核架构上的有效性和可扩展性进行了详细评估。我们还对协调调度和电源管理的潜在好处进行了限制研究，并证明协调产生的好处很少。我们强调了先前提出的针对小型芯片多处理器的线程调度算法的可扩展性限制，并提出了一种分层匈牙利调度算法，该算法在不损失精度的情况下显着降低了调度开销。最后，我们证明了基于线性规划的先前全局电源管理算法的高计算需求使得它们对多核芯片不可行，并且我们称之为最陡下降的算法在不牺牲功率性能效率的情况下实现了数量级的低执行时间。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)

自引率

0.00%

发文量

期刊最新文献

Reducing task creation and termination overhead in explicitly parallel programs An intra-tile cache set balancing scheme NUcache: A multicore cache organization based on Next-Use distance Towards a science of parallel programming Discovering and understanding performance bottlenecks in transactional applications