Load Balancing in Decoupled Look-ahead: A Do-It-Yourself (DIY) Approach

2015 International Conference on Parallel Architecture and Compilation (PACT). Pub Date: 2015-10-18. DOI: 10.1109/PACT.2015.55

Raj Parihar, Michael C. Huang



Abstract

Despite the proliferation of multi-core and multi-threaded architectures, exploiting implicit parallelism for a single semantic thread is still a crucial component in achieving high performance. Look-ahead is a "tried-and-true" strategy for uncovering implicit parallelism. However, a conventional, monolithic out-of-order core quickly becomes resource-inefficient when looking beyond a small distance. One general approach to mitigating the impact of branch mispredictions and cache misses is to enable deep look-ahead. A particular approach that is both flexible and effective is to use an independent, decoupled look-ahead thread on a separate thread context, guided by a program slice known as the skeleton. While capable of generating significant performance gains, the look-ahead agent often becomes the new speed limit. We propose to accelerate the look-ahead thread by skipping branch-based, side-effect-free code modules that do not contribute to the effectiveness of look-ahead. We call these Do-It-Yourself (DIY) branches: for them, the main thread gets no help from the look-ahead thread and instead relies on its own branch predictor and prefetcher. By skipping DIY branches, the look-ahead thread propels ahead and provides performance-critical assistance downstream, improving the performance of the decoupled look-ahead system by up to 15%.
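
The load-balancing intuition in the abstract can be illustrated with a toy model. The sketch below is not the authors' selection mechanism; it only assumes that the two threads run concurrently, so the slower one sets the pace, and that each code module has a known cost on each thread. The `Module` fields, the `system_time` and `choose_diy` functions, and all cycle counts are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical per-module costs, in cycles (illustrative numbers only)."""
    name: str
    lookahead_cycles: int  # cost of executing the module on the look-ahead thread
    main_helped: int       # main-thread cost when look-ahead supplies hints
    main_diy: int          # main-thread cost using only its own predictor/prefetcher


def system_time(modules, diy):
    """Toy model: both threads run concurrently, so the slower one is the limit."""
    lookahead = sum(m.lookahead_cycles for m in modules if m.name not in diy)
    main = sum(m.main_diy if m.name in diy else m.main_helped for m in modules)
    return max(lookahead, main)


def choose_diy(modules):
    """Greedily mark modules as DIY while that lowers the modeled system time."""
    diy = set()
    best = system_time(modules, diy)
    while True:
        candidates = [
            (system_time(modules, diy | {m.name}), m.name)
            for m in modules
            if m.name not in diy
        ]
        if not candidates:
            break
        t, name = min(candidates)
        if t >= best:  # no remaining module improves the balance
            break
        diy.add(name)
        best = t
    return diy, best


# Assumed example: look-ahead is the bottleneck (120 cycles vs. 90 for main).
modules = [
    Module("loop_a", lookahead_cycles=40, main_helped=30, main_diy=32),
    Module("loop_b", lookahead_cycles=50, main_helped=40, main_diy=70),
    Module("loop_c", lookahead_cycles=30, main_helped=20, main_diy=21),
]

baseline = system_time(modules, set())
diy, balanced = choose_diy(modules)
print(baseline, sorted(diy), balanced)  # prints: 120 ['loop_c'] 91
```

In this made-up example, skipping `loop_c` relieves the look-ahead thread at almost no cost to the main thread, while skipping `loop_b` would hurt because the main thread depends heavily on look-ahead help there. That trade-off is the sense in which choosing DIY branches is a load-balancing problem.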


