Dynamic performance tuning for speculative threads

Proceedings. International Symposium on Computer Architecture. Pub Date: 2009-06-15. DOI: 10.1145/1555754.1555812

Yangchun Luo, Venkatesan Packirisamy, W. Hsu, Antonia Zhai, Nikhil Mungre, Ankit Tarkas

Pages: 462-473

Citations: 38

Abstract

In response to the emergence of multicore processors, various novel and sophisticated execution models have been introduced to fully utilize these processors. One such execution model is Thread-Level Speculation (TLS), which allows potentially dependent threads to execute speculatively in parallel. While TLS offers significant performance potential for applications that are otherwise non-parallel, extracting efficient speculative threads in the presence of complex control flow and ambiguous data dependences is a real challenge. This task is further complicated by the fact that the performance of speculative threads is often architecture-dependent and input-sensitive, and exhibits phase behavior. We therefore propose dynamic performance tuning mechanisms that determine where and how to create speculative threads at runtime.

This paper describes the design, implementation, and evaluation of hardware and software support that takes advantage of runtime performance profiles to extract efficient speculative threads. In the proposed framework, speculative threads are monitored by hardware-based performance counters, their performance impact is estimated, and the creation of speculative threads is adjusted based on that estimate. The paper proposes performance estimation techniques for speculative threads that correctly determine whether speculation can improve performance for loops that correspond to 83.8% of total loop execution time across all benchmarks. It also examines several dynamic performance tuning policies and finds that the best policy achieves an overall speedup of 36.8% on a set of benchmarks from the SPEC2000 suite, outperforming static thread management by 9.5%.
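The monitor, estimate, and adjust loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's estimator or policy: the `LoopProfile` fields, the way sequential time is assumed to be available, the `threshold` parameter, and the loop names are all hypothetical, chosen only to show the shape of a per-loop decision.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class LoopProfile:
    """Hypothetical per-loop counters gathered while the loop runs speculatively."""
    spec_cycles: int            # cycles observed with speculation enabled (assumed counter)
    est_sequential_cycles: int  # estimated cycles for a sequential run (assumed estimate)


def estimated_speedup(profile: LoopProfile) -> float:
    """Ratio of estimated sequential time to observed speculative time."""
    if profile.spec_cycles <= 0:
        raise ValueError("spec_cycles must be positive")
    return profile.est_sequential_cycles / profile.spec_cycles


def tune(profiles: Dict[str, LoopProfile], threshold: float = 1.0) -> Dict[str, bool]:
    """For each loop, decide whether to keep creating speculative threads.

    A loop stays speculative only if its estimated speedup exceeds the
    threshold; this simple rule is an illustrative assumption.
    """
    return {name: estimated_speedup(p) > threshold for name, p in profiles.items()}


# Made-up numbers for two hypothetical loops.
profiles = {
    "loop_a": LoopProfile(spec_cycles=800, est_sequential_cycles=1200),
    "loop_b": LoopProfile(spec_cycles=1500, est_sequential_cycles=1000),
}
decisions = tune(profiles)
print(decisions)  # {'loop_a': True, 'loop_b': False}
```

In a real system the decision would presumably be re-evaluated periodically, since the abstract notes that speculative-thread performance is input-sensitive and exhibits phase behavior; the sketch makes a single decision per loop.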


