Dynamic performance tuning for speculative threads

Yangchun Luo, Venkatesan Packirisamy, W. Hsu, Antonia Zhai, Nikhil Mungre, Ankit Tarkas
{"title":"Dynamic performance tuning for speculative threads","authors":"Yangchun Luo, Venkatesan Packirisamy, W. Hsu, Antonia Zhai, Nikhil Mungre, Ankit Tarkas","doi":"10.1145/1555754.1555812","DOIUrl":null,"url":null,"abstract":"In response to the emergence of multicore processors, various novel and sophisticated execution models have been introduced to fully utilize these processors. One such execution model is Thread-Level Speculation (TLS), which allows potentially dependent threads to execute speculatively in parallel. While TLS offers significant performance potential for applications that are otherwise non-parallel, extracting efficient speculative threads in the presence of complex control flow and ambiguous data dependences is a real challenge. This task is further complicated by the fact that the performance of speculative threads is often architecture-dependent, input-sensitive, and exhibits phase behaviors. Thus we propose dynamic performance tuning mechanisms that determine where and how to create speculative threads at runtime.\n This paper describes the design, implementation, and evaluation of hardware and software support that takes advantage of runtime performance profiles to extract efficient speculative threads. In our proposed framework, speculative threads are monitored by hardware-based performance counters and their performance impact is estimated. The creation of speculative threads is adjusted based on the estimation. This paper proposes speculative threads performance estimation techniques, that are capable of correctly determining whether speculation can improve performance for loops that corresponds to 83.8% of total loop execution time across all benchmarks. This paper also examines several dynamic performance tuning policies and finds that the best tuning policy achieves an overall speedup of 36.8%on a set of benchmarks from SPEC2000 suite, which outperforms static thread management by 9.5%.","PeriodicalId":91388,"journal":{"name":"Proceedings. International Symposium on Computer Architecture","volume":"8 1","pages":"462-473"},"PeriodicalIF":0.0000,"publicationDate":"2009-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"38","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. International Symposium on Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1555754.1555812","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 38

Abstract

In response to the emergence of multicore processors, various novel and sophisticated execution models have been introduced to fully utilize these processors. One such execution model is Thread-Level Speculation (TLS), which allows potentially dependent threads to execute speculatively in parallel. While TLS offers significant performance potential for applications that are otherwise non-parallel, extracting efficient speculative threads in the presence of complex control flow and ambiguous data dependences is a real challenge. This task is further complicated by the fact that the performance of speculative threads is often architecture-dependent, input-sensitive, and exhibits phase behaviors. Thus we propose dynamic performance tuning mechanisms that determine where and how to create speculative threads at runtime. This paper describes the design, implementation, and evaluation of hardware and software support that takes advantage of runtime performance profiles to extract efficient speculative threads. In our proposed framework, speculative threads are monitored by hardware-based performance counters and their performance impact is estimated. The creation of speculative threads is adjusted based on the estimation. This paper proposes speculative threads performance estimation techniques, that are capable of correctly determining whether speculation can improve performance for loops that corresponds to 83.8% of total loop execution time across all benchmarks. This paper also examines several dynamic performance tuning policies and finds that the best tuning policy achieves an overall speedup of 36.8%on a set of benchmarks from SPEC2000 suite, which outperforms static thread management by 9.5%.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
推测线程的动态性能调优
为了应对多核处理器的出现,已经引入了各种新颖和复杂的执行模型来充分利用这些处理器。其中一种执行模型是线程级推测(TLS),它允许潜在的依赖线程并行地推测执行。虽然TLS为非并行的应用程序提供了巨大的性能潜力,但在复杂的控制流和模糊的数据依赖关系中提取有效的推测线程是一个真正的挑战。由于推测线程的性能通常依赖于体系结构,对输入敏感,并表现出阶段行为,因此使这项任务更加复杂。因此,我们提出了动态性能调优机制,以确定在何处以及如何在运行时创建推测线程。本文描述了硬件和软件支持的设计、实现和评估,这些支持利用运行时性能配置文件来提取有效的推测线程。在我们提出的框架中,推测线程由基于硬件的性能计数器监视,并估计其性能影响。投机线程的创建将根据估计进行调整。本文提出了推测线程性能估计技术,能够正确地确定推测是否可以提高循环的性能,在所有基准测试中,循环执行时间占总循环执行时间的83.8%。本文还研究了几种动态性能调优策略,并发现在SPEC2000套件的一组基准测试中,最佳调优策略可以实现36.8%的总体加速,比静态线程管理高出9.5%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18 - 22, 2022 Special-purpose and future architectures Computer memory systems Basics of the central processing unit FRONT MATTER
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1