Practical data value speculation for future high-end processors

Arthur Perais, André Seznec
{"title":"Practical data value speculation for future high-end processors","authors":"Arthur Perais, André Seznec","doi":"10.1109/HPCA.2014.6835952","DOIUrl":null,"url":null,"abstract":"Dedicating more silicon area to single thread performance will necessarily be considered as worthwhile in future - potentially heterogeneous - multicores. In particular, Value prediction (VP) was proposed in the mid 90's to enhance the performance of high-end uniprocessors by breaking true data dependencies. In this paper, we reconsider the concept of Value Prediction in the contemporary context and show its potential as a direction to improve current single thread performance. First, building on top of research carried out during the previous decade on confidence estimation, we show that every value predictor is amenable to very high prediction accuracy using very simple hardware. This clears the path to an implementation of VP without a complex selective reissue mechanism to absorb mispredictions. Prediction is performed in the in-order pipeline frond-end and validation is performed in the in-order pipeline back-end, while the out-of-order engine is only marginally modified. Second, when predicting back-to-back occurrences of the same instruction, previous context-based value predictors relying on local value history exhibit a complex critical loop that should ideally be implemented in a single cycle. To bypass this requirement, we introduce a new value predictor VTAGE harnessing the global branch history. VTAGE can seamlessly predict back-to-back occurrences, allowing predictions to span over several cycles. It achieves higher performance than previously proposed context-based predictors. Specifically, using SPEC'00 and SPEC'06 benchmarks, our simulations show that combining VTAGE and a stride based predictor yields up to 65% speedup on a fairly aggressive pipeline without support for selective reissue.","PeriodicalId":164587,"journal":{"name":"2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"68","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2014.6835952","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 68

Abstract

Dedicating more silicon area to single-thread performance will necessarily be considered worthwhile in future (potentially heterogeneous) multicores. In particular, Value Prediction (VP) was proposed in the mid-1990s to enhance the performance of high-end uniprocessors by breaking true data dependencies. In this paper, we reconsider the concept of Value Prediction in the contemporary context and show its potential as a direction to improve current single-thread performance. First, building on research carried out during the previous decade on confidence estimation, we show that every value predictor is amenable to very high prediction accuracy using very simple hardware. This clears the path to an implementation of VP without a complex selective reissue mechanism to absorb mispredictions. Prediction is performed in the in-order pipeline front-end and validation is performed in the in-order pipeline back-end, while the out-of-order engine is only marginally modified. Second, when predicting back-to-back occurrences of the same instruction, previous context-based value predictors relying on local value history exhibit a complex critical loop that should ideally be implemented in a single cycle. To bypass this requirement, we introduce a new value predictor, VTAGE, harnessing the global branch history. VTAGE can seamlessly predict back-to-back occurrences, allowing predictions to span several cycles. It achieves higher performance than previously proposed context-based predictors. Specifically, using SPEC'00 and SPEC'06 benchmarks, our simulations show that combining VTAGE and a stride-based predictor yields up to 65% speedup on a fairly aggressive pipeline without support for selective reissue.
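As a rough illustration of the mechanism the abstract describes, the sketch below implements a textbook stride-based value predictor gated by a saturating confidence counter, so that predictions are only used when confidence is very high. The class names, table organization, and threshold are assumptions chosen for illustration; they do not reproduce the paper's VTAGE design, its exact confidence scheme, or its hybrid with the stride component.

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>

// Illustrative sketch only: a per-PC stride predictor with a saturating
// confidence counter. Thresholds and table layout are assumptions, not the
// paper's exact design.
struct StrideEntry {
    uint64_t last_value = 0;  // last committed result of this static instruction
    int64_t  stride     = 0;  // difference between the last two results
    int      confidence = 0;  // saturating counter; predict only when saturated
};

class StridePredictor {
public:
    explicit StridePredictor(int conf_max = 7) : conf_max_(conf_max) {}

    // Provide a prediction only when confidence is saturated, mirroring the
    // "predict only at very high confidence" policy the abstract argues makes
    // selective reissue unnecessary.
    bool predict(uint64_t pc, uint64_t& prediction) const {
        auto it = table_.find(pc);
        if (it == table_.end() || it->second.confidence < conf_max_) return false;
        prediction = it->second.last_value + it->second.stride;
        return true;
    }

    // Called at commit with the architecturally correct result (validation
    // happens in the in-order back-end in the paper's organization).
    void train(uint64_t pc, uint64_t actual) {
        StrideEntry& e = table_[pc];
        int64_t new_stride = static_cast<int64_t>(actual - e.last_value);
        if (new_stride == e.stride) {
            if (e.confidence < conf_max_) ++e.confidence;  // stride confirmed
        } else {
            e.confidence = 0;                              // stride broken: reset
            e.stride = new_stride;
        }
        e.last_value = actual;
    }

private:
    int conf_max_;
    std::unordered_map<uint64_t, StrideEntry> table_;  // indexed by instruction PC
};

int main() {
    StridePredictor vp;
    const uint64_t pc = 0x4001234;
    // A load walking an array produces values 100, 108, 116, ... (stride 8).
    uint64_t v = 100, pred = 0;
    for (int i = 0; i < 12; ++i, v += 8) {
        if (vp.predict(pc, pred))
            std::cout << "iter " << i << ": predicted " << pred
                      << (pred == v ? " (correct)\n" : " (wrong)\n");
        vp.train(pc, v);
    }
}
```

Note that a local-history predictor like this one needs the result of the previous occurrence before it can predict the next, which is the critical loop the abstract mentions. A VTAGE-style predictor instead indexes tagged tables with the PC combined with the global branch history, so back-to-back occurrences of the same instruction can be predicted without that single-cycle dependence.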