Practical data value speculation for future high-end processors

Arthur Perais, André Seznec
{"title":"Practical data value speculation for future high-end processors","authors":"Arthur Perais, André Seznec","doi":"10.1109/HPCA.2014.6835952","DOIUrl":null,"url":null,"abstract":"Dedicating more silicon area to single thread performance will necessarily be considered as worthwhile in future - potentially heterogeneous - multicores. In particular, Value prediction (VP) was proposed in the mid 90's to enhance the performance of high-end uniprocessors by breaking true data dependencies. In this paper, we reconsider the concept of Value Prediction in the contemporary context and show its potential as a direction to improve current single thread performance. First, building on top of research carried out during the previous decade on confidence estimation, we show that every value predictor is amenable to very high prediction accuracy using very simple hardware. This clears the path to an implementation of VP without a complex selective reissue mechanism to absorb mispredictions. Prediction is performed in the in-order pipeline frond-end and validation is performed in the in-order pipeline back-end, while the out-of-order engine is only marginally modified. Second, when predicting back-to-back occurrences of the same instruction, previous context-based value predictors relying on local value history exhibit a complex critical loop that should ideally be implemented in a single cycle. To bypass this requirement, we introduce a new value predictor VTAGE harnessing the global branch history. VTAGE can seamlessly predict back-to-back occurrences, allowing predictions to span over several cycles. It achieves higher performance than previously proposed context-based predictors. Specifically, using SPEC'00 and SPEC'06 benchmarks, our simulations show that combining VTAGE and a stride based predictor yields up to 65% speedup on a fairly aggressive pipeline without support for selective reissue.","PeriodicalId":164587,"journal":{"name":"2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"68","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2014.6835952","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 68

Abstract

Dedicating more silicon area to single-thread performance will necessarily be considered worthwhile in future (potentially heterogeneous) multicores. In particular, Value Prediction (VP) was proposed in the mid-1990s to enhance the performance of high-end uniprocessors by breaking true data dependencies. In this paper, we reconsider the concept of Value Prediction in the contemporary context and show its potential as a direction to improve current single-thread performance. First, building on research carried out during the previous decade on confidence estimation, we show that every value predictor is amenable to very high prediction accuracy using very simple hardware. This clears the path to an implementation of VP without a complex selective reissue mechanism to absorb mispredictions. Prediction is performed in the in-order pipeline front-end and validation is performed in the in-order pipeline back-end, while the out-of-order engine is only marginally modified. Second, when predicting back-to-back occurrences of the same instruction, previous context-based value predictors relying on local value history exhibit a complex critical loop that should ideally be implemented in a single cycle. To bypass this requirement, we introduce a new value predictor, VTAGE, harnessing the global branch history. VTAGE can seamlessly predict back-to-back occurrences, allowing predictions to span several cycles. It achieves higher performance than previously proposed context-based predictors. Specifically, using SPEC'00 and SPEC'06 benchmarks, our simulations show that combining VTAGE and a stride-based predictor yields up to 65% speedup on a fairly aggressive pipeline without support for selective reissue.
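As a rough illustration of the mechanism the abstract describes, the sketch below implements a textbook stride-based value predictor gated by a saturating confidence counter, so that predictions are only used when confidence is very high. The class names, table organization, and threshold are assumptions chosen for illustration; they do not reproduce the paper's VTAGE design, its exact confidence scheme, or its hybrid with the stride component.

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>

// Illustrative sketch only: a per-PC stride predictor with a saturating
// confidence counter. Thresholds and table layout are assumptions, not the
// paper's exact design.
struct StrideEntry {
    uint64_t last_value = 0;  // last committed result of this static instruction
    int64_t  stride     = 0;  // difference between the last two results
    int      confidence = 0;  // saturating counter; predict only when saturated
};

class StridePredictor {
public:
    explicit StridePredictor(int conf_max = 7) : conf_max_(conf_max) {}

    // Provide a prediction only when confidence is saturated, mirroring the
    // "predict only at very high confidence" policy the abstract argues makes
    // selective reissue unnecessary.
    bool predict(uint64_t pc, uint64_t& prediction) const {
        auto it = table_.find(pc);
        if (it == table_.end() || it->second.confidence < conf_max_) return false;
        prediction = it->second.last_value + it->second.stride;
        return true;
    }

    // Called at commit with the architecturally correct result (validation
    // happens in the in-order back-end in the paper's organization).
    void train(uint64_t pc, uint64_t actual) {
        StrideEntry& e = table_[pc];
        int64_t new_stride = static_cast<int64_t>(actual - e.last_value);
        if (new_stride == e.stride) {
            if (e.confidence < conf_max_) ++e.confidence;  // stride confirmed
        } else {
            e.confidence = 0;                              // stride broken: reset
            e.stride = new_stride;
        }
        e.last_value = actual;
    }

private:
    int conf_max_;
    std::unordered_map<uint64_t, StrideEntry> table_;  // indexed by instruction PC
};

int main() {
    StridePredictor vp;
    const uint64_t pc = 0x4001234;
    // A load walking an array produces values 100, 108, 116, ... (stride 8).
    uint64_t v = 100, pred = 0;
    for (int i = 0; i < 12; ++i, v += 8) {
        if (vp.predict(pc, pred))
            std::cout << "iter " << i << ": predicted " << pred
                      << (pred == v ? " (correct)\n" : " (wrong)\n");
        vp.train(pc, v);
    }
}
```

Note that a local-history predictor like this one needs the result of the previous occurrence before it can predict the next, which is the critical loop the abstract mentions. A VTAGE-style predictor instead indexes tagged tables with the PC combined with the global branch history, so back-to-back occurrences of the same instruction can be predicted without that single-cycle dependence.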