产生的

ACM Transactions on Computer Systems (TOCS) Pub Date : 2016-04-21 DOI:10.1145/2870632

Arthur Perais, André Seznec

{"title":"产生的","authors":"Arthur Perais, André Seznec","doi":"10.1145/2870632","DOIUrl":null,"url":null,"abstract":"Recent work in the field of value prediction (VP) has shown that given an efficient confidence estimation mechanism, prediction validation could be removed from the out-of-order engine and delayed until commit time. As a result, a simple recovery mechanism—pipeline squashing—can be used, whereas the out-of-order engine remains mostly unmodified. Yet, VP and validation at commit time require additional ports on the physical register file, potentially rendering the overall number of ports unbearable. Fortunately, VP also implies that many single-cycle ALU instructions have their operands predicted in the front-end and can be executed in-place, in-order. Similarly, the execution of single-cycle instructions whose result has been predicted can be delayed until commit time since predictions are validated at commit time. Consequently, a significant number of instructions—10% to 70% in our experiments—can bypass the out-of-order engine, allowing for a reduction of the issue width. This reduction paves the way for a truly practical implementation of VP. Furthermore, since VP in itself usually increases performance, our resulting {Early—Out-of-Order—Late} Execution architecture, EOLE, is often more efficient than a baseline VP-augmented 6-issue superscalar while having a significantly narrower 4-issue out-of-order engine.","PeriodicalId":318554,"journal":{"name":"ACM Transactions on Computer Systems (TOCS)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"EOLE\",\"authors\":\"Arthur Perais, André Seznec\",\"doi\":\"10.1145/2870632\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent work in the field of value prediction (VP) has shown that given an efficient confidence estimation mechanism, prediction validation could be removed from the out-of-order engine and delayed until commit time. As a result, a simple recovery mechanism—pipeline squashing—can be used, whereas the out-of-order engine remains mostly unmodified. Yet, VP and validation at commit time require additional ports on the physical register file, potentially rendering the overall number of ports unbearable. Fortunately, VP also implies that many single-cycle ALU instructions have their operands predicted in the front-end and can be executed in-place, in-order. Similarly, the execution of single-cycle instructions whose result has been predicted can be delayed until commit time since predictions are validated at commit time. Consequently, a significant number of instructions—10% to 70% in our experiments—can bypass the out-of-order engine, allowing for a reduction of the issue width. This reduction paves the way for a truly practical implementation of VP. Furthermore, since VP in itself usually increases performance, our resulting {Early—Out-of-Order—Late} Execution architecture, EOLE, is often more efficient than a baseline VP-augmented 6-issue superscalar while having a significantly narrower 4-issue out-of-order engine.\",\"PeriodicalId\":318554,\"journal\":{\"name\":\"ACM Transactions on Computer Systems (TOCS)\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-04-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Computer Systems (TOCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2870632\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Computer Systems (TOCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2870632","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 6

摘要

最近在价值预测(VP)领域的研究表明，给定一个有效的置信度估计机制，预测验证可以从无序引擎中移除，并延迟到提交时间。因此，可以使用一种简单的恢复机制——管道挤压，而无序的发动机基本上保持不变。然而，提交时的VP和验证需要物理寄存器文件上的额外端口，这可能导致端口总数无法承受。幸运的是，VP还意味着许多单周期ALU指令的操作数都在前端预测，并且可以按顺序就地执行。类似地，预测结果的单周期指令的执行可以延迟到提交时，因为在提交时验证了预测。因此，大量指令(在我们的实验中为10%到70%)可以绕过无序引擎，从而减少问题宽度。这种减少为VP的真正实际实现铺平了道路。此外，由于VP本身通常会提高性能，因此我们得到的{早乱序-晚乱序}执行架构EOLE通常比基线VP增强的6问题超标量更有效，同时具有明显更窄的4问题乱序引擎。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

EOLE

Recent work in the field of value prediction (VP) has shown that given an efficient confidence estimation mechanism, prediction validation could be removed from the out-of-order engine and delayed until commit time. As a result, a simple recovery mechanism—pipeline squashing—can be used, whereas the out-of-order engine remains mostly unmodified. Yet, VP and validation at commit time require additional ports on the physical register file, potentially rendering the overall number of ports unbearable. Fortunately, VP also implies that many single-cycle ALU instructions have their operands predicted in the front-end and can be executed in-place, in-order. Similarly, the execution of single-cycle instructions whose result has been predicted can be delayed until commit time since predictions are validated at commit time. Consequently, a significant number of instructions—10% to 70% in our experiments—can bypass the out-of-order engine, allowing for a reduction of the issue width. This reduction paves the way for a truly practical implementation of VP. Furthermore, since VP in itself usually increases performance, our resulting {Early—Out-of-Order—Late} Execution architecture, EOLE, is often more efficient than a baseline VP-augmented 6-issue superscalar while having a significantly narrower 4-issue out-of-order engine.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Computer Systems (TOCS)

自引率

0.00%

发文量