Exploring the Performance Limits of Out-of-order Commit

Proceedings of the Computing Frontiers Conference. Pub Date: 2017-05-15. DOI: 10.1145/3075564.3075581

M. Alipour, Trevor E. Carlson, S. Kaxiras


Citations: 9

Abstract

Out-of-order execution is essential for high-performance, general-purpose computation, as it can find and execute useful work instead of stalling. However, it is limited by the requirement of visibly sequential, atomic instruction execution, in other words, in-order instruction commit. While in-order commit has its advantages, such as providing precise interrupts and avoiding complications with the memory consistency model, it requires the core to hold on to resources (reorder buffer entries, load/store queue entries, registers) until they are released in program order. In contrast, out-of-order commit releases resources much earlier, yielding improved performance with fewer traditional hardware resources. However, out-of-order commit is limited in terms of correctness by the conditions described in the work of Bell and Lipasti. In this paper we revisit out-of-order commit from a different perspective, not by proposing another hardware technique, but by examining these conditions one by one and in combination with respect to their potential performance benefit for both non-speculative and speculative out-of-order commit. While correctly handling recovery for all out-of-order commit conditions currently requires complex tracking and expensive checkpointing, this work aims to demonstrate the potential for selective, speculative out-of-order commit using an oracle implementation without speculative rollback costs.

We learn that: a) there is significant untapped potential for aggressive variants of out-of-order commit; b) it is important to optimize the commit depth, or the search distance for out-of-order commit, for a balanced design: smaller cores can benefit from shorter depths while larger cores continue to benefit from aggressive parameters; c) a focus on a subset of the out-of-order commit conditions could lead to efficient implementations; d) the benefits of out-of-order commit increase with higher memory latency, and out-of-order commit works well in conjunction with prefetching to continue to improve performance.
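The effect of commit depth on resource release can be illustrated with a toy model. The sketch below is not the paper's simulator or methodology: the function name `simulate`, its parameters, and the one-dispatch, one-commit-per-cycle pipeline are all assumptions made for illustration, and the model ignores every correctness condition, so it behaves like an idealized oracle. A `commit_depth` of 1 corresponds to in-order commit; larger values let any finished instruction among the oldest `commit_depth` reorder buffer entries retire.

```python
from collections import deque


def simulate(latencies, rob_size, commit_depth):
    """Return the cycle count needed to dispatch and commit every instruction.

    Hypothetical toy model: one dispatch and one commit per cycle.
    commit_depth=1 means only the reorder buffer (ROB) head may retire,
    i.e. in-order commit. Larger depths let a finished instruction among
    the oldest `commit_depth` entries retire, with no correctness checks.
    """
    pending = deque(latencies)  # execution latencies of undispatched instructions
    rob = []                    # finish cycles of in-flight instructions, oldest first
    cycle = 0
    while pending or rob:
        cycle += 1
        # Commit: retire the oldest finished entry within the search distance.
        for i in range(min(commit_depth, len(rob))):
            if rob[i] <= cycle:
                del rob[i]
                break
        # Dispatch: needs a free ROB entry, so a blocked head stalls the front end.
        if pending and len(rob) < rob_size:
            rob.append(cycle + pending.popleft())
    return cycle


# One long-latency instruction (for example a cache miss) followed by short ones.
program = [20] + [1] * 15
in_order_cycles = simulate(program, rob_size=4, commit_depth=1)
out_of_order_cycles = simulate(program, rob_size=4, commit_depth=4)
print(in_order_cycles, out_of_order_cycles)
```

With in-order commit, the long-latency instruction sits at the head of the four-entry buffer and blocks dispatch until it finishes. With a deeper search distance, the short instructions behind it retire and free their entries, so the front end keeps running. Raising the first latency in `program` widens the gap, which is the direction of the memory-latency finding, though the numbers from this toy carry no quantitative meaning.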


