A complexity-effective approach to ALU bandwidth enhancement for instruction-level temporal redundancy

Proceedings. 31st Annual International Symposium on Computer Architecture, 2004. Pub Date : 2004-06-19 DOI:10.1145/1028176.1006732

A. Parashar, S. Gurumurthi, A. Sivasubramaniam


Citations: 58

Abstract

Previous proposals for implementing instruction-level temporal redundancy in out-of-order cores have reported a performance degradation of up to 45% in certain applications compared to execution without any temporal redundancy. An important contributor to this problem is the insufficient number of ALUs for handling the amplified load injected into the core. At the same time, increasing the number of ALUs can increase the complexity of the issue logic, which has been pointed out to be one of the most timing-critical components of the processor. This paper proposes a novel extension of a prior idea on instruction reuse to ease ALU bandwidth requirements in a complexity-effective way by exploiting certain interesting properties of a dual (temporally redundant) instruction stream. We present the microarchitectural extensions necessary for implementing an instruction reuse buffer (IRB) and integrating it with the issue logic of a dual-instruction-stream superscalar core, and we conduct extensive evaluations to demonstrate how well it can alleviate the ALU bandwidth problem. We show that, on average, we can gain back nearly 50% of the IPC loss caused by ALU bandwidth limitations for an instruction-level temporally redundant superscalar execution, and 23% of the overall IPC loss.
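The abstract does not spell out how the IRB is organized or how it plugs into the issue logic. The Python sketch below illustrates one plausible reading of the general idea under stated assumptions. Every instruction runs twice, a primary copy and a redundant copy. On an IRB hit (same opcode and operand values as an earlier, already-verified instance), the redundant copy is satisfied from the buffer and does not occupy an ALU. The direct-mapped, PC-indexed organization, the 64-entry size, the fault-injection hook, and all names (`InstructionReuseBuffer`, `run_redundant`, `ALU_OPS`) are hypothetical. They are not the paper's design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical ALU operations on 32-bit values (illustrative only).
ALU_OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: (a + b) & 0xFFFFFFFF,
    "sub": lambda a, b: (a - b) & 0xFFFFFFFF,
    "and": lambda a, b: a & b,
}

# One dynamic instruction: (pc, opcode, operand_a, operand_b).
Instr = Tuple[int, str, int, int]


@dataclass
class IRBEntry:
    opcode: str
    operands: Tuple[int, int]
    result: int


class InstructionReuseBuffer:
    """Direct-mapped IRB indexed by PC (an assumed organization)."""

    def __init__(self, num_entries: int = 64) -> None:
        self.num_entries = num_entries
        self.entries: Dict[int, IRBEntry] = {}

    def _index(self, pc: int) -> int:
        # Assumes 4-byte instructions, so the low two PC bits are dropped.
        return (pc >> 2) % self.num_entries

    def lookup(self, pc: int, opcode: str, operands: Tuple[int, int]) -> Optional[int]:
        # A hit needs the same opcode and operand values; the stored result
        # is then valid because these ALU ops are pure functions.
        entry = self.entries.get(self._index(pc))
        if entry is not None and entry.opcode == opcode and entry.operands == operands:
            return entry.result
        return None

    def insert(self, pc: int, opcode: str, operands: Tuple[int, int], result: int) -> None:
        self.entries[self._index(pc)] = IRBEntry(opcode, operands, result)


def run_redundant(trace: List[Instr], irb: InstructionReuseBuffer,
                  fault_at: Optional[int] = None) -> Tuple[int, int]:
    """Execute each instruction as a primary and a redundant copy.

    Returns (alu_ops_used, mismatches). `fault_at` (hypothetical) flips bit 0
    of the primary result at that trace position to emulate a transient fault.
    """
    alu_ops = 0
    mismatches = 0
    for i, (pc, opcode, a, b) in enumerate(trace):
        # The primary copy always occupies an ALU.
        primary = ALU_OPS[opcode](a, b)
        alu_ops += 1
        if i == fault_at:
            primary ^= 1
        check = irb.lookup(pc, opcode, (a, b))
        if check is None:
            # IRB miss: the redundant copy needs its own ALU slot.
            check = ALU_OPS[opcode](a, b)
            alu_ops += 1
            # Only results on which both copies agree are made reusable.
            if check == primary:
                irb.insert(pc, opcode, (a, b), check)
        if check != primary:
            mismatches += 1
    return alu_ops, mismatches


loop = [(0x100, "add", 1, 2), (0x104, "sub", 5, 3)] * 4
print(run_redundant(loop, InstructionReuseBuffer()))              # → (10, 0)
print(run_redundant(loop, InstructionReuseBuffer(), fault_at=5))  # → (10, 1)
```

In this toy loop, the 8 dynamic instructions would need 16 ALU operations under plain dual execution. Only the first iteration misses in the IRB, so 10 operations suffice. The injected fault is still caught, because the primary result is compared against a value that both copies previously agreed on. Real workloads repeat operand values far less regularly than this loop, so the savings here say nothing about the paper's measured results.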


