A complexity-effective approach to ALU bandwidth enhancement for instruction-level temporal redundancy

A. Parashar, S. Gurumurthi, A. Sivasubramaniam
{"title":"A complexity-effective approach to ALU bandwidth enhancement for instruction-level temporal redundancy","authors":"A. Parashar, S. Gurumurthi, A. Sivasubramaniam","doi":"10.1145/1028176.1006732","DOIUrl":null,"url":null,"abstract":"Previous proposals for implementing instruction-level temporal redundancy in out-of-order cores have reported a performance degradation of up to 45% in certain applications compared to an execution which does not have any temporal redundancy. An important contributor to this problem is the insufficient number of ALUs for handling the amplified load injected into the core. At the same time, increasing the number of ALUs can increase the complexity of the issue logic, which has been pointed out to be one of the most timing critical components of the processor. This paper proposes a novel extension of a prior idea on instruction reuse to ease ALU bandwidth requirements in a complexity-effective way by exploiting certain interesting properties of a dual (temporally redundant) instruction stream. We present microarchitectural extensions necessary for implementing an instruction reuse buffer (IRB) and integrating this with the issue logic of a dual instruction stream superscalar core, and conduct extensive evaluations to demonstrate how well it can alleviate the ALU bandwidth problem. We show that on the average we can gain back nearly 50% of the IPC loss that occurred due to ALU bandwidth limitations for an instruction-level temporally redundant superscalar execution, and 23% of the overall IPC loss.","PeriodicalId":268352,"journal":{"name":"Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"58","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1028176.1006732","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 58

Abstract

Previous proposals for implementing instruction-level temporal redundancy in out-of-order cores have reported a performance degradation of up to 45% in certain applications compared to an execution which does not have any temporal redundancy. An important contributor to this problem is the insufficient number of ALUs for handling the amplified load injected into the core. At the same time, increasing the number of ALUs can increase the complexity of the issue logic, which has been pointed out to be one of the most timing critical components of the processor. This paper proposes a novel extension of a prior idea on instruction reuse to ease ALU bandwidth requirements in a complexity-effective way by exploiting certain interesting properties of a dual (temporally redundant) instruction stream. We present microarchitectural extensions necessary for implementing an instruction reuse buffer (IRB) and integrating this with the issue logic of a dual instruction stream superscalar core, and conduct extensive evaluations to demonstrate how well it can alleviate the ALU bandwidth problem. We show that on the average we can gain back nearly 50% of the IPC loss that occurred due to ALU bandwidth limitations for an instruction-level temporally redundant superscalar execution, and 23% of the overall IPC loss.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
一种复杂度有效的指令级时间冗余ALU带宽增强方法
先前在乱序核中实现指令级时间冗余的建议报告了在某些应用程序中与没有任何时间冗余的执行相比,性能下降高达45%。造成这个问题的一个重要原因是alu的数量不足以处理注入核心的放大负载。同时,增加alu的数量会增加问题逻辑的复杂性,这已经被指出是处理器中最关键的时序组件之一。本文提出了一种对先前指令重用思想的新扩展,通过利用双(临时冗余)指令流的某些有趣特性,以一种复杂性有效的方式缓解ALU带宽需求。我们提出了实现指令重用缓冲区(IRB)所需的微架构扩展,并将其与双指令流超标量核心的问题逻辑集成,并进行了广泛的评估,以证明它可以很好地缓解ALU带宽问题。我们表明,对于指令级临时冗余超标量执行,我们平均可以挽回近50%由于ALU带宽限制而导致的IPC损失,以及23%的总体IPC损失。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A content aware integer register file organization The vector-thread architecture From sequences of dependent instructions to functions: an approach for improving performance without ILP or speculation Evaluating the Imagine stream architecture Wire delay is not a problem for SMT (in the near future)
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1