Reducing reorder buffer complexity through selective operand caching

Gürhan Küçük, D. Ponomarev, O. Ergin, K. Ghose
Published in: Proceedings of the 2003 International Symposium on Low Power Electronics and Design (ISLPED '03), 2003-08-25
DOI: 10.1145/871506.871564 (https://doi.org/10.1145/871506.871564)
Citations: 117

Abstract

Modern superscalar processors implement precise interrupts by using the Reorder Buffer (ROB). In some microarchitectures, such as the Intel P6, the ROB also serves as a repository for uncommitted results. In these designs, the ROB is a complex multi-ported structure that dissipates a significant percentage of the overall chip power. Recently, a mechanism was introduced for reducing the ROB complexity and its power dissipation through the complete elimination of the read ports used for reading out source operands. The resulting performance degradation is countered by caching the most recently produced results in a small set of associatively addressed latches ("retention latches"). We propose an enhancement to this technique that leverages the notion of short-lived operands (values targeting registers that are renamed by the time the instruction producing the value reaches the writeback stage). As many as 87% of all generated values are short-lived for the SPEC 2000 benchmarks. Significant improvements in the utilization of the retention latches, the overall performance, complexity, and power are achieved by not caching short-lived values in the retention latches. As few as two retention latches allow all source operand read ports on the ROB to be completely eliminated with very little impact on performance.
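
The effect of skipping short-lived values can be illustrated with a toy model. The sketch below is not the authors' design: the class and field names, the FIFO replacement policy, the use of the ROB slot number as the lookup tag, and the example instruction stream are all assumptions made for illustration. It only shows why a filter on short-lived results may leave more room in a two-entry latch set for values that later instructions still need.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Result:
    rob_index: int     # ROB slot of the producer, used here as the lookup tag (assumption)
    value: int
    short_lived: bool  # destination register already renamed again by writeback time


class RetentionLatches:
    """Hypothetical model of a small, associatively addressed result cache."""

    def __init__(self, size=2, filter_short_lived=True):
        self.size = size
        self.filter_short_lived = filter_short_lived
        self.entries = OrderedDict()  # rob_index -> value, oldest entry first

    def writeback(self, result):
        """Offer a result at writeback; return True if it was cached."""
        if self.filter_short_lived and result.short_lived:
            # Skip caching. How consumers obtain such a value is not modeled here.
            return False
        self.entries.pop(result.rob_index, None)  # refresh an existing tag
        if len(self.entries) >= self.size:
            self.entries.popitem(last=False)      # FIFO eviction (assumed policy)
        self.entries[result.rob_index] = result.value
        return True

    def lookup(self, rob_index):
        """Associative lookup by tag; None means the value is not held."""
        return self.entries.get(rob_index)


# Made-up stream: ROB slots 0..4, of which slots 0, 2 and 3 are short-lived.
STREAM = [Result(i, 10 * i, i in (0, 2, 3)) for i in range(5)]


def run(filter_short_lived):
    """Replay the stream through two latches and return the final latch state."""
    latches = RetentionLatches(size=2, filter_short_lived=filter_short_lived)
    for result in STREAM:
        latches.writeback(result)
    return latches


if __name__ == "__main__":
    # A late consumer asks for the long-lived value produced in ROB slot 1.
    print("with filter:   ", run(True).lookup(1))
    print("without filter:", run(False).lookup(1))
```

In this made-up stream, the filtered configuration still holds the value from slot 1 when a late consumer asks for it, whereas the unfiltered configuration has already evicted it to make room for short-lived results. Real behavior depends on the replacement policy, the pipeline timing, and the workload, none of which this sketch captures.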