Reducing fetch architecture complexity using procedure inlining

Oliverio J. Santana, Alex Ramírez, M. Valero
Eighth Workshop on Interaction between Compilers and Computer Architectures (INTERACT-8 2004), published 2004-05-24. DOI: 10.1109/INTERA.2004.1299514
Citations: 3

Abstract

Fetch engine performance is seriously limited by the access latency of the branch prediction tables. This has led to the development of hardware mechanisms, such as prediction overriding, aimed at tolerating this latency. However, prediction overriding requires additional support and recovery mechanisms, which increase the complexity of the fetch architecture. In this paper, we show that this increase in complexity can be avoided if the interaction between the fetch architecture and software code optimizations is taken into account. We use aggressive procedure inlining to generate long streams of instructions, which the fetch engine uses as its basic prediction unit. We define an instruction stream as the sequence of instructions from the target of a taken branch to the next taken branch. These instruction streams are long enough to keep the execution engine supplied with instructions for multiple cycles while the next stream prediction is being generated, thus hiding the prediction table access latency. Our results show that the length of instruction streams compensates for the increase in the instruction cache miss rate caused by inlining. We show that, with procedure inlining, a prediction overriding mechanism is no longer needed, which reduces fetch engine complexity.
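The following Python sketch illustrates the stream definition and the latency-hiding argument under a deliberately simplified model. The trace format, the `Insn` record, the `inline_calls` transformation, and the stall model (a prediction latency in cycles and a fetch width in instructions per cycle) are all illustrative assumptions, not the authors' simulator or fetch engine design.

```python
from dataclasses import dataclass
from math import ceil
from typing import List

# Hypothetical dynamic-trace record (an assumption for illustration):
# kind is "op", "br", "call", or "ret"; taken applies to conditional branches.
@dataclass(frozen=True)
class Insn:
    kind: str
    taken: bool = False

def ends_stream(insn: Insn) -> bool:
    # Calls and returns always redirect fetch, so they count as taken branches.
    # Not-taken conditional branches do NOT end a stream.
    return insn.taken or insn.kind in ("call", "ret")

def stream_lengths(trace: List[Insn]) -> List[int]:
    """Split a dynamic trace into instruction streams.

    A stream runs from the target of a taken branch up to and including
    the next taken branch. A trailing partial stream is also reported.
    """
    lengths, current = [], 0
    for insn in trace:
        current += 1
        if ends_stream(insn):
            lengths.append(current)
            current = 0
    if current:
        lengths.append(current)
    return lengths

def stall_cycles(lengths: List[int], latency: int, width: int) -> int:
    """Simplified model (assumption): the next stream prediction starts when
    fetch of the current stream starts and takes `latency` cycles. Fetching
    n instructions takes ceil(n / width) cycles; any shortfall is a bubble."""
    return sum(max(0, latency - ceil(n / width)) for n in lengths)

def inline_calls(trace: List[Insn]) -> List[Insn]:
    """Crude model of inlining: drop call/ret so the callee body falls
    through into the caller. Ignores code growth and i-cache effects."""
    return [i for i in trace if i.kind not in ("call", "ret")]

def ops(n: int) -> List[Insn]:
    return [Insn("op")] * n

if __name__ == "__main__":
    # Caller code with a not-taken branch, a call, a 5-instruction callee
    # (including its return), and a loop-closing taken branch.
    trace = (ops(3) + [Insn("br")] + ops(1) + [Insn("call")]
             + ops(4) + [Insn("ret")]
             + ops(7) + [Insn("br", taken=True)])
    before = stream_lengths(trace)
    after = stream_lengths(inline_calls(trace))
    print(before, stall_cycles(before, latency=3, width=4))  # [6, 5, 8] 3
    print(after, stall_cycles(after, latency=3, width=4))    # [17] 0
```

In this toy trace, inlining the single call merges three short streams into one 17-instruction stream, so a 3-cycle prediction completes within the 5 cycles needed to fetch it and no fetch bubbles remain. The model deliberately ignores the instruction cache misses that, as the abstract notes, inlining can increase.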