Reducing fetch architecture complexity using procedure inlining

Eighth Workshop on Interaction between Compilers and Computer Architectures, 2004. INTERACT-8 2004. Pub Date : 2004-05-24 DOI:10.1109/INTERA.2004.1299514

Oliverio J. Santana, Alex Ramírez, M. Valero


Citations: 3

Abstract

Fetch engine performance is seriously limited by the branch prediction table access latency. This fact has led to the development of hardware mechanisms, such as prediction overriding, aimed at tolerating this latency. However, prediction overriding requires additional support and recovery mechanisms, which increase the complexity of the fetch architecture. In this paper, we show that this increase in complexity can be avoided if the interaction between the fetch architecture and software code optimizations is taken into account. We use aggressive procedure inlining to generate long streams of instructions that the fetch engine uses as its basic prediction unit. We define an instruction stream as a sequence of instructions from the target of a taken branch to the next taken branch. These instruction streams are long enough to feed the execution engine with instructions for multiple cycles while a new stream prediction is being generated, thus hiding the prediction table access latency. Our results show that the length of the instruction streams compensates for the increase in the instruction cache miss rate caused by inlining. We show that procedure inlining removes the need for a prediction overriding mechanism, reducing the complexity of the fetch engine.
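The stream definition in the abstract can be illustrated with a small sketch. It is not the authors' simulator: the `Instr` type, the two toy traces, the fetch width of 4, and the predictor latency of 3 cycles are all hypothetical values chosen for illustration, and `hides_latency` is a deliberately simplified model that only compares the cycles needed to fetch a stream against the cycles needed for one prediction.

```python
from dataclasses import dataclass
from math import ceil
from typing import Iterable, List


@dataclass(frozen=True)
class Instr:
    pc: int              # instruction address (hypothetical values)
    taken: bool = False  # True for a taken branch, including calls and returns


def split_into_streams(trace: Iterable[Instr]) -> List[List[int]]:
    """Cut a dynamic trace after every taken branch, giving one stream per cut."""
    streams: List[List[int]] = []
    current: List[int] = []
    for ins in trace:
        current.append(ins.pc)
        if ins.taken:
            streams.append(current)
            current = []
    if current:
        streams.append(current)  # trailing partial stream
    return streams


def hides_latency(stream_len: int, fetch_width: int, predictor_latency: int) -> bool:
    """Simplified model: fetching the stream takes at least as many cycles
    as producing the next stream prediction."""
    return ceil(stream_len / fetch_width) >= predictor_latency


# Without inlining: the call (pc 3) and the return (pc 102) are taken branches.
not_inlined = (
    [Instr(0), Instr(1), Instr(2), Instr(3, taken=True)]
    + [Instr(100), Instr(101), Instr(102, taken=True)]
    + [Instr(4), Instr(5), Instr(6), Instr(7, taken=True)]
)

# With inlining: the call and return disappear, so the callee body sits
# inline and the only taken branch left is the final one.
inlined = [Instr(pc) for pc in range(8)] + [Instr(8, taken=True)]

lengths_before = [len(s) for s in split_into_streams(not_inlined)]
lengths_after = [len(s) for s in split_into_streams(inlined)]
```

In this toy case the three short streams merge into a single stream of nine instructions. With the assumed fetch width of 4 and predictor latency of 3 cycles, `hides_latency(9, 4, 3)` holds while `hides_latency(4, 4, 3)` does not, which mirrors the abstract's argument that longer streams leave time for the next prediction.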


Source journal

Eighth Workshop on Interaction between Compilers and Computer Architectures, 2004. INTERACT-8 2004.

Self-citation rate

0.00%

Publication count