超标量体系结构的显式声明延迟分支机制

Microprocessing and Microprogramming Pub Date : 1994-12-01 Epub Date: 2003-07-30 DOI:10.1016/0165-6074(94)90016-7

Roger Collins, Gordon Steven

{"title":"超标量体系结构的显式声明延迟分支机制","authors":"Roger Collins, Gordon Steven","doi":"10.1016/0165-6074(94)90016-7","DOIUrl":null,"url":null,"abstract":"<div><p>One of the main obstacles to exploiting the fine-grained parallelism that is available in general-purpose code is the frequency of branches that cause unpredictable changes in the control flow of a program at run-time. Whenever a branch is taken, a performance penalty may be incurred as the processor waits for instructions to be fetched from the branch target stream. RISC processors introduce a delayed-branch mechanism which defines branch delay slots into which code can be scheduled. This strategy allows the processor to be kept busy executing useful instructions while the change of control flow takes place. While the concept of delayed branches can be readily extended to VLIW architectures, it is less clear how it should be incorporated in a superscalar architecture. This paper proposes a general branch-delay mechanism which is suitable for a range of code-compatible superscalar processors and which completely avoids the need to introduce NOPs into the code. This technique was developed as an integral part of the HSP superscalar project. HSP is a superscalar architecture currently being researched at the University of Hertfordshire with the aim of using compile-time instruction scheduling to achieve an order of magnitude speed-up over traditional RISC architectures for a suite of non-numeric benchmark programs.</p></div>","PeriodicalId":100927,"journal":{"name":"Microprocessing and Microprogramming","volume":"40 10","pages":"Pages 677-680"},"PeriodicalIF":0.0000,"publicationDate":"1994-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0165-6074(94)90016-7","citationCount":"3","resultStr":"{\"title\":\"An explicitly declared delayed-branch mechanism for a superscalar architecture\",\"authors\":\"Roger Collins, Gordon Steven\",\"doi\":\"10.1016/0165-6074(94)90016-7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>One of the main obstacles to exploiting the fine-grained parallelism that is available in general-purpose code is the frequency of branches that cause unpredictable changes in the control flow of a program at run-time. Whenever a branch is taken, a performance penalty may be incurred as the processor waits for instructions to be fetched from the branch target stream. RISC processors introduce a delayed-branch mechanism which defines branch delay slots into which code can be scheduled. This strategy allows the processor to be kept busy executing useful instructions while the change of control flow takes place. While the concept of delayed branches can be readily extended to VLIW architectures, it is less clear how it should be incorporated in a superscalar architecture. This paper proposes a general branch-delay mechanism which is suitable for a range of code-compatible superscalar processors and which completely avoids the need to introduce NOPs into the code. This technique was developed as an integral part of the HSP superscalar project. HSP is a superscalar architecture currently being researched at the University of Hertfordshire with the aim of using compile-time instruction scheduling to achieve an order of magnitude speed-up over traditional RISC architectures for a suite of non-numeric benchmark programs.</p></div>\",\"PeriodicalId\":100927,\"journal\":{\"name\":\"Microprocessing and Microprogramming\",\"volume\":\"40 10\",\"pages\":\"Pages 677-680\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1994-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1016/0165-6074(94)90016-7\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Microprocessing and Microprogramming\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/0165607494900167\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2003/7/30 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Microprocessing and Microprogramming","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/0165607494900167","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2003/7/30 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

摘要

利用通用代码中可用的细粒度并行性的主要障碍之一是分支的频率，这些分支在运行时会导致程序的控制流发生不可预测的变化。无论何时执行分支，处理器都可能在等待从分支目标流中提取指令时导致性能损失。RISC处理器引入了一种延迟分支机制，该机制定义了可以调度代码的分支延迟时隙。这种策略允许处理器在控制流发生变化时忙于执行有用的指令。虽然延迟分支的概念可以很容易地扩展到VLIW体系结构，但它应该如何融入超标量体系结构还不太清楚。本文提出了一种通用的分支延迟机制，该机制适用于一系列代码兼容的超标量处理器，并且完全避免了在代码中引入NOP的需要。这项技术是作为HSP超标量项目的一个组成部分开发的。HSP是赫特福德大学目前正在研究的一种超标量体系结构，其目的是使用编译时指令调度，为一套非数字基准程序实现比传统RISC体系结构高一个数量级的速度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

An explicitly declared delayed-branch mechanism for a superscalar architecture

One of the main obstacles to exploiting the fine-grained parallelism that is available in general-purpose code is the frequency of branches that cause unpredictable changes in the control flow of a program at run-time. Whenever a branch is taken, a performance penalty may be incurred as the processor waits for instructions to be fetched from the branch target stream. RISC processors introduce a delayed-branch mechanism which defines branch delay slots into which code can be scheduled. This strategy allows the processor to be kept busy executing useful instructions while the change of control flow takes place. While the concept of delayed branches can be readily extended to VLIW architectures, it is less clear how it should be incorporated in a superscalar architecture. This paper proposes a general branch-delay mechanism which is suitable for a range of code-compatible superscalar processors and which completely avoids the need to introduce NOPs into the code. This technique was developed as an integral part of the HSP superscalar project. HSP is a superscalar architecture currently being researched at the University of Hertfordshire with the aim of using compile-time instruction scheduling to achieve an order of magnitude speed-up over traditional RISC architectures for a suite of non-numeric benchmark programs.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Microprocessing and Microprogramming

自引率

0.00%

发文量