Authors: Roger Collins, Gordon Steven
Journal: Microprocessing and Microprogramming, vol. 40, no. 10, pp. 677-680
Published: 1994-12-01
DOI: 10.1016/0165-6074(94)90016-7
URL: https://www.sciencedirect.com/science/article/pii/0165607494900167
An explicitly declared delayed-branch mechanism for a superscalar architecture
One of the main obstacles to exploiting the fine-grained parallelism that is available in general-purpose code is the frequency of branches that cause unpredictable changes in the control flow of a program at run-time. Whenever a branch is taken, a performance penalty may be incurred as the processor waits for instructions to be fetched from the branch target stream. RISC processors introduce a delayed-branch mechanism which defines branch delay slots into which code can be scheduled. This strategy allows the processor to be kept busy executing useful instructions while the change of control flow takes place. While the concept of delayed branches can be readily extended to VLIW architectures, it is less clear how it should be incorporated in a superscalar architecture. This paper proposes a general branch-delay mechanism which is suitable for a range of code-compatible superscalar processors and which completely avoids the need to introduce NOPs into the code. This technique was developed as an integral part of the HSP superscalar project. HSP is a superscalar architecture currently being researched at the University of Hertfordshire with the aim of using compile-time instruction scheduling to achieve an order of magnitude speed-up over traditional RISC architectures for a suite of non-numeric benchmark programs.
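The delay-slot idea described in the abstract can be illustrated with a toy interpreter. This is only a minimal sketch of classic delayed-branch semantics on a single-issue machine, not the HSP mechanism itself: the abstract does not give an instruction encoding, so the instruction names (`li`, `add`, `bnez`, `halt`), the tuple format, and the per-branch `delay` count field are all assumptions made for illustration. The sketch also assumes no branch is ever placed in another branch's delay slot.

```python
def run(program, max_steps=1000):
    """Run a toy program and return (registers, list of executed pcs).

    Hypothetical instruction set, for illustration only:
      ("li", rd, imm)               rd = imm
      ("add", rd, ra, rb)           rd = ra + rb
      ("bnez", reg, target, delay)  if reg != 0, jump to target after
                                    `delay` further instructions have run
      ("halt",)                     stop
    """
    regs, trace = {}, []
    pc, steps = 0, 0
    pending = None  # (delay slots still to execute, target) of a taken branch

    while 0 <= pc < len(program) and steps < max_steps:
        op, *args = program[pc]
        trace.append(pc)
        steps += 1
        next_pc = pc + 1

        # If a taken branch is in flight, this instruction fills a delay slot.
        if pending is not None:
            slots, target = pending
            slots -= 1
            if slots == 0:
                next_pc, pending = target, None  # last slot: redirect fetch
            else:
                pending = (slots, target)

        if op == "li":
            rd, imm = args
            regs[rd] = imm
        elif op == "add":
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        elif op == "bnez":
            reg, target, delay = args
            if regs[reg] != 0:
                if delay == 0:
                    next_pc = target           # ordinary, non-delayed branch
                else:
                    pending = (delay, target)  # declared delay-slot count
        elif op == "halt":
            break
        pc = next_pc

    return regs, trace


def make_loop(delay):
    """Countdown loop; the add at index 6 sits right after the branch."""
    return [
        ("li", "r1", 3),            # 0: loop counter
        ("li", "r2", 0),            # 1: accumulator
        ("li", "r3", -1),           # 2: constant -1
        ("li", "r4", 10),           # 3: constant 10
        ("add", "r1", "r1", "r3"),  # 4: loop head, r1 -= 1
        ("bnez", "r1", 4, delay),   # 5: loop back while r1 != 0
        ("add", "r2", "r2", "r4"),  # 6: delay slot when delay == 1
        ("halt",),                  # 7
    ]


regs_delayed, trace_delayed = run(make_loop(delay=1))
regs_plain, trace_plain = run(make_loop(delay=0))
print(regs_delayed["r2"], regs_plain["r2"])
```

With `delay=1` the instruction after the branch executes on every iteration, whether or not the branch is taken, so useful work fills the cycle in which the target is being fetched. With `delay=0` the same instruction runs only once, on fall-through. Because the slot count in this sketch is a field of the branch rather than a fixed property of the machine, a branch with nothing useful to schedule can simply declare zero slots instead of being padded with NOPs.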