大问题微体系结构的两条缓存线预测

Proceedings 6th Australasian Computer Systems Architecture Conference. ACSAC 2001 Pub Date : 2001-01-29 DOI:10.1109/ACAC.2001.903361

Shu-Lin Hwang, F. Lai

{"title":"大问题微体系结构的两条缓存线预测","authors":"Shu-Lin Hwang, F. Lai","doi":"10.1109/ACAC.2001.903361","DOIUrl":null,"url":null,"abstract":"Modern micro-architectures employ superscalar techniques to enhance system performance. The superscalar microprocessors must fetch at least one instruction cache line at a time to support high issue rate and large amount speculative executions. In this paper, we propose the Grouped Branch Prediction (GBP) that can recognize and predict multiple branches in the same instruction cache line for a wide-issue micro-architecture. Several configurations of the GBP with different group sizes are simulated. The simulation results show that the branch penalty of the group size 4 with 2048-entry is under 0.65 clock cycle. In our design, we choose the two-group scheme with group size 4. This feature achieves an average of 4.9 IPC f (the number of instructions fetched per cycle for a machine front-end). Furthermore, we extend the GBP to achieve two cache lines predictions with two fetch units. The scheme of the 2048-entry 2-group with group size 4 can produce an average of 8.4 IPC f. The performance is approximately 66.5% better than the original 2-group GBPs. The added hardware cost (41.5 k bits) is less than 40%.","PeriodicalId":230403,"journal":{"name":"Proceedings 6th Australasian Computer Systems Architecture Conference. ACSAC 2001","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2001-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Two cache lines prediction for a wide-issue micro-architecture\",\"authors\":\"Shu-Lin Hwang, F. Lai\",\"doi\":\"10.1109/ACAC.2001.903361\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Modern micro-architectures employ superscalar techniques to enhance system performance. The superscalar microprocessors must fetch at least one instruction cache line at a time to support high issue rate and large amount speculative executions. In this paper, we propose the Grouped Branch Prediction (GBP) that can recognize and predict multiple branches in the same instruction cache line for a wide-issue micro-architecture. Several configurations of the GBP with different group sizes are simulated. The simulation results show that the branch penalty of the group size 4 with 2048-entry is under 0.65 clock cycle. In our design, we choose the two-group scheme with group size 4. This feature achieves an average of 4.9 IPC f (the number of instructions fetched per cycle for a machine front-end). Furthermore, we extend the GBP to achieve two cache lines predictions with two fetch units. The scheme of the 2048-entry 2-group with group size 4 can produce an average of 8.4 IPC f. The performance is approximately 66.5% better than the original 2-group GBPs. The added hardware cost (41.5 k bits) is less than 40%.\",\"PeriodicalId\":230403,\"journal\":{\"name\":\"Proceedings 6th Australasian Computer Systems Architecture Conference. ACSAC 2001\",\"volume\":\"59 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2001-01-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings 6th Australasian Computer Systems Architecture Conference. ACSAC 2001\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ACAC.2001.903361\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 6th Australasian Computer Systems Architecture Conference. ACSAC 2001","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ACAC.2001.903361","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

现代微架构采用超标量技术来提高系统性能。超标量微处理器必须一次至少获取一条指令缓存行，以支持高发放率和大量推测执行。在本文中，我们提出了分组分支预测(GBP)，它可以识别和预测同一指令缓存线中的多个分支。模拟了几种不同群大小的GBP结构。仿真结果表明，当分组规模为4且分组个数为2048时，分支惩罚小于0.65时钟周期。在我们的设计中，我们选择两组方案，组大小为4。该特性平均实现4.9 IPC f(机器前端每个周期获取的指令数)。此外，我们扩展了GBP，以实现两个获取单元的两个缓存线预测。该方案包含2048个分组，分组大小为4，平均可以产生8.4 IPC f，性能比原来的2组GBPs提高约66.5%。增加的硬件成本(41.5 k比特)不到40%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Two cache lines prediction for a wide-issue micro-architecture

Modern micro-architectures employ superscalar techniques to enhance system performance. The superscalar microprocessors must fetch at least one instruction cache line at a time to support high issue rate and large amount speculative executions. In this paper, we propose the Grouped Branch Prediction (GBP) that can recognize and predict multiple branches in the same instruction cache line for a wide-issue micro-architecture. Several configurations of the GBP with different group sizes are simulated. The simulation results show that the branch penalty of the group size 4 with 2048-entry is under 0.65 clock cycle. In our design, we choose the two-group scheme with group size 4. This feature achieves an average of 4.9 IPC f (the number of instructions fetched per cycle for a machine front-end). Furthermore, we extend the GBP to achieve two cache lines predictions with two fetch units. The scheme of the 2048-entry 2-group with group size 4 can produce an average of 8.4 IPC f. The performance is approximately 66.5% better than the original 2-group GBPs. The added hardware cost (41.5 k bits) is less than 40%.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings 6th Australasian Computer Systems Architecture Conference. ACSAC 2001

自引率

0.00%

发文量

期刊最新文献

The SawMill framework for virtual memory diversity Stacking them up: a comparison of virtual machines Adaptive interfacing with reconfigurable computers DStride: data-cache miss-address-based stride prefetching scheme for multimedia processors Performance evaluation of a partial retraining scheme for defective multi-layer neural networks