Memory-system Design Considerations For Dynamically-scheduled Processors

Conference Proceedings. The 24th Annual International Symposium on Computer Architecture. Pub Date: 1997-06-01. DOI: 10.1145/264107.264156

K. Farkas, P. Chow, N. Jouppi, Z. Vranesic


Citations: 120

Abstract

In this paper, we identify performance trends and design relationships between the following components of the data memory hierarchy in a dynamically-scheduled processor: the register file, the lockup-free data cache, the stream buffers, and the interface between these components and the lower levels of the memory hierarchy. Similar performance was obtained from all systems having support for fewer than four in-flight misses, irrespective of the register-file size, the issue width of the processor, and the memory bandwidth. While providing support for more than four in-flight misses did increase system performance, the improvement was less than that obtained by increasing the number of registers. The addition of stream buffers to the investigated systems led to a significant performance increase, with larger increases for systems having less in-flight-miss support, greater memory bandwidth, or more instruction issue capability. The performance of these systems was not significantly affected by the inclusion of traffic filters, dynamic-stride calculators, or the per-load non-unity stride-predictor and incremental-prefetching techniques, which we introduce. However, the incremental-prefetching technique reduces the bandwidth consumed by stream buffers by 50% without a significant impact on performance.
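The abstract names a per-load non-unity stride predictor but does not describe its mechanism. As a rough illustration of the general idea only, the sketch below tracks, for each load instruction, the last address and the last observed stride, and predicts the next address once the same non-zero stride has been seen twice in a row. The class names, the unbounded table keyed by load PC, and the two-in-a-row confirmation rule are all assumptions made for this example; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class StrideEntry:
    """Per-load state (hypothetical layout, not the paper's)."""
    last_addr: int
    stride: int = 0
    confirmed: bool = False


class PerLoadStridePredictor:
    """Generic stride predictor indexed by load PC (illustrative only)."""

    def __init__(self):
        # Assumption: an unbounded table; real hardware would use a
        # small, finite, possibly tagged structure.
        self.table = {}

    def observe(self, pc, addr):
        """Record one load; return a predicted next address or None."""
        entry = self.table.get(pc)
        if entry is None:
            # First time this load is seen: no stride information yet.
            self.table[pc] = StrideEntry(last_addr=addr)
            return None
        stride = addr - entry.last_addr
        # Assumption: predict only after the same non-zero stride repeats.
        entry.confirmed = stride != 0 and stride == entry.stride
        entry.stride = stride
        entry.last_addr = addr
        return addr + stride if entry.confirmed else None


if __name__ == "__main__":
    predictor = PerLoadStridePredictor()
    for address in (1000, 1064, 1128, 2000):
        print(hex(0x40), address, predictor.observe(0x40, address))
```

In this sketch a prediction appears only on the third access of a regular sequence, and any change of stride suppresses prediction until the new stride repeats. A prefetcher could use the returned address as the start of a stream with a stride other than one cache line.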
