The performance impact of block sizes and fetch strategies

[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture. Pub Date: 1990-05-01. DOI: 10.1145/325164.325135

S. Przybylski


Citations: 108

Abstract

The interactions between a cache's block size, fetch size, and fetch policy from the perspective of maximizing system-level performance are explored. It has been previously noted that, given a simple fetch strategy, the performance optimal block size is almost always four or eight words. If there is even a small cycle time penalty associated with either longer blocks or fetches, then the performance optimal size is noticeably reduced. In split cache organizations, where the fetch and block sizes of instruction and data caches are all independent design variables, instruction cache block size and fetch size should be the same. For the workload and write-back write policy used in this trace-driven simulation study, the instruction cache block size should be about a factor of 2 greater than the data cache fetch size, which in turn should be equal to or double the data cache block size. The simplest fetch strategy of fetching only on a miss and stalling the CPU until the fetch is complete works well. Complicated fetch strategies do not produce the performance improvements indicated by the accompanying reductions in miss ratios because of limited memory resources and a strong temporal clustering of cache misses. For the environments simulated, the most effective fetch strategy improved performance by between 1.7% and 4.5% over the simplest strategy described above.
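The trade-off described in the abstract, where longer blocks lower the miss ratio but lengthen every miss, can be illustrated with a toy model. The sketch below is not the paper's simulator. The miss ratios, the 6-cycle memory latency, the one-word-per-cycle transfer rate, and the per-doubling cycle time penalty are all made-up values chosen only for illustration, and the function names are hypothetical. It assumes the simplest fetch strategy (fetch only on a miss, stall the CPU until the fetch completes) with the fetch size equal to the block size.

```python
import math

# Hypothetical miss ratios per block size in words. These are illustrative
# assumptions only, NOT results from the paper's trace-driven simulations.
ASSUMED_MISS_RATIO = {1: 0.100, 2: 0.060, 4: 0.040, 8: 0.028, 16: 0.026, 32: 0.025}


def miss_penalty_cycles(block_words, latency_cycles=6, words_per_cycle=1.0):
    """Stall cycles for one miss: fixed memory latency plus transfer time."""
    return latency_cycles + block_words / words_per_cycle


def time_per_reference(block_words, cycle_penalty_per_doubling=0.0):
    """Average time per memory reference, in units of the base cycle time."""
    miss_ratio = ASSUMED_MISS_RATIO[block_words]
    # One cycle for the cache access itself, plus the expected stall per reference.
    cycles = 1.0 + miss_ratio * miss_penalty_cycles(block_words)
    # Assumed cost of longer blocks: cycle time grows with each doubling.
    cycle_time = 1.0 + cycle_penalty_per_doubling * math.log2(block_words)
    return cycles * cycle_time


def best_block_size(cycle_penalty_per_doubling=0.0):
    """Block size (in words) that minimizes time per reference in this model."""
    return min(
        ASSUMED_MISS_RATIO,
        key=lambda words: time_per_reference(words, cycle_penalty_per_doubling),
    )


if __name__ == "__main__":
    for penalty in (0.0, 0.02):
        print(f"cycle penalty {penalty:.0%} per doubling:",
              f"best block size = {best_block_size(penalty)} words")
```

With these assumed inputs, the best block size in the model is 8 words when there is no cycle time penalty and drops to 4 words with a 2% penalty per doubling. That only mirrors the abstract's observation qualitatively; actual optima depend on the workload and memory system simulated in the paper.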





Latest papers from this venue

Architectural support for the management of tightly-coupled fine-grain goals in Flat Concurrent Prolog

VAX vector architecture

Weak ordering-a new definition

The directory-based cache coherence protocol for the DASH multiprocessor

Dynamic processor allocation in hypercube computers