The performance impact of block sizes and fetch strategies

S. Przybylski
{"title":"The performance impact of block sizes and fetch strategies","authors":"S. Przybylski","doi":"10.1145/325164.325135","DOIUrl":null,"url":null,"abstract":"The interactions between a cache's block size, fetch size, and fetch policy from the perspective of maximizing system-level performance are explored. It has been previously noted that, given a simple fetch strategy, the performance optimal block size is almost always four or eight words. If there is even a small cycle time penalty associated with either longer blocks or fetches, then the performance optimal size is noticeably reduced. In split cache organizations, where the fetch and block sizes of instruction and data caches are all independent design variables, instruction cache block size and fetch size should be the same. For the workload and write-back write policy used in this trace-driven simulation study, the instruction cache block size should be about a factor of 2 greater than the data cache fetch size, which in turn should be equal to or double the data cache block size. The simplest fetch strategy of fetching only on a miss and stalling the CPU until the fetch is complete works well. Complicated fetch strategies do not produce the performance improvements indicated by the accompanying reductions in miss ratios because of limited memory resources and a strong temporal clustering of cache misses. For the environments simulated, the most effective fetch strategy improved performance by between 1.7% and 4.5% over the simplest strategy described above.<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"108","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/325164.325135","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 108

Abstract

The interactions between a cache's block size, fetch size, and fetch policy from the perspective of maximizing system-level performance are explored. It has been previously noted that, given a simple fetch strategy, the performance optimal block size is almost always four or eight words. If there is even a small cycle time penalty associated with either longer blocks or fetches, then the performance optimal size is noticeably reduced. In split cache organizations, where the fetch and block sizes of instruction and data caches are all independent design variables, instruction cache block size and fetch size should be the same. For the workload and write-back write policy used in this trace-driven simulation study, the instruction cache block size should be about a factor of 2 greater than the data cache fetch size, which in turn should be equal to or double the data cache block size. The simplest fetch strategy of fetching only on a miss and stalling the CPU until the fetch is complete works well. Complicated fetch strategies do not produce the performance improvements indicated by the accompanying reductions in miss ratios because of limited memory resources and a strong temporal clustering of cache misses. For the environments simulated, the most effective fetch strategy improved performance by between 1.7% and 4.5% over the simplest strategy described above.<>
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
块大小和获取策略对性能的影响
从最大化系统级性能的角度探讨了缓存的块大小、取大小和取策略之间的相互作用。前面已经注意到,给定一个简单的获取策略,性能最优的块大小几乎总是四个或八个字。如果与更长的块或读取相关的周期时间损失很小,那么性能最佳大小就会明显降低。在分割缓存组织中,指令和数据缓存的取值和块大小都是独立的设计变量,指令缓存块大小和取值大小应该相同。对于本跟踪驱动模拟研究中使用的工作负载和回写策略,指令缓存块大小应该是数据缓存读取大小的2倍左右,而数据缓存读取大小又应该等于或两倍于数据缓存块大小。最简单的取操作策略是只在未取的情况下取操作,并暂停CPU直到取操作完成。由于有限的内存资源和缓存缺失的强时间聚类,复杂的读取策略并不能产生相应的缺失率降低所表明的性能改进。对于模拟的环境,最有效的获取策略比上面描述的最简单的策略提高了1.7%到4.5%的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Architectural support for the management of tightly-coupled fine-grain goals in Flat Concurrent Prolog VAX vector architecture Weak ordering-a new definition The directory-based cache coherence protocol for the DASH multiprocessor Dynamic processor allocation in hypercube computers
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1