Instruction Prefetching of Systems Codes with Layout Optimized for Reduced Cache Misses

Chun Xia, J. Torrellas
{"title":"Instruction Prefetching of Systems Codes with Layout Optimized for Reduced Cache Misses","authors":"Chun Xia, J. Torrellas","doi":"10.1145/232973.233001","DOIUrl":null,"url":null,"abstract":"High-performing on-chip instruction caches are crucial to keep fast processors busy. Unfortunately, while on-chip caches are usually successful at intercepting instruction fetches in loop-intensive engineering codes, they are less able to do so in large systems codes. To improve the performance of the latter codes, the compiler can be used to lay out the code in memory for reduced cache conflicts. Interestingly, such an operation leaves the code in a state that can be exploited by a new type of instruction prefetching: guarded sequential prefetching.The idea is that the compiler leaves hints in the code as to how the code was laid out. Then, at run time, the prefetching hardware detects these hints and uses them to prefetch more effectively. This scheme can be implemented very cheaply: one bit encoded in control transfer instructions and a prefetch module that requires minor extensions to existing next-line sequential prefetchers. Furthermore, the scheme can be turned off and on at run time with the toggling of a bit in the TLB. The scheme is evaluated with simulations using complete traces from a 4-processor machine. Overall, for 16-Kbyte primary instruction caches, guarded sequential prefetching removes, on average, 66% of the instruction misses remaining in an operating system with an optimized layout, speeding up the operating system by 10%. Moreover, the scheme is more cost-effective and robust than existing sequential prefetching techniques.","PeriodicalId":415354,"journal":{"name":"23rd Annual International Symposium on Computer Architecture (ISCA'96)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1996-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"37","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"23rd Annual International Symposium on Computer Architecture (ISCA'96)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/232973.233001","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 37

Abstract

High-performing on-chip instruction caches are crucial to keep fast processors busy. Unfortunately, while on-chip caches are usually successful at intercepting instruction fetches in loop-intensive engineering codes, they are less able to do so in large systems codes. To improve the performance of the latter codes, the compiler can be used to lay out the code in memory for reduced cache conflicts. Interestingly, such an operation leaves the code in a state that can be exploited by a new type of instruction prefetching: guarded sequential prefetching.The idea is that the compiler leaves hints in the code as to how the code was laid out. Then, at run time, the prefetching hardware detects these hints and uses them to prefetch more effectively. This scheme can be implemented very cheaply: one bit encoded in control transfer instructions and a prefetch module that requires minor extensions to existing next-line sequential prefetchers. Furthermore, the scheme can be turned off and on at run time with the toggling of a bit in the TLB. The scheme is evaluated with simulations using complete traces from a 4-processor machine. Overall, for 16-Kbyte primary instruction caches, guarded sequential prefetching removes, on average, 66% of the instruction misses remaining in an operating system with an optimized layout, speeding up the operating system by 10%. Moreover, the scheme is more cost-effective and robust than existing sequential prefetching techniques.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
为减少缓存缺失而优化布局的系统代码指令预取
高性能片上指令缓存对于保持快速处理器的繁忙状态至关重要。不幸的是,虽然片上缓存通常能够成功地拦截循环密集型工程代码中的指令获取,但它们在大型系统代码中却不太能够做到这一点。为了提高后一种代码的性能,可以使用编译器在内存中布局代码,以减少缓存冲突。有趣的是,这样的操作使代码处于一种可以被一种新的指令预取所利用的状态:受保护的顺序预取。其思想是编译器在代码中留下关于代码如何布局的提示。然后,在运行时,预取硬件检测到这些提示,并使用它们更有效地预取。该方案可以非常便宜地实现:在控制传输指令中编码一个比特,并使用一个预取模块,该模块需要对现有的下一行顺序预取器进行少量扩展。此外,该方案可以在运行时通过在TLB中切换一个位来关闭和打开。利用4处理器机器的完整迹线对该方案进行了仿真评估。总的来说,对于16kbyte的主指令缓存,有保护的顺序预取平均删除了操作系统中66%的指令遗漏,使操作系统的速度提高了10%。此外,该方案比现有的顺序预取技术更具成本效益和鲁棒性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Memory Bandwidth Limitations of Future Microprocessors Missing the Memory Wall: The Case for Processor/Memory Integration Instruction Prefetching of Systems Codes with Layout Optimized for Reduced Cache Misses STiNG: A CC-NUMA Computer System for the Commercial Marketplace High-Bandwidth Address Translation for Multiple-Issue Processors
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1