Grep: Performance Enhancement in MultiCore Processors using an Adaptive Graph Prefetcher

Indranee Kashyap, Dipika Deb, Nityananda Sarma
Published in: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
Publication date: 2023-06-20
DOI: 10.1109/ISVLSI59464.2023.10238634 (https://doi.org/10.1109/ISVLSI59464.2023.10238634)

Abstract

Memory latency and off-chip bandwidth have struggled to keep up with computing performance in modern computer systems. Prefetching helps mask the long memory access latency at various cache levels by continuously monitoring an application's memory access pattern. Upon detecting a pattern, a prefetcher fetches cache blocks ahead of their use. However, complex access behaviors such as direct or indirect pointer accesses, linked-list traversals, and so on do not adhere to any specific pattern and hence make prefetching impossible. The paper proposes Grep, an adaptive graph-based data prefetcher that monitors L1D cache misses and prefetches blocks into the L2 cache. Unlike state-of-the-art prefetchers, Grep does not search for patterns in the miss stream. Rather, it derives a predecessor-successor relationship among the cache misses by constructing an occurrence graph that stores the frequency and sequence of subsequent cache block accesses. As a result, both regular and irregular patterns in the miss stream can be predicted. Upon an address match in the occurrence graph, Grep prefetches the corresponding block with a confidence value. Experimentally, it improves prefetch coverage and accuracy by 35.5% and 18.8%, respectively, compared to SPP.
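The occurrence-graph idea can be illustrated with a small software model. This is only a sketch based on the abstract, not the authors' hardware design: the class name `OccurrenceGraphPrefetcher`, the 64-byte block size, the unbounded dictionary storage, and the definition of confidence as an edge's share of all outgoing edges are all assumptions made for illustration.

```python
from collections import defaultdict

BLOCK_BITS = 6  # assumption: 64-byte cache blocks


class OccurrenceGraphPrefetcher:
    """Toy model of a predecessor-successor graph over L1D miss addresses."""

    def __init__(self, threshold=0.5):
        # Assumed confidence cutoff; the abstract does not give a value.
        self.threshold = threshold
        # edges[pred][succ] = how often a miss on succ directly followed pred.
        self.edges = defaultdict(lambda: defaultdict(int))
        self.prev_block = None

    def on_miss(self, address):
        """Record one L1D miss; return [(block, confidence)] prefetch candidates."""
        block = address >> BLOCK_BITS

        # Update the graph: the previous miss is the predecessor of this one.
        if self.prev_block is not None:
            self.edges[self.prev_block][block] += 1
        self.prev_block = block

        # Address match: look up known successors of the current block.
        successors = self.edges.get(block)
        if not successors:
            return []

        total = sum(successors.values())
        candidates = [
            (succ, count / total)
            for succ, count in successors.items()
            if count / total >= self.threshold
        ]
        # Highest-confidence successor first.
        candidates.sort(key=lambda item: item[1], reverse=True)
        return candidates


if __name__ == "__main__":
    prefetcher = OccurrenceGraphPrefetcher()
    # An irregular (non-strided) miss sequence that repeats, as in pointer chasing.
    chain = [0x1000, 0x8040, 0x2300]
    for addr in chain + chain:
        print(hex(addr), prefetcher.on_miss(addr))
```

In this model the first pass over the chain only builds edges, so no candidates are returned. On the second pass, each miss matches a node whose successor was already observed, so that successor is returned as a prefetch candidate even though the addresses follow no stride or delta pattern. A real design would bound the graph size and age the counters, which this sketch leaves out.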