一种无状态、面向内容的数据预取机制

ASPLOS X Pub Date : 2002-10-05 DOI:10.1145/605397.605427
Robert Cooksey, S. Jourdan, D. Grunwald
{"title":"一种无状态、面向内容的数据预取机制","authors":"Robert Cooksey, S. Jourdan, D. Grunwald","doi":"10.1145/605397.605427","DOIUrl":null,"url":null,"abstract":"Although central processor speeds continues to improve, improvements in overall system performance are increasingly hampered by memory latency, especially for pointer-intensive applications. To counter this loss of performance, numerous data and instruction prefetch mechanisms have been proposed. Recently, several proposals have posited a memory-side prefetcher; typically, these prefetchers involve a distinct processor that executes a program slice that would effectively prefetch data needed by the primary program. Alternative designs embody large state tables that learn the miss reference behavior of the processor and attempt to prefetch likely misses.This paper proposes Content-Directed Data Prefetching, a data prefetching architecture that exploits the memory allocation used by operating systems and runtime systems to improve the performance of pointer-intensive applications constructed using modern language systems. This technique is modeled after conservative garbage collection, and prefetches \"likely\" virtual addresses observed in memory references. This prefetching mechanism uses the underlying data of the application, and provides an 11.3% speedup using no additional processor state. By adding less than ½% space overhead to the second level cache, performance can be further increased to 12.6% across a range of \"real world\" applications.","PeriodicalId":377379,"journal":{"name":"ASPLOS X","volume":"55 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"173","resultStr":"{\"title\":\"A stateless, content-directed data prefetching mechanism\",\"authors\":\"Robert Cooksey, S. Jourdan, D. Grunwald\",\"doi\":\"10.1145/605397.605427\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Although central processor speeds continues to improve, improvements in overall system performance are increasingly hampered by memory latency, especially for pointer-intensive applications. To counter this loss of performance, numerous data and instruction prefetch mechanisms have been proposed. Recently, several proposals have posited a memory-side prefetcher; typically, these prefetchers involve a distinct processor that executes a program slice that would effectively prefetch data needed by the primary program. Alternative designs embody large state tables that learn the miss reference behavior of the processor and attempt to prefetch likely misses.This paper proposes Content-Directed Data Prefetching, a data prefetching architecture that exploits the memory allocation used by operating systems and runtime systems to improve the performance of pointer-intensive applications constructed using modern language systems. This technique is modeled after conservative garbage collection, and prefetches \\\"likely\\\" virtual addresses observed in memory references. This prefetching mechanism uses the underlying data of the application, and provides an 11.3% speedup using no additional processor state. By adding less than ½% space overhead to the second level cache, performance can be further increased to 12.6% across a range of \\\"real world\\\" applications.\",\"PeriodicalId\":377379,\"journal\":{\"name\":\"ASPLOS X\",\"volume\":\"55 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2002-10-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"173\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ASPLOS X\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/605397.605427\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ASPLOS X","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/605397.605427","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 173

摘要

尽管中央处理器的速度在不断提高,但整体系统性能的提高越来越受到内存延迟的阻碍,特别是对于指针密集型应用程序。为了应对这种性能损失,人们提出了许多数据和指令预取机制。最近,有几个提议提出了一个内存端预取器;通常,这些预取程序涉及一个独立的处理器,该处理器执行一个程序片,该程序片将有效地预取主程序所需的数据。另一种设计包含了大型状态表,这些状态表可以学习处理器的遗漏引用行为,并尝试预取可能的遗漏。本文提出了一种利用操作系统和运行时系统使用的内存分配来提高使用现代语言系统构建的指针密集型应用程序性能的数据预取体系结构——内容导向数据预取。该技术基于保守垃圾收集建模,并预取在内存引用中观察到的“可能的”虚拟地址。这种预取机制使用应用程序的底层数据,并在不使用额外处理器状态的情况下提供11.3%的加速。通过在第二级缓存中添加不到½%的空间开销,在一系列“真实世界”应用程序中,性能可以进一步提高到12.6%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
A stateless, content-directed data prefetching mechanism
Although central processor speeds continues to improve, improvements in overall system performance are increasingly hampered by memory latency, especially for pointer-intensive applications. To counter this loss of performance, numerous data and instruction prefetch mechanisms have been proposed. Recently, several proposals have posited a memory-side prefetcher; typically, these prefetchers involve a distinct processor that executes a program slice that would effectively prefetch data needed by the primary program. Alternative designs embody large state tables that learn the miss reference behavior of the processor and attempt to prefetch likely misses.This paper proposes Content-Directed Data Prefetching, a data prefetching architecture that exploits the memory allocation used by operating systems and runtime systems to improve the performance of pointer-intensive applications constructed using modern language systems. This technique is modeled after conservative garbage collection, and prefetches "likely" virtual addresses observed in memory references. This prefetching mechanism uses the underlying data of the application, and provides an 11.3% speedup using no additional processor state. By adding less than ½% space overhead to the second level cache, performance can be further increased to 12.6% across a range of "real world" applications.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Understanding and improving operating system effects in control flow prediction Mondrian memory protection Speculative synchronization: applying thread-level speculation to explicitly parallel applications A stream compiler for communication-exposed architectures Evolving RPC for active storage
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1