一种无状态、面向内容的数据预取机制

ASPLOS X Pub Date : 2002-10-05 DOI:10.1145/605397.605427

Robert Cooksey, S. Jourdan, D. Grunwald

{"title":"一种无状态、面向内容的数据预取机制","authors":"Robert Cooksey, S. Jourdan, D. Grunwald","doi":"10.1145/605397.605427","DOIUrl":null,"url":null,"abstract":"Although central processor speeds continues to improve, improvements in overall system performance are increasingly hampered by memory latency, especially for pointer-intensive applications. To counter this loss of performance, numerous data and instruction prefetch mechanisms have been proposed. Recently, several proposals have posited a memory-side prefetcher; typically, these prefetchers involve a distinct processor that executes a program slice that would effectively prefetch data needed by the primary program. Alternative designs embody large state tables that learn the miss reference behavior of the processor and attempt to prefetch likely misses.This paper proposes Content-Directed Data Prefetching, a data prefetching architecture that exploits the memory allocation used by operating systems and runtime systems to improve the performance of pointer-intensive applications constructed using modern language systems. This technique is modeled after conservative garbage collection, and prefetches \"likely\" virtual addresses observed in memory references. This prefetching mechanism uses the underlying data of the application, and provides an 11.3% speedup using no additional processor state. By adding less than ½% space overhead to the second level cache, performance can be further increased to 12.6% across a range of \"real world\" applications.","PeriodicalId":377379,"journal":{"name":"ASPLOS X","volume":"55 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"173","resultStr":"{\"title\":\"A stateless, content-directed data prefetching mechanism\",\"authors\":\"Robert Cooksey, S. Jourdan, D. Grunwald\",\"doi\":\"10.1145/605397.605427\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Although central processor speeds continues to improve, improvements in overall system performance are increasingly hampered by memory latency, especially for pointer-intensive applications. To counter this loss of performance, numerous data and instruction prefetch mechanisms have been proposed. Recently, several proposals have posited a memory-side prefetcher; typically, these prefetchers involve a distinct processor that executes a program slice that would effectively prefetch data needed by the primary program. Alternative designs embody large state tables that learn the miss reference behavior of the processor and attempt to prefetch likely misses.This paper proposes Content-Directed Data Prefetching, a data prefetching architecture that exploits the memory allocation used by operating systems and runtime systems to improve the performance of pointer-intensive applications constructed using modern language systems. This technique is modeled after conservative garbage collection, and prefetches \\\"likely\\\" virtual addresses observed in memory references. This prefetching mechanism uses the underlying data of the application, and provides an 11.3% speedup using no additional processor state. By adding less than ½% space overhead to the second level cache, performance can be further increased to 12.6% across a range of \\\"real world\\\" applications.\",\"PeriodicalId\":377379,\"journal\":{\"name\":\"ASPLOS X\",\"volume\":\"55 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2002-10-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"173\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ASPLOS X\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/605397.605427\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ASPLOS X","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/605397.605427","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 173

摘要

尽管中央处理器的速度在不断提高，但整体系统性能的提高越来越受到内存延迟的阻碍，特别是对于指针密集型应用程序。为了应对这种性能损失，人们提出了许多数据和指令预取机制。最近，有几个提议提出了一个内存端预取器;通常，这些预取程序涉及一个独立的处理器，该处理器执行一个程序片，该程序片将有效地预取主程序所需的数据。另一种设计包含了大型状态表，这些状态表可以学习处理器的遗漏引用行为，并尝试预取可能的遗漏。本文提出了一种利用操作系统和运行时系统使用的内存分配来提高使用现代语言系统构建的指针密集型应用程序性能的数据预取体系结构——内容导向数据预取。该技术基于保守垃圾收集建模，并预取在内存引用中观察到的“可能的”虚拟地址。这种预取机制使用应用程序的底层数据，并在不使用额外处理器状态的情况下提供11.3%的加速。通过在第二级缓存中添加不到½%的空间开销，在一系列“真实世界”应用程序中，性能可以进一步提高到12.6%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A stateless, content-directed data prefetching mechanism

Although central processor speeds continues to improve, improvements in overall system performance are increasingly hampered by memory latency, especially for pointer-intensive applications. To counter this loss of performance, numerous data and instruction prefetch mechanisms have been proposed. Recently, several proposals have posited a memory-side prefetcher; typically, these prefetchers involve a distinct processor that executes a program slice that would effectively prefetch data needed by the primary program. Alternative designs embody large state tables that learn the miss reference behavior of the processor and attempt to prefetch likely misses.This paper proposes Content-Directed Data Prefetching, a data prefetching architecture that exploits the memory allocation used by operating systems and runtime systems to improve the performance of pointer-intensive applications constructed using modern language systems. This technique is modeled after conservative garbage collection, and prefetches "likely" virtual addresses observed in memory references. This prefetching mechanism uses the underlying data of the application, and provides an 11.3% speedup using no additional processor state. By adding less than ½% space overhead to the second level cache, performance can be further increased to 12.6% across a range of "real world" applications.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助