Processor Aware Anticipatory Prefetching in Loops

Spiros Kalogeropulos, M. Rajagopalan, V. Rao, Yonghong Song, P. Tirumalai
Published in: 10th International Symposium on High Performance Computer Architecture (HPCA'04), 2004-02-14
DOI: 10.1109/HPCA.2004.10029 (https://doi.org/10.1109/HPCA.2004.10029)
Citations: 5

Abstract

As microprocessor speeds increase, a large fraction of execution time is often lost to cache miss penalties. This loss can be particularly severe in processors such as the UltraSPARC-IIICu, which execute in order and block on cache misses. Such processors rely heavily on the compiler to reduce stalls and achieve high performance. This paper describes a compiler technique for software prefetching that is aware of the specific prefetch behaviors of the target processor. The implementation targets loops containing control flow and strided or irregular memory access patterns. A two-phase locality analysis, capable of handling complex subscript expressions, is used to improve the identification of prefetch candidates. Prefetch instructions are scheduled with careful consideration of the prefetch behaviors of the target system. Compared to a previous implementation, our technique produced performance improvements of 9% on the geometric mean, and up to 44% on individual tests, in Sun's first UltraSPARC-IIICu based SPEC CPU2000 submission [5], and it has been used in all later submissions to date.
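As a rough illustration of the kind of code software prefetching produces for a strided loop, the following hand-written C sketch issues a prefetch a fixed number of iterations ahead of each load. It is not the paper's compiler implementation: the `__builtin_prefetch` builtin is a GCC/Clang extension used here as a stand-in for a compiler-inserted prefetch instruction, and the function name `sum_strided_prefetch` and the distance `PREFETCH_AHEAD` are hypothetical choices made for this example. A processor-aware compiler would presumably derive the distance from the target's memory latency and prefetch behavior rather than use a fixed constant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in loop iterations (an assumption for
   this sketch, not a value taken from the paper). */
#define PREFETCH_AHEAD 16

/* Sum every `stride`-th element of a[0..n), prefetching ahead of the loads. */
double sum_strided_prefetch(const double *a, size_t n, size_t stride)
{
    assert(stride > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i += stride) {
        size_t ahead = i + PREFETCH_AHEAD * stride;
        if (ahead < n) {
            /* Second argument 0 = prefetch for read; third argument 3 =
               high temporal locality (GCC/Clang builtin semantics). */
            __builtin_prefetch(&a[ahead], 0, 3);
        }
        sum += a[i];
    }
    return sum;
}
```

The prefetch is only a hint, so the function returns the same result with or without it; any benefit shows up as fewer stalls on an in-order processor that blocks on cache misses. The bounds check keeps the prefetch address inside the array.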