Memory latency-tolerance approaches for Itanium processors: out-of-order execution vs. speculative precomputation

P. Wang, Hong Wang, Jamison D. Collins, Edward T. Grochowski, R. Kling, John Paul Shen
Published in: Proceedings Eighth International Symposium on High Performance Computer Architecture, 2002-02-02
DOI: 10.1109/HPCA.2002.995709 (https://doi.org/10.1109/HPCA.2002.995709)
Citations: 39

Abstract

The performance of in-order execution Itanium™ processors can suffer significantly due to cache misses. Two memory latency-tolerance approaches can be applied to Itanium processors. One uses an out-of-order (OOO) execution core; the other assumes multithreading support and exploits cache prefetching via speculative precomputation (SP). This paper evaluates and contrasts these two approaches. In addition, it assesses the effectiveness of combining them. For a select set of memory-intensive programs, an in-order SMT Itanium processor using speculative precomputation can achieve a performance improvement (92%) comparable to that of an out-of-order design (87%). Applying both OOO and SP yields a total performance improvement of 141% over the baseline in-order machine. OOO tends to be effective in prefetching for L1 misses, whereas SP is primarily good at covering L2 and L3 misses. Our analysis indicates that the two approaches can be redundant or complementary, depending on the type of delinquent loads that each targets. Both approaches are effective on delinquent loads in the loop body; however, only SP is effective on delinquent loads found in loop control code.
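
As a rough illustration of the "redundant or complementary" observation, the reported percentages can be compared against a purely hypothetical reference in which the two gains compound independently. The sketch below assumes that an improvement of p% corresponds to a speedup factor of 1 + p/100 over the in-order baseline; that reading, the helper name `speedup`, and the multiplicative reference are assumptions made here for illustration, not definitions taken from the paper.

```python
def speedup(improvement_pct: float) -> float:
    """Convert a percentage improvement into a speedup factor.

    Assumption: "92% improvement" is read as 1.92x the baseline performance.
    """
    return 1.0 + improvement_pct / 100.0


# Figures quoted in the abstract (improvement over the in-order baseline).
sp_only = speedup(92)    # in-order SMT core with speculative precomputation
ooo_only = speedup(87)   # out-of-order core
combined = speedup(141)  # OOO and SP together

# Hypothetical reference: what the combination would give if the two
# gains were fully independent and simply multiplied.
independent_reference = sp_only * ooo_only

# Fraction of that hypothetical reference actually reached.
fraction_of_reference = combined / independent_reference

print(f"SP only:               {sp_only:.2f}x")
print(f"OOO only:              {ooo_only:.2f}x")
print(f"Combined (reported):   {combined:.2f}x")
print(f"Independent reference: {independent_reference:.2f}x")
print(f"Fraction reached:      {fraction_of_reference:.2f}")
```

Under these assumptions the combined figure is above either technique alone but below the multiplicative reference, which is at least consistent with the abstract's statement that the two approaches partly overlap and partly complement each other.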