Reaping the Benefit of Temporal Silence to Improve Communication Performance

Kevin M. Lepak, Mikko H. Lipasti
IEEE International Symposium on Performance Analysis of Systems and Software, 2005 (ISPASS 2005). Published 2005-03-20.
DOI: 10.1109/ISPASS.2005.1430580 (https://doi.org/10.1109/ISPASS.2005.1430580)

Abstract

Communication misses, those serviced by dirty data in remote caches, are a pressing performance limiter in shared-memory multiprocessors. Recent research has indicated that temporally silent stores can be exploited to substantially reduce such misses in three ways: with coherence protocol enhancements (MESTI); by employing speculation to create atomic silent store-pairs that achieve speculative lock elision (SLE); or by employing load value prediction (LVP). We evaluate all three approaches using full-system, execution-driven simulation with scientific and commercial workloads. Our studies indicate that accurate detection of elision idioms for SLE is vitally important for delivering robust performance and appears difficult for existing commercial codes. Furthermore, common datapath issues in out-of-order cores create barriers to speculation and may therefore cause SLE failures unless SLE-specific speculation mechanisms are added to the microarchitecture. We also propose novel prediction and silence detection mechanisms that enable the MESTI protocol to deliver robust performance for all workloads. Finally, we conduct a detailed execution-driven performance evaluation of LVP, another simple method for capturing the benefit of temporally silent stores. We show that while LVP can theoretically capture the greatest fraction of communication misses among all approaches, it is usually not the most effective at delivering performance. This occurs because attempting to hide latency by speculating at the consumer (i.e., predicting load values) is fundamentally less effective than eliminating the latency at the source by removing the invalidation effect of stores. Applying each method, we observe performance changes in application benchmarks ranging from 1% to 14% for an enhanced version of MESTI, -1.0% to 9% for LVP, -3% to 9% for enhanced SLE, and 2% to 21% for combined techniques.
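As background for readers new to the term, a store is usually called temporally silent when it returns a memory location to a value it held earlier, as a lock release does when it writes back the "free" value that preceded the acquire. The Python sketch below is only a toy illustration of that idea, not the MESTI protocol or any mechanism evaluated in the paper. The function name, the trace format, and the simplification that the "last shared" value is just the initial memory snapshot are all assumptions made for illustration.

```python
def find_temporally_silent_stores(initial_memory, stores):
    """Return indices of stores that revert an address to its last shared value.

    Toy model (assumption): `initial_memory` stands in for the values that
    remote caches last observed, and it is never refreshed during the trace.
    `stores` is a list of (address, value) pairs in program order.
    """
    current = dict(initial_memory)   # value currently held at each address
    shared = dict(initial_memory)    # hypothetical "last globally visible" values
    silent = []
    for index, (addr, value) in enumerate(stores):
        if current.get(addr) == value:
            # The store rewrites the value already present; it changes
            # nothing, so this sketch does not count it as a reversion.
            continue
        current[addr] = value
        if addr in shared and shared[addr] == value:
            # An intermediate value was written earlier and this store
            # restores the shared value: a temporally silent store.
            silent.append(index)
    return silent


if __name__ == "__main__":
    LOCK, DATA = 0x100, 0x200        # hypothetical addresses
    trace = [
        (LOCK, 1),   # lock acquire: 0 -> 1
        (LOCK, 0),   # lock release: 1 -> 0, reverts to the shared value
        (DATA, 5),   # ordinary data update
        (DATA, 7),   # ordinary data update
    ]
    print(find_temporally_silent_stores({LOCK: 0, DATA: 3}, trace))
```

In this trace only the lock release is flagged. The acquire/release pair leaves the lock word unchanged from the viewpoint of other processors, which is the property that the protocol-based and elision-based approaches in the abstract aim to exploit.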