Reaping the Benefit of Temporal Silence to Improve Communication Performance

IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005. Pub Date: 2005-03-20. DOI: 10.1109/ISPASS.2005.1430580

Kevin M. Lepak, Mikko H. Lipasti



Abstract

Communication misses (those serviced by dirty data in remote caches) are a pressing performance limiter in shared-memory multiprocessors. Recent research has indicated that temporally silent stores can be exploited to substantially reduce such misses, either with coherence protocol enhancements (MESTI); by employing speculation to create atomic silent store-pairs that achieve speculative lock elision (SLE); or by employing load value prediction (LVP). We evaluate all three approaches utilizing full-system, execution-driven simulation, with scientific and commercial workloads, to measure performance. Our studies indicate that accurate detection of elision idioms for SLE is vitally important for delivering robust performance and appears difficult for existing commercial codes. Furthermore, common datapath issues in out-of-order cores cause barriers to speculation and therefore may cause SLE failures unless SLE-specific speculation mechanisms are added to the microarchitecture. We also propose novel prediction and silence detection mechanisms that enable the MESTI protocol to deliver robust performance for all workloads. Finally, we conduct a detailed execution-driven performance evaluation of load value prediction (LVP), another simple method for capturing the benefit of temporally silent stores. We show that while theoretically LVP can capture the greatest fraction of communication misses among all approaches, it is usually not the most effective at delivering performance. This occurs because attempting to hide latency by speculating at the consumer, i.e. predicting load values, is fundamentally less effective than eliminating the latency at the source, by removing the invalidation effect of stores. Applying each method, we observe performance changes in application benchmarks ranging from 1% to 14% for an enhanced version of MESTI, -1.0% to 9% for LVP, -3% to 9% for enhanced SLE, and 2% to 21% for combined techniques.
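All three approaches rely on the notion of a temporally silent store: a store that returns a memory location to a value it held earlier, as when a lock word changes from 0 to 1 on acquire and back to 0 on release. The following is a minimal sketch that flags such change-and-revert pairs in a simple write trace. It illustrates the idiom only and is not the detection or prediction hardware proposed in the paper. The `Store` record, the `find_temporally_silent_pairs` helper, and the rule that addresses missing from `initial` start at 0 are all hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Store:
    """One store in a hypothetical write trace."""
    addr: int
    value: int


def find_temporally_silent_pairs(
    trace: List[Store], initial: Dict[int, int]
) -> List[Tuple[int, int]]:
    """Hypothetical helper: return (change_index, revert_index) pairs where the
    store at revert_index restores an address to the value it held before the
    store at change_index first changed it.

    Assumption: addresses absent from `initial` are treated as holding 0.
    """
    current: Dict[int, int] = dict(initial)
    # addr -> (index of the first changing store, value saved before that change)
    pending: Dict[int, Tuple[int, int]] = {}
    pairs: List[Tuple[int, int]] = []

    for i, st in enumerate(trace):
        old = current.get(st.addr, 0)
        if st.value == old:
            # Ordinary (update-)silent store: nothing changes, nothing to track.
            continue
        if st.addr not in pending:
            # First change away from the saved value: remember what to revert to.
            pending[st.addr] = (i, old)
        elif pending[st.addr][1] == st.value:
            # The location is back to its saved value: a temporally silent pair.
            pairs.append((pending[st.addr][0], i))
            del pending[st.addr]
        # Otherwise this is an intermediate value; keep the original saved value.
        current[st.addr] = st.value

    return pairs


# Lock at 0x100 acquired (0 -> 1), data written at 0x200, lock released (1 -> 0).
trace = [Store(0x100, 1), Store(0x200, 42), Store(0x100, 0)]
print(find_temporally_silent_pairs(trace, {0x100: 0}))
```

Only the lock word's acquire/release stores are reported; the write to 0x200 is a genuine change that consumers must still observe. Pairs like the lock's are the kind whose invalidation effect MESTI aims to remove and that SLE tries to elide, while LVP instead lets a consumer guess the reverted value.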


