Making STMs Cache Friendly with Compiler Transformations

Sandya Mannarswamy, R. Govindarajan
Published in: 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011-10-10
DOI: https://doi.org/10.1109/PACT.2011.55
Citations: 10

Abstract

Software transactional memory (STM) is a promising programming paradigm for shared-memory multithreaded programs. For STMs to be widely adopted in performance-critical software, understanding and improving the cache performance of applications running on an STM is increasingly crucial as the performance gap between processor and memory continues to grow. In this paper, we present the most detailed experimental evaluation to date of the cache behavior of STM applications, and we quantify the impact of the different STM factors on the cache misses these applications experience. We find that STMs are not cache friendly: data cache stall cycles contribute more than 50% of the execution cycles in a majority of the benchmarks. On average, misses occurring inside the STM account for 62% of the total data cache miss latency cycles experienced by the applications, and cache performance is adversely affected by certain inherent characteristics of the STM itself. These observations motivate us to propose a set of compiler transformations targeted at making STMs cache friendly. We find that the STM's fine-grained, application-unaware locking is a major contributor to its poor cache behavior. Hence we propose selective Lock Data co-location (LDC) and Redundant Lock Access Removal (RLAR) to address lock access misses. We also find that even transactions that are completely disjoint-access parallel suffer from costly coherence misses caused by updates to the centralized global time stamp, and we therefore propose the Selective Per-Partition Time Stamp (SPTS) transformation to address this. We show that our transformations improve the cache behavior of STM applications, reducing data cache miss latency by 20.15% to 37.14% and improving execution time by 18.32% to 33.12% in five of the eight STAMP applications.
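The motivation for per-partition time stamps can be illustrated with a toy model. The sketch below is not the paper's implementation: it assumes a time-stamp-based STM in which every commit increments a version clock, and it treats "a clock word written by more than one thread" as a rough proxy for the coherence misses the abstract describes. The names `Clock`, `commit_global`, and `commit_partitioned`, and the partitions "A" and "B", are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Clock:
    """A version clock; `writers` records which threads have incremented it."""
    value: int = 0
    writers: set = field(default_factory=set)

    def tick(self, thread_id):
        # In a real STM this would be an atomic increment of a shared word.
        self.value += 1
        self.writers.add(thread_id)
        return self.value


def commit_global(clock, thread_id):
    # Assumed baseline: every commit updates one centralized clock.
    return clock.tick(thread_id)


def commit_partitioned(clocks, partition, thread_id):
    # Assumed alternative: a commit updates only the clock of its data partition.
    return clocks[partition].tick(thread_id)


# Two threads commit transactions that touch disjoint partitions "A" and "B".
commits = [(0, "A"), (1, "B"), (0, "A"), (1, "B")]

global_clock = Clock()
for tid, _ in commits:
    commit_global(global_clock, tid)

part_clocks = {"A": Clock(), "B": Clock()}
for tid, part in commits:
    commit_partitioned(part_clocks, part, tid)

# A clock written by several threads would bounce between their caches.
shared_global = len(global_clock.writers) > 1
shared_parts = any(len(c.writers) > 1 for c in part_clocks.values())
print("global clock shared by writers:", shared_global)
print("any partition clock shared by writers:", shared_parts)
```

In this model the single global clock is written by both threads even though their transactions never touch the same data, while each per-partition clock has exactly one writer. How partitions are chosen, and when a per-partition clock is safe to use, are the substance of the transformation itself and are not modeled here.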