Characterizing multi-threaded applications for designing sharing-aware last-level cache replacement policies

R. Natarajan, Mainak Chaudhuri
{"title":"Characterizing multi-threaded applications for designing sharing-aware last-level cache replacement policies","authors":"R. Natarajan, Mainak Chaudhuri","doi":"10.1109/IISWC.2013.6704665","DOIUrl":null,"url":null,"abstract":"Recent years have seen a large volume of proposals on managing the shared last-level cache (LLC) of chip-multiprocessors (CMPs). However, most of these proposals primarily focus on reducing the amount of destructive interference between competing independent threads of multi-programmed workloads. While very few of these studies evaluate the proposed policies on shared memory multi-threaded applications, they do not improve constructive cross-thread sharing of data in the LLC In this paper, we characterize a set of multi-threaded applications drawn from the PARSEC, SPEC OMP, and SPLASH-2 suites with the goal of introducing sharing-awareness in LLC replacement policies. We motivate our characterization study by quantifying the potential contributions of the shared and the private blocks toward the overall volume of the LLC hits in these applications and show that the shared blocks are more important than the private blocks. Next, we characterize the amount of sharing-awareness enjoyed by recent proposals compared to the optimal policy. We design and evaluate a generic oracle that can be used in conjunction with any existing policy to quantify the potential improvement that can come from introducing sharing-awareness. The oracle analysis shows that introducing sharing-awareness reduces the number of LLC misses incurred by the least-recently-used (LRU) policy by 6% and 10% on average for a 4MB and 8MB LLC respectively. A realistic implementation of this oracle requires the LLC controller to have the capability to accurately predict, at the time a block is filled into the LLC, whether the block will be shared during its residency in the LLC. We explore the feasibility of designing such a predictor based on the address of the fill and the program counter of the instruction that triggers the fill. Our sharing behavior predictability study of two history-based fill-time predictors that use block addresses and program counters concludes that achieving acceptable levels of accuracy with such predictors will require other architectural and/or high-level program semantic features that have strong correlations with active sharing phases of the LLC blocks.","PeriodicalId":365868,"journal":{"name":"2013 IEEE International Symposium on Workload Characterization (IISWC)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE International Symposium on Workload Characterization (IISWC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IISWC.2013.6704665","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 18

Abstract

Recent years have seen a large volume of proposals on managing the shared last-level cache (LLC) of chip-multiprocessors (CMPs). However, most of these proposals primarily focus on reducing the amount of destructive interference between competing independent threads of multi-programmed workloads. While very few of these studies evaluate the proposed policies on shared memory multi-threaded applications, they do not improve constructive cross-thread sharing of data in the LLC In this paper, we characterize a set of multi-threaded applications drawn from the PARSEC, SPEC OMP, and SPLASH-2 suites with the goal of introducing sharing-awareness in LLC replacement policies. We motivate our characterization study by quantifying the potential contributions of the shared and the private blocks toward the overall volume of the LLC hits in these applications and show that the shared blocks are more important than the private blocks. Next, we characterize the amount of sharing-awareness enjoyed by recent proposals compared to the optimal policy. We design and evaluate a generic oracle that can be used in conjunction with any existing policy to quantify the potential improvement that can come from introducing sharing-awareness. The oracle analysis shows that introducing sharing-awareness reduces the number of LLC misses incurred by the least-recently-used (LRU) policy by 6% and 10% on average for a 4MB and 8MB LLC respectively. A realistic implementation of this oracle requires the LLC controller to have the capability to accurately predict, at the time a block is filled into the LLC, whether the block will be shared during its residency in the LLC. We explore the feasibility of designing such a predictor based on the address of the fill and the program counter of the instruction that triggers the fill. Our sharing behavior predictability study of two history-based fill-time predictors that use block addresses and program counters concludes that achieving acceptable levels of accuracy with such predictors will require other architectural and/or high-level program semantic features that have strong correlations with active sharing phases of the LLC blocks.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
描述多线程应用程序的特征,以设计共享感知的最后一级缓存替换策略
近年来出现了大量关于管理芯片多处理器(cmp)的共享最后一级缓存(LLC)的建议。然而,这些建议中的大多数主要关注于减少多编程工作负载中竞争的独立线程之间的破坏性干扰。虽然这些研究中很少评估关于共享内存多线程应用程序的建议策略,但它们并没有改善LLC中建设性的跨线程数据共享。在本文中,我们描述了一组来自PARSEC、SPEC OMP和splash2套件的多线程应用程序,目的是在LLC替换策略中引入共享意识。我们通过量化共享区块和私有区块对这些应用程序中LLC命中总量的潜在贡献来激励我们的特征研究,并表明共享区块比私有区块更重要。接下来,我们描述了与最优策略相比,最近的提案所享有的共享意识的数量。我们设计并评估了一个通用的oracle,它可以与任何现有的策略结合使用,以量化引入共享意识所带来的潜在改进。oracle分析表明,对于4MB和8MB的LLC,引入共享感知可以将最近最少使用(least-recently-used, LRU)策略导致的LLC失败数量平均分别减少6%和10%。这个预言的现实实现要求LLC控制器具有准确预测的能力,当一个块被填充到LLC中时,该块在LLC驻留期间是否会被共享。我们探索了基于填充地址和触发填充指令的程序计数器设计这样一个预测器的可行性。我们对使用块地址和程序计数器的两个基于历史的填充时间预测器的共享行为可预测性研究得出结论,使用此类预测器实现可接受的精度水平将需要其他与LLC块的主动共享阶段具有强相关性的架构和/或高级程序语义特征。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Pannotia: Understanding irregular GPGPU graph applications Performance, energy characterizations and architectural implications of an emerging mobile platform benchmark suite - MobileBench Power and performance of GPU-accelerated systems: A closer look Hardware-independent application characterization Performance implications of System Management Mode
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1