Performance enhancement in shared-memory multiprocessors using dynamically classified sharing information

Nilufa Ferdous, Byeong Kil Lee, E. John
{"title":"Performance enhancement in shared-memory multiprocessors using dynamically classified sharing information","authors":"Nilufa Ferdous, Byeong Kil Lee, E. John","doi":"10.1109/PCCC.2014.7017063","DOIUrl":null,"url":null,"abstract":"Advances in process technology has enabled the integration of many cores on a single die. The advent of many core systems has led to a commensurate increase in cache coherence complexity. As a solution to this problem, researches have proposed directory based protocols, which are scalable alternatives to snoop-based protocols. Although write-invalidation based directory protocols enhance the performance of large-scale multiprocessors, coherence misses are intrinsic impediments in such systems. Write-update protocols were proposed as a means to reduce these coherence misses. However, previous researches have shown that pure write-update protocol is highly undesirable because of the heavy traffic caused by the aggressive updates. In order to remedy these limitations, we propose a performance-aware mechanism which dynamically classifies the sharers of each cache block, either as a weak-sharing-group or an efficient-sharing-group and exploit this dynamic classification as a metric for seamless dynamic adaptation between write-invalidate and write-update strategy on a per block basis. Exploitation of the dynamic adaptation of the protocol, based on the sharing-group speculation, reduces unnecessary accesses to the shared last level cache and hence reduces the traffic caused by coherence misses and directory accesses. Simulation results on a 64-core CMP show that our proposed method can achieve 15 % (average) speedup over the baseline directory-based MOESI cache coherence protocol with PARSEC workloads. Our proposed work also reduces the L1 cache miss rate by 17 %(average). The network traffic caused by directory accesses and L1 read misses are also reduced by 16% and 17% respectively.","PeriodicalId":105442,"journal":{"name":"2014 IEEE 33rd International Performance Computing and Communications Conference (IPCCC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE 33rd International Performance Computing and Communications Conference (IPCCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PCCC.2014.7017063","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Advances in process technology has enabled the integration of many cores on a single die. The advent of many core systems has led to a commensurate increase in cache coherence complexity. As a solution to this problem, researches have proposed directory based protocols, which are scalable alternatives to snoop-based protocols. Although write-invalidation based directory protocols enhance the performance of large-scale multiprocessors, coherence misses are intrinsic impediments in such systems. Write-update protocols were proposed as a means to reduce these coherence misses. However, previous researches have shown that pure write-update protocol is highly undesirable because of the heavy traffic caused by the aggressive updates. In order to remedy these limitations, we propose a performance-aware mechanism which dynamically classifies the sharers of each cache block, either as a weak-sharing-group or an efficient-sharing-group and exploit this dynamic classification as a metric for seamless dynamic adaptation between write-invalidate and write-update strategy on a per block basis. Exploitation of the dynamic adaptation of the protocol, based on the sharing-group speculation, reduces unnecessary accesses to the shared last level cache and hence reduces the traffic caused by coherence misses and directory accesses. Simulation results on a 64-core CMP show that our proposed method can achieve 15 % (average) speedup over the baseline directory-based MOESI cache coherence protocol with PARSEC workloads. Our proposed work also reduces the L1 cache miss rate by 17 %(average). The network traffic caused by directory accesses and L1 read misses are also reduced by 16% and 17% respectively.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
使用动态分类共享信息增强共享内存多处理器的性能
工艺技术的进步使许多核心集成在一个模具上成为可能。许多核心系统的出现导致了缓存一致性复杂性的相应增加。为了解决这一问题,研究人员提出了基于目录的协议,它是基于窥探协议的可扩展替代品。尽管基于写无效的目录协议提高了大规模多处理器的性能,但一致性缺失是这类系统的内在障碍。提出了写更新协议作为减少这些一致性缺失的手段。然而,先前的研究表明,纯写更新协议是非常不可取的,因为激进的更新会导致大量的流量。为了弥补这些限制,我们提出了一种性能感知机制,该机制将每个缓存块的共享者动态分类为弱共享组或有效共享组,并利用这种动态分类作为一个指标,在每个块的基础上无缝地动态适应写无效和写更新策略。利用基于共享组推测的协议动态适应,减少了对共享最后一级缓存的不必要访问,从而减少了由于一致性丢失和目录访问引起的流量。在64核CMP上的仿真结果表明,与PARSEC工作负载下基于目录的MOESI缓存一致性协议相比,我们提出的方法可以实现15%(平均)的加速。我们提出的工作还将L1缓存丢失率降低了17%(平均)。目录访问和L1读失败导致的网络流量也分别降低了16%和17%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Performance and energy evaluation of RESTful web services in Raspberry Pi Proximity-driven social interactions and their impact on the throughput scaling of wireless networks POLA: A privacy-preserving protocol for location-based real-time advertising Replica placement in content delivery networks with stochastic demands and M/M/1 servers Combinatorial JPT based on orthogonal beamforming for two-cell cooperation
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1