Performance enhancement in shared-memory multiprocessors using dynamically classified sharing information

2014 IEEE 33rd International Performance Computing and Communications Conference (IPCCC). Pub Date: 2014-12-01. DOI: 10.1109/PCCC.2014.7017063

Nilufa Ferdous, Byeong Kil Lee, E. John


Citations: 1

Abstract

Advances in process technology have enabled the integration of many cores on a single die. The advent of many-core systems has led to a commensurate increase in cache coherence complexity. As a solution to this problem, researchers have proposed directory-based protocols, which are scalable alternatives to snoop-based protocols. Although write-invalidate directory protocols enhance the performance of large-scale multiprocessors, coherence misses are intrinsic impediments in such systems. Write-update protocols were proposed as a means to reduce these coherence misses. However, previous research has shown that a pure write-update protocol is highly undesirable because of the heavy traffic caused by its aggressive updates. To remedy these limitations, we propose a performance-aware mechanism that dynamically classifies the sharers of each cache block as either a weak-sharing-group or an efficient-sharing-group, and exploits this classification as a metric for seamless dynamic adaptation between the write-invalidate and write-update strategies on a per-block basis. This dynamic adaptation of the protocol, based on sharing-group speculation, reduces unnecessary accesses to the shared last-level cache and hence reduces the traffic caused by coherence misses and directory accesses. Simulation results on a 64-core CMP show that the proposed method achieves an average speedup of 15% over the baseline directory-based MOESI cache coherence protocol on PARSEC workloads. It also reduces the L1 cache miss rate by 17% on average. The network traffic caused by directory accesses and by L1 read misses is also reduced, by 16% and 17% respectively.
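The abstract does not say how a sharing group is classified, so the following is only a toy sketch of the per-block decision it describes, not the authors' mechanism. It assumes a hypothetical heuristic: a block's sharers form an efficient-sharing-group if at least half of them read the block since the last write, and a weak-sharing-group otherwise. The names `DirectoryEntry`, `REUSE_THRESHOLD`, `on_read`, and `on_write` are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

# Hypothetical threshold (an assumption, not from the paper): fraction of
# sharers that must have read the block since the last write for the
# group to be treated as "efficient".
REUSE_THRESHOLD = 0.5


@dataclass
class DirectoryEntry:
    """Illustrative directory bookkeeping for a single cache block."""
    sharers: Set[int] = field(default_factory=set)
    readers_since_write: Set[int] = field(default_factory=set)

    def on_read(self, core: int) -> None:
        # A read makes the core a sharer and counts as reuse of the block.
        self.sharers.add(core)
        self.readers_since_write.add(core)

    def classify(self, writer: int) -> str:
        # Classify the other sharers of this block as one group.
        others = self.sharers - {writer}
        if not others:
            return "weak"
        reused = len(self.readers_since_write & others) / len(others)
        return "efficient" if reused >= REUSE_THRESHOLD else "weak"

    def on_write(self, writer: int) -> Tuple[str, Set[int]]:
        """Return the coherence action and the cores it is sent to."""
        group = self.classify(writer)
        targets = self.sharers - {writer}
        if group == "efficient":
            # Write-update: sharers keep their copies and receive new data.
            action = "update"
            self.sharers.add(writer)
        else:
            # Write-invalidate: the writer becomes the only holder.
            action = "invalidate"
            self.sharers = {writer}
        self.readers_since_write = set()
        return action, targets
```

With this sketch, a block that cores 1, 2, and 3 have all read is updated in place when core 0 writes it; if only core 1 reads it again before the next write, the group falls below the assumed threshold and the next write invalidates the remaining copies instead.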
