MorphCache: A Reconfigurable Adaptive Multi-level Cache hierarchy

2011 IEEE 17th International Symposium on High Performance Computer Architecture. Pub Date: 2011-02-12. DOI: 10.1109/HPCA.2011.5749732

Shekhar Srikantaiah, Emre Kultursay, Zhang Tao, M. Kandemir, M. J. Irwin, Yuan Xie


Citations: 45

Abstract

Given the diverse range of application characteristics that chip multiprocessors (CMPs) need to cater to, a “one-cache-topology-fits-all” design philosophy will clearly be inadequate. In this paper, we propose MorphCache, a Reconfigurable Adaptive Multi-level Cache hierarchy. MorphCache dynamically tunes a multi-level cache topology in a CMP to allow significantly different cache topologies to exist on the same architecture. Starting from per-core L2 and L3 cache slices as the basic design point, MorphCache alters the cache topology dynamically by merging or splitting cache slices and modifying the accessibility of different cache slice groups to different cores in a CMP. We evaluated MorphCache on a 16-core CMP on a full-system simulator and found that it significantly improves both the average throughput and the harmonic mean of speedups of diverse multithreaded and multiprogrammed workloads. Specifically, our results show that MorphCache improves throughput of the multiprogrammed mixes by 29.9% over a topology with all-shared L2 and L3 caches and by 27.9% over a topology with per-core private L2 caches and a shared L3 cache. We also compared MorphCache to partitioning a single shared cache at each level using promotion/insertion pseudo-partitioning (PIPP) [28] and to managing per-core private caches at each level using dynamic spill receive caches (DSR) [18]. We found that MorphCache improves average throughput by 6.6% over PIPP and by 5.7% over DSR when applied to both L2 and L3 caches.
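The two metrics named in the abstract, throughput and harmonic mean of speedups, can be illustrated with a minimal Python sketch. The abstract does not define throughput, so the sketch assumes it is the sum of per-application IPC values, a common convention for multiprogrammed mixes. The function names and the sample IPC numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def speedups(ipc_new: Sequence[float], ipc_base: Sequence[float]) -> list[float]:
    """Per-application speedup of one cache configuration over a baseline."""
    if len(ipc_new) != len(ipc_base):
        raise ValueError("both runs must cover the same applications")
    return [new / base for new, base in zip(ipc_new, ipc_base)]


def throughput(ipc: Sequence[float]) -> float:
    """Assumed definition: aggregate throughput is the sum of per-application IPC."""
    return sum(ipc)


def harmonic_mean(values: Sequence[float]) -> float:
    """Harmonic mean: n divided by the sum of reciprocals."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs a non-empty list of positive values")
    return len(values) / sum(1.0 / v for v in values)


if __name__ == "__main__":
    # Made-up IPC values for a four-application mix (illustration only).
    baseline_ipc = [1.0, 0.5, 2.0, 0.8]
    adaptive_ipc = [1.2, 0.7, 2.0, 0.9]

    gain = throughput(adaptive_ipc) / throughput(baseline_ipc) - 1.0
    print(f"throughput improvement: {gain:.1%}")
    print(f"harmonic mean of speedups: "
          f"{harmonic_mean(speedups(adaptive_ipc, baseline_ipc)):.3f}")
```

The harmonic mean penalizes configurations that speed up some applications at the expense of others, which is why it is often reported alongside raw throughput as a fairness-sensitive measure.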
