MorphCache: A Reconfigurable Adaptive Multi-level Cache hierarchy

Shekhar Srikantaiah, Emre Kultursay, Zhang Tao, M. Kandemir, M. J. Irwin, Yuan Xie
{"title":"MorphCache: A Reconfigurable Adaptive Multi-level Cache hierarchy","authors":"Shekhar Srikantaiah, Emre Kultursay, Zhang Tao, M. Kandemir, M. J. Irwin, Yuan Xie","doi":"10.1109/HPCA.2011.5749732","DOIUrl":null,"url":null,"abstract":"Given the diverse range of application characteristics that chip multiprocessors (CMPs) need to cater to, a “one-cache-topology-fits-all” design philosophy will clearly be inadequate. In this paper, we propose MorphCache, a Reconfigurable Adaptive Multi-level Cache hierarchy. Mor-phCache dynamically tunes a multi-level cache topology in a CMP to allow significantly different cache topologies to exist on the same architecture. Starting from per-core L2 and L3 cache slices as the basic design point, MorphCache alters the cache topology dynamically by merging or splitting cache slices and modifying the accessibility of different cache slice groups to different cores in a CMP. We evaluated MorphCache on a 16 core CMP on a full system simulator and found that it significantly improves both average throughput and harmonic mean of speedups of diverse multithreaded and multiprogrammed workloads. Specifically, our results show that MorphCache improves throughput of the multiprogrammed mixes by 29.9% over a topology with all-shared L2 and L3 caches and 27.9% over a topology with per core private L2 cache and shared L3 cache. In addition, we also compared MorphCache to partitioning a single shared cache at each level using promotion/insertion pseudo-partitioning (PIPP) [28] and managing per-core private cache at each level using dynamic spill receive caches (DSR) [18]. We found that MorphCache improves average throughput by 6.6% over PIPP and by 5.7% over DSR when applied to both L2 and L3 caches.","PeriodicalId":126976,"journal":{"name":"2011 IEEE 17th International Symposium on High Performance Computer Architecture","volume":"280 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"45","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE 17th International Symposium on High Performance Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2011.5749732","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 45

Abstract

Given the diverse range of application characteristics that chip multiprocessors (CMPs) need to cater to, a “one-cache-topology-fits-all” design philosophy will clearly be inadequate. In this paper, we propose MorphCache, a Reconfigurable Adaptive Multi-level Cache hierarchy. Mor-phCache dynamically tunes a multi-level cache topology in a CMP to allow significantly different cache topologies to exist on the same architecture. Starting from per-core L2 and L3 cache slices as the basic design point, MorphCache alters the cache topology dynamically by merging or splitting cache slices and modifying the accessibility of different cache slice groups to different cores in a CMP. We evaluated MorphCache on a 16 core CMP on a full system simulator and found that it significantly improves both average throughput and harmonic mean of speedups of diverse multithreaded and multiprogrammed workloads. Specifically, our results show that MorphCache improves throughput of the multiprogrammed mixes by 29.9% over a topology with all-shared L2 and L3 caches and 27.9% over a topology with per core private L2 cache and shared L3 cache. In addition, we also compared MorphCache to partitioning a single shared cache at each level using promotion/insertion pseudo-partitioning (PIPP) [28] and managing per-core private cache at each level using dynamic spill receive caches (DSR) [18]. We found that MorphCache improves average throughput by 6.6% over PIPP and by 5.7% over DSR when applied to both L2 and L3 caches.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
MorphCache:一个可重构的自适应多级缓存层次结构
考虑到芯片多处理器(cmp)需要满足的各种应用程序特性,“一种缓存拓扑适合所有人”的设计理念显然是不够的。在本文中,我们提出了MorphCache,一个可重构的自适应多级缓存结构。Mor-phCache动态调优CMP中的多级缓存拓扑,以允许在同一架构上存在显著不同的缓存拓扑。MorphCache从每核L2和L3缓存片作为基本设计点开始,通过合并或分割缓存片以及修改CMP中不同缓存片组对不同核心的可访问性来动态更改缓存拓扑。我们在全系统模拟器上的16核CMP上评估了MorphCache,发现它显着提高了各种多线程和多编程工作负载的平均吞吐量和调和平均速度。具体来说,我们的结果表明,MorphCache比具有全共享L2和L3缓存的拓扑结构提高了29.9%的吞吐量,比具有每核私有L2缓存和共享L3缓存的拓扑结构提高了27.9%的吞吐量。此外,我们还将MorphCache与使用提升/插入伪分区(PIPP)[28]在每个级别对单个共享缓存进行分区和使用动态溢出接收缓存(DSR)[18]在每个级别管理每核私有缓存进行了比较。我们发现,当MorphCache应用于L2和L3缓存时,它比PIPP提高了6.6%的平均吞吐量,比DSR提高了5.7%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Safe and efficient supervised memory systems Keynote address II: How's the parallel computing revolution going? A case for guarded power gating for multi-core processors Fg-STP: Fine-Grain Single Thread Partitioning on Multicores A quantitative performance analysis model for GPU architectures
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1