Die-Stacked DRAM Caches for Servers: Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache

Djordje Jevdjic, Stavros Volos, B. Falsafi
Published in: Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013-06-23
DOI: 10.1145/2485922.2485957 (https://doi.org/10.1145/2485922.2485957)
Citations: 207

Abstract

Recent research advocates using large die-stacked DRAM caches to break the memory bandwidth wall. Existing DRAM cache designs fall into one of two categories --- block-based and page-based. The former organize data in conventional blocks (e.g., 64B), ensuring low off-chip bandwidth utilization, but co-locate tags and data in the stacked DRAM, incurring high lookup latency. Furthermore, such designs suffer from low hit ratios due to poor temporal locality. In contrast, page-based caches, which manage data at larger granularity (e.g., 4KB pages), allow for reduced tag array overhead and fast lookup, and leverage high spatial locality at the cost of moving large amounts of data on and off the chip. This paper introduces Footprint Cache, an efficient die-stacked DRAM cache design for server processors. Footprint Cache allocates data at the granularity of pages, but identifies and fetches only those blocks within a page that will be touched during the page's residency in the cache --- i.e., the page's footprint. In doing so, Footprint Cache eliminates the excessive off-chip traffic associated with page-based designs, while preserving their high hit ratio, small tag array overhead, and low lookup latency. Cycle-accurate simulation results of a 16-core server with up to 512MB Footprint Cache indicate a 57% performance improvement over a baseline chip without a die-stacked cache. Compared to a state-of-the-art block-based design, our design improves performance by 13% while reducing dynamic energy of stacked DRAM by 24%.
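
The allocation policy described in the abstract (allocate a whole page frame, but fetch only the blocks expected to be touched) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's design: the abstract does not say how footprints are predicted, so the `history` table below (keyed by page number and remembering the blocks touched during the page's previous residency) is a hypothetical stand-in, as are the FIFO eviction and every class and field name. Only the 64B block and 4KB page sizes come from the abstract.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 64                            # bytes per block (abstract's example)
PAGE_SIZE = 4096                           # bytes per page (abstract's example)
BLOCKS_PER_PAGE = PAGE_SIZE // BLOCK_SIZE  # 64 blocks in one page


@dataclass
class PageEntry:
    valid: set = field(default_factory=set)    # block indices present in the cache
    touched: set = field(default_factory=set)  # block indices demanded this residency


class FootprintCacheSketch:
    """Toy page-granularity cache that fetches only predicted blocks."""

    def __init__(self, capacity_pages):
        self.capacity_pages = capacity_pages
        self.pages = {}          # page number -> PageEntry (dict order = FIFO order)
        self.history = {}        # hypothetical predictor: page number -> last footprint
        self.blocks_fetched = 0  # off-chip traffic, counted in blocks
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        page = addr // PAGE_SIZE
        block = (addr % PAGE_SIZE) // BLOCK_SIZE
        entry = self.pages.get(page)
        if entry is None:
            # Page miss: allocate a frame, then fetch the predicted
            # footprint plus the block that triggered the miss.
            self.misses += 1
            if len(self.pages) >= self.capacity_pages:
                self._evict()
            predicted = self.history.get(page, set()) | {block}
            entry = PageEntry(valid=set(predicted))
            self.pages[page] = entry
            self.blocks_fetched += len(predicted)
        elif block not in entry.valid:
            # Page is resident but this block was not predicted: fetch it alone.
            self.misses += 1
            entry.valid.add(block)
            self.blocks_fetched += 1
        else:
            self.hits += 1
        entry.touched.add(block)

    def _evict(self):
        # Assumed FIFO replacement: drop the oldest page and record
        # which blocks it actually used as its footprint.
        victim = next(iter(self.pages))
        self.history[victim] = self.pages.pop(victim).touched


if __name__ == "__main__":
    cache = FootprintCacheSketch(capacity_pages=1)
    for addr in (0, 64, 128, 4096, 0, 64, 128):
        cache.access(addr)
    print(cache.hits, cache.misses, cache.blocks_fetched)  # 2 5 7
```

In this trace the toy cache moves 7 blocks across its three page allocations, whereas a cache that fetched whole pages would move 3 × 64 = 192 blocks for the same allocations. The numbers illustrate the mechanism only and say nothing about the results reported in the paper.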