Efficient footprint caching for Tagless DRAM Caches

Hakbeom Jang, Yongjun Lee, JongWon Kim, Youngsok Kim, Jang-Hyun Kim, Jinkyu Jeong, Jae W. Lee
Published in: 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)
Publication date: 2016-03-12
DOI: 10.1109/HPCA.2016.7446068 (https://doi.org/10.1109/HPCA.2016.7446068)
Citations: 37

Abstract

Efficient cache tag management is a primary design objective for large, in-package DRAM caches. Tagless DRAM Caches (TDCs) were recently proposed to eliminate tagging structures entirely from both on-die SRAM and in-package DRAM, since these structures are a major scalability bottleneck for future multi-gigabyte DRAM caches. However, because TDC takes a unified approach to address translation and cache tag management, it constrains the DRAM cache block size to equal the OS page size (e.g., 4KB). Caching at page granularity, or page-based caching, wastes significant off-package DRAM bandwidth by over-fetching blocks within a page that are never used. Footprint caching is an effective solution to this problem: it fetches only those blocks that will likely be touched during the page's lifetime in the DRAM cache, referred to as the page's footprint. In this paper we demonstrate that TDC opens up unique opportunities to realize efficient footprint caching with higher prediction accuracy and lower hardware cost than the original footprint caching scheme. Since TDC has no cache tags, the footprints of cached pages are tracked in the TLB instead of in a cache tag array, which incurs much lower on-die storage overhead than the original design. In addition, when a cached page is evicted, its footprint is stored in the corresponding page table entry instead of in an auxiliary on-die structure (i.e., the Footprint History Table). This prevents footprint thrashing among different pages and thus yields higher footprint prediction accuracy. The resulting design, called Footprint-augmented Tagless DRAM Cache (F-TDC), significantly improves the bandwidth efficiency of TDC, and hence its performance and energy efficiency. Our evaluation with 3D Through-Silicon-Via-based in-package DRAM demonstrates an average reduction in off-package bandwidth of 32.0%, which in turn improves IPC and EDP by 17.7% and 25.4%, respectively, over the state-of-the-art TDC with no footprint caching.
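
The footprint mechanism described in the abstract can be illustrated with a small software model. The sketch below is not the authors' hardware design. It is a minimal Python illustration under stated assumptions: a 64-byte block size (the abstract gives only the 4KB page size), a policy of fetching the whole page when no footprint history exists, and hypothetical names throughout (`PageTableEntry`, `TlbEntry`, `FootprintCache`, `saved_footprint`). It shows the three steps the abstract names: tracking a per-page footprint alongside the TLB entry, saving it to the page table entry on eviction, and fetching only the predicted blocks on the next fill.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096                           # OS page size given in the abstract (4KB)
BLOCK_SIZE = 64                            # ASSUMPTION: block size is not stated in the abstract
BLOCKS_PER_PAGE = PAGE_SIZE // BLOCK_SIZE  # 64 blocks, so a footprint fits in 64 bits
FULL_PAGE = (1 << BLOCKS_PER_PAGE) - 1     # bit vector with every block selected


@dataclass
class PageTableEntry:
    """Hypothetical PTE holding the footprint saved at the last eviction."""
    saved_footprint: int = 0  # 0 means no history has been recorded yet


@dataclass
class TlbEntry:
    """Hypothetical TLB entry for a page resident in the DRAM cache."""
    fetched: int = 0  # blocks currently present in the DRAM cache
    touched: int = 0  # blocks referenced during this residency (the footprint)


class FootprintCache:
    """Toy model that counts blocks fetched from off-package DRAM."""

    def __init__(self):
        self.page_table = {}     # virtual page number -> PageTableEntry
        self.tlb = {}            # virtual page number -> TlbEntry (cached pages)
        self.blocks_fetched = 0  # proxy for off-package bandwidth

    def _fill(self, vpn):
        pte = self.page_table.setdefault(vpn, PageTableEntry())
        # ASSUMPTION: with no history, fall back to fetching the whole page.
        predicted = pte.saved_footprint or FULL_PAGE
        self.tlb[vpn] = TlbEntry(fetched=predicted)
        self.blocks_fetched += bin(predicted).count("1")

    def access(self, addr):
        vpn = addr // PAGE_SIZE
        block = (addr % PAGE_SIZE) // BLOCK_SIZE
        if vpn not in self.tlb:
            self._fill(vpn)
        entry = self.tlb[vpn]
        bit = 1 << block
        if not entry.fetched & bit:
            # Footprint misprediction: fetch the missing block on demand.
            entry.fetched |= bit
            self.blocks_fetched += 1
        entry.touched |= bit

    def evict(self, vpn):
        # On eviction, the observed footprint is written back to the PTE.
        entry = self.tlb.pop(vpn)
        self.page_table[vpn].saved_footprint = entry.touched


cache = FootprintCache()
for offset in (0, 64, 128):      # touch blocks 0, 1 and 2 of page 1
    cache.access(0x1000 + offset)
first_fill = cache.blocks_fetched

cache.evict(1)                   # footprint 0b111 is saved to the PTE
cache.access(0x1000)             # refill fetches only the predicted blocks
second_fill = cache.blocks_fetched - first_fill
print(first_fill, second_fill)
```

In this toy run the first residency fetches all 64 blocks of the page, while the second fetches only the 3 blocks recorded in the saved footprint. Because the history lives in the per-page entry of the model's page table, one page's footprint cannot displace another's, which is the thrashing-avoidance argument the abstract makes for storing footprints in page table entries.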