Hakbeom Jang, Yongjun Lee, JongWon Kim, Youngsok Kim, Jang-Hyun Kim, Jinkyu Jeong, Jae W. Lee
{"title":"Efficient footprint caching for Tagless DRAM Caches","authors":"Hakbeom Jang, Yongjun Lee, JongWon Kim, Youngsok Kim, Jang-Hyun Kim, Jinkyu Jeong, Jae W. Lee","doi":"10.1109/HPCA.2016.7446068","DOIUrl":null,"url":null,"abstract":"Efficient cache tag management is a primary design objective for large, in-package DRAM caches. Recently, Tagless DRAM Caches (TDCs) have been proposed to completely eliminate tagging structures from both on-die SRAM and in-package DRAM, which are a major scalability bottleneck for future multi-gigabyte DRAM caches. However, TDC imposes a constraint on DRAM cache block size to be the same as OS page size (e.g., 4KB) as it takes a unified approach to address translation and cache tag management. Caching at a page granularity, or page-based caching, incurs significant off-package DRAM bandwidth waste by over-fetching blocks within a page that are not actually used. Footprint caching is an effective solution to this problem, which fetches only those blocks that will likely be touched during the page's lifetime in the DRAM cache, referred to as the page's footprint. In this paper we demonstrate TDC opens up unique opportunities to realize efficient footprint caching with higher prediction accuracy and a lower hardware cost than the original footprint caching scheme. Since there are no cache tags in TDC, the footprints of cached pages are tracked at TLB, instead of cache tag array, to incur much lower on-die storage overhead than the original design. Besides, when a cached page is evicted, its footprint will be stored in the corresponding page table entry, instead of an auxiliary on-die structure (i.e., Footprint History Table), to prevent footprint thrashing among different pages, thus yielding higher accuracy in footprint prediction. The resulting design, called Footprint-augmented Tagless DRAM Cache (F-TDC), significantly improves the bandwidth efficiency of TDC, and hence its performance and energy efficiency. Our evaluation with 3D Through-Silicon-Via-based in-package DRAM demonstrates an average reduction of off-package bandwidth by 32.0%, which, in turn, improves IPC and EDP by 17.7% and 25.4%, respectively, over the state-of-the-art TDC with no footprint caching.","PeriodicalId":417994,"journal":{"name":"2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"37","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2016.7446068","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 37
Abstract
Efficient cache tag management is a primary design objective for large, in-package DRAM caches. Recently, Tagless DRAM Caches (TDCs) have been proposed to completely eliminate tagging structures from both on-die SRAM and in-package DRAM, which are a major scalability bottleneck for future multi-gigabyte DRAM caches. However, TDC constrains the DRAM cache block size to be the same as the OS page size (e.g., 4KB) because it takes a unified approach to address translation and cache tag management. Caching at a page granularity, or page-based caching, incurs significant off-package DRAM bandwidth waste by over-fetching blocks within a page that are not actually used. Footprint caching is an effective solution to this problem: it fetches only those blocks that will likely be touched during the page's lifetime in the DRAM cache, referred to as the page's footprint. In this paper we demonstrate that TDC opens up unique opportunities to realize efficient footprint caching with higher prediction accuracy and a lower hardware cost than the original footprint caching scheme. Since there are no cache tags in TDC, the footprints of cached pages are tracked at the TLB, instead of in a cache tag array, incurring much lower on-die storage overhead than the original design. In addition, when a cached page is evicted, its footprint is stored in the corresponding page table entry, instead of an auxiliary on-die structure (i.e., a Footprint History Table), which prevents footprint thrashing among different pages and thus yields higher accuracy in footprint prediction. The resulting design, called Footprint-augmented Tagless DRAM Cache (F-TDC), significantly improves the bandwidth efficiency of TDC, and hence its performance and energy efficiency. Our evaluation with 3D Through-Silicon-Via-based in-package DRAM demonstrates an average off-package bandwidth reduction of 32.0%, which, in turn, improves IPC and EDP by 17.7% and 25.4%, respectively, over the state-of-the-art TDC with no footprint caching.
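To make the footprint idea concrete, below is a minimal, illustrative sketch (not the authors' hardware design) of per-page footprint tracking in the spirit of F-TDC. It assumes 4KB pages and 64B blocks, so each page's footprint is a 64-bit vector; the dictionaries live_footprints and saved_footprints are hypothetical stand-ins for footprint bits kept at the TLB and in the page table entry, respectively.

```python
# Illustrative sketch of footprint tracking and prediction (assumptions noted above).
# A 4KB page holds 64 blocks of 64B, so one 64-bit vector records which blocks
# were touched while the page lived in the DRAM cache.

BLOCKS_PER_PAGE = 64   # 4KB page / 64B block
PAGE_SHIFT = 12
BLOCK_SHIFT = 6

class FootprintTracker:
    def __init__(self):
        self.live_footprints = {}   # hypothetical stand-in for footprint bits at the TLB
        self.saved_footprints = {}  # hypothetical stand-in for footprint bits in the PTE

    def on_page_fill(self, vpn):
        """Page inserted into the DRAM cache: fetch only the predicted footprint."""
        predicted = self.saved_footprints.get(vpn, (1 << BLOCKS_PER_PAGE) - 1)
        self.live_footprints[vpn] = 0
        # Blocks to fetch from off-package DRAM (all 64 if no history exists).
        return [b for b in range(BLOCKS_PER_PAGE) if predicted & (1 << b)]

    def on_access(self, vaddr):
        """Record the touched block in the page's live footprint."""
        vpn = vaddr >> PAGE_SHIFT
        block = (vaddr >> BLOCK_SHIFT) & (BLOCKS_PER_PAGE - 1)
        if vpn in self.live_footprints:
            self.live_footprints[vpn] |= 1 << block

    def on_page_evict(self, vpn):
        """Page evicted: stash its footprint so the next fill can be selective."""
        self.saved_footprints[vpn] = self.live_footprints.pop(vpn, 0)

# Usage: the first fill has no history and fetches all 64 blocks; after an
# eviction, a re-fill fetches only the two blocks that were actually touched.
t = FootprintTracker()
assert len(t.on_page_fill(0x1)) == 64
t.on_access(0x1000)   # block 0 of page 0x1
t.on_access(0x1040)   # block 1 of page 0x1
t.on_page_evict(0x1)
assert t.on_page_fill(0x1) == [0, 1]
```

The sketch mirrors the paper's key point: because footprint state piggybacks on structures that already exist per page (TLB entries and page table entries), no separate on-die footprint history table is needed, and each page keeps its own history rather than sharing (and thrashing) a limited table.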