一种完全关联的、无标签的DRAM缓存

2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA) Pub Date : 2015-06-13 DOI:10.1145/2749469.2750383

Yongjun Lee, JongWon Kim, Hakbeom Jang, Hyunggyun Yang, Jang-Hyun Kim, Jinkyu Jeong, Jae W. Lee

{"title":"一种完全关联的、无标签的DRAM缓存","authors":"Yongjun Lee, JongWon Kim, Hakbeom Jang, Hyunggyun Yang, Jang-Hyun Kim, Jinkyu Jeong, Jae W. Lee","doi":"10.1145/2749469.2750383","DOIUrl":null,"url":null,"abstract":"This paper introduces a tagless cache architecture for large in-package DRAM caches. The conventional die-stacked DRAM cache has both a TLB and a cache tag array, which are responsible for virtual-to-physical and physical-to-cache address translation, respectively. We propose to align the granularity of caching with OS page size and take a unified approach to address translation and cache tag management. To this end, we introduce cache-map TLB (cTLB), which stores virtual-to-cache, instead of virtual-to-physical, address mappings. At a TLB miss, the TLB miss handler allocates the requested block into the cache if it is not cached yet, and updates both the page table and cTLB with the virtual-to-cache address mapping. Assuming the availability of large in-package DRAM caches, this ensures that an access to the memory region within the TLB reach always hits in the cache with low hit latency since a TLB access immediately returns the exact location of the requested block in the cache, hence saving a tag-checking operation. The remaining cache space is used as victim cache for memory pages that are recently evicted from cTLB. By completely eliminating data structures for cache tag management, from either on-die SRAM or inpackage DRAM, the proposed DRAM cache achieves best scalability and hit latency, while maintaining high hit rate of a fully associative cache. Our evaluation with 3D Through-Silicon Via (TSV)-based in-package DRAM demonstrates that the proposed cache improves the IPC and energy efficiency by 30.9% and 39.5%, respectively, compared to the baseline with no DRAM cache. These numbers translate to 4.3% and 23.8% improvements over an impractical SRAM-tag cache requiring megabytes of on-die SRAM storage, due to low hit latency and zero energy waste for cache tags.","PeriodicalId":6878,"journal":{"name":"2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)","volume":"19 1","pages":"211-222"},"PeriodicalIF":0.0000,"publicationDate":"2015-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"75","resultStr":"{\"title\":\"A fully associative, tagless DRAM cache\",\"authors\":\"Yongjun Lee, JongWon Kim, Hakbeom Jang, Hyunggyun Yang, Jang-Hyun Kim, Jinkyu Jeong, Jae W. Lee\",\"doi\":\"10.1145/2749469.2750383\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper introduces a tagless cache architecture for large in-package DRAM caches. The conventional die-stacked DRAM cache has both a TLB and a cache tag array, which are responsible for virtual-to-physical and physical-to-cache address translation, respectively. We propose to align the granularity of caching with OS page size and take a unified approach to address translation and cache tag management. To this end, we introduce cache-map TLB (cTLB), which stores virtual-to-cache, instead of virtual-to-physical, address mappings. At a TLB miss, the TLB miss handler allocates the requested block into the cache if it is not cached yet, and updates both the page table and cTLB with the virtual-to-cache address mapping. Assuming the availability of large in-package DRAM caches, this ensures that an access to the memory region within the TLB reach always hits in the cache with low hit latency since a TLB access immediately returns the exact location of the requested block in the cache, hence saving a tag-checking operation. The remaining cache space is used as victim cache for memory pages that are recently evicted from cTLB. By completely eliminating data structures for cache tag management, from either on-die SRAM or inpackage DRAM, the proposed DRAM cache achieves best scalability and hit latency, while maintaining high hit rate of a fully associative cache. Our evaluation with 3D Through-Silicon Via (TSV)-based in-package DRAM demonstrates that the proposed cache improves the IPC and energy efficiency by 30.9% and 39.5%, respectively, compared to the baseline with no DRAM cache. These numbers translate to 4.3% and 23.8% improvements over an impractical SRAM-tag cache requiring megabytes of on-die SRAM storage, due to low hit latency and zero energy waste for cache tags.\",\"PeriodicalId\":6878,\"journal\":{\"name\":\"2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)\",\"volume\":\"19 1\",\"pages\":\"211-222\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-06-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"75\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2749469.2750383\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2749469.2750383","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 75

摘要

本文介绍了一种用于大型封装内DRAM缓存的无标签缓存架构。传统的模堆式DRAM缓存具有TLB和缓存标签阵列，它们分别负责虚拟到物理和物理到缓存的地址转换。我们建议将缓存粒度与操作系统页面大小保持一致，并采用统一的方法来解决翻译和缓存标签管理问题。为此，我们引入缓存映射TLB (cTLB)，它存储虚拟到缓存的地址映射，而不是虚拟到物理的地址映射。在TLB丢失时，如果请求的块尚未缓存，则TLB丢失处理程序将其分配到缓存中，并使用虚拟到缓存的地址映射更新页表和cTLB。假设大型封装内DRAM缓存的可用性，这确保了对TLB范围内的内存区域的访问总是在缓存中以低延迟命中，因为TLB访问立即返回缓存中所请求块的确切位置，从而节省了标签检查操作。剩余的缓存空间用作最近从cTLB中驱逐的内存页的受害者缓存。通过从片上SRAM或封装DRAM中完全消除缓存标签管理的数据结构，所提出的DRAM缓存实现了最佳的可扩展性和命中延迟，同时保持了全关联缓存的高命中率。我们对基于3D通硅孔(TSV)的封装DRAM的评估表明，与没有DRAM缓存的基准相比，所提出的缓存分别提高了30.9%和39.5%的IPC和能源效率。由于缓存标签的低命中延迟和零能量浪费，与需要兆字节片上SRAM存储的不切实际的SRAM标签缓存相比，这些数字转化为4.3%和23.8%的改进。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A fully associative, tagless DRAM cache

This paper introduces a tagless cache architecture for large in-package DRAM caches. The conventional die-stacked DRAM cache has both a TLB and a cache tag array, which are responsible for virtual-to-physical and physical-to-cache address translation, respectively. We propose to align the granularity of caching with OS page size and take a unified approach to address translation and cache tag management. To this end, we introduce cache-map TLB (cTLB), which stores virtual-to-cache, instead of virtual-to-physical, address mappings. At a TLB miss, the TLB miss handler allocates the requested block into the cache if it is not cached yet, and updates both the page table and cTLB with the virtual-to-cache address mapping. Assuming the availability of large in-package DRAM caches, this ensures that an access to the memory region within the TLB reach always hits in the cache with low hit latency since a TLB access immediately returns the exact location of the requested block in the cache, hence saving a tag-checking operation. The remaining cache space is used as victim cache for memory pages that are recently evicted from cTLB. By completely eliminating data structures for cache tag management, from either on-die SRAM or inpackage DRAM, the proposed DRAM cache achieves best scalability and hit latency, while maintaining high hit rate of a fully associative cache. Our evaluation with 3D Through-Silicon Via (TSV)-based in-package DRAM demonstrates that the proposed cache improves the IPC and energy efficiency by 30.9% and 39.5%, respectively, compared to the baseline with no DRAM cache. These numbers translate to 4.3% and 23.8% improvements over an impractical SRAM-tag cache requiring megabytes of on-die SRAM storage, due to low hit latency and zero energy waste for cache tags.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)

自引率

0.00%

发文量

期刊最新文献

Redundant Memory Mappings for fast access to large memories Multiple Clone Row DRAM: A low latency and area optimized DRAM Manycore Network Interfaces for in-memory rack-scale computing Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures ShiDianNao: Shifting vision processing closer to the sensor