Scalable cache memory design for large-scale SMT architectures

Workshop on Memory Performance Issues. Pub Date: 2004-06-20. DOI: 10.1145/1054943.1054952

M. Mudawar


Citations: 6

Abstract

The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache has not scaled over the past decade. Instead, larger unified L2 and L3 caches were introduced. This cache hierarchy has a high overhead due to the principle of containment. It also requires a complex design to maintain cache coherence across all levels. Furthermore, this cache hierarchy is not suitable for future large-scale SMT processors, which will demand high-bandwidth instruction and data caches with a large number of ports.

This paper proposes eliminating the cache hierarchy and replacing it with one-level caches for instructions and data. Multiple instruction caches can be used in parallel to scale the instruction fetch bandwidth and the overall cache capacity. A one-level data cache can be split into a number of block-interleaved cache banks to serve multiple memory requests in parallel. An interconnect links the data cache ports to the different cache banks, which increases the data cache access time. This paper shows that large-scale SMTs can tolerate long data cache hit times. It also shows that small line buffers can enhance performance and reduce the number of ports required to the banked data cache memory.
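The block-interleaved banking mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the line size, the bank count, the single-ported-bank assumption, and the names `bank_of` and `schedule_cycle` are all hypothetical choices made for illustration. Consecutive cache lines map to consecutive banks, so requests that fall in different banks can be served in the same cycle, while two requests to the same bank conflict.

```python
LINE_SIZE = 64   # bytes per cache line (assumed, not from the paper)
NUM_BANKS = 8    # number of data cache banks (assumed, not from the paper)


def bank_of(addr: int, line_size: int = LINE_SIZE, num_banks: int = NUM_BANKS) -> int:
    """Block-interleaved mapping: consecutive cache lines go to consecutive banks."""
    return (addr // line_size) % num_banks


def schedule_cycle(addrs, line_size: int = LINE_SIZE, num_banks: int = NUM_BANKS):
    """Split one cycle's requests into those served now and those deferred.

    Assumes each bank is single-ported and serves one request per cycle;
    the first request to reach a bank wins and later ones are deferred.
    """
    busy = set()                 # banks already claimed in this cycle
    served, deferred = [], []
    for addr in addrs:
        bank = bank_of(addr, line_size, num_banks)
        if bank in busy:
            deferred.append(addr)    # bank conflict: retry in a later cycle
        else:
            busy.add(bank)
            served.append(addr)
    return served, deferred


if __name__ == "__main__":
    # Addresses 0 and 512 both map to bank 0, so one of them must wait.
    print(schedule_cycle([0, 64, 512, 130]))
```

Under these assumptions, more banks reduce the chance that two requests in the same cycle collide. The interconnect that routes each request to its bank is what adds to the hit time discussed in the abstract.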

