Scalable cache memory design for large-scale SMT architectures

M. Mudawar
{"title":"Scalable cache memory design for large-scale SMT architectures","authors":"M. Mudawar","doi":"10.1145/1054943.1054952","DOIUrl":null,"url":null,"abstract":"The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for band-width. The size of the L1 data cache did not scale over the past decade. Instead, larger unified L2 and L3 caches were introduced. This cache hierarchy has a high overhead due to the principle of containment. It also has a complex design to maintain cache coherence across all levels. Furthermore, this cache hierarchy is not suitable for future large-scale SMT processors, which will demand high bandwidth instruction and data caches with a large number of ports.This paper suggests the elimination of the cache hierarchy and replacing it with one-level caches for instruction and data. Multiple instruction caches can be used in parallel to scale the instruction fetch bandwidth and the overall cache capacity. A one-level data cache can be split into a number of block-interleaved cache banks to serve multiple memory requests in parallel. An interconnect is used to connect the data cache ports to the different cache banks, thus increasing the data cache access time. This paper shows that large-scale SMTs can tolerate long data cache hit times. It also shows that small line buffers can enhance the performance and reduce the required number of ports to the banked data cache memory.","PeriodicalId":249099,"journal":{"name":"Workshop on Memory Performance Issues","volume":"60 4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Memory Performance Issues","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1054943.1054952","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for band-width. The size of the L1 data cache did not scale over the past decade. Instead, larger unified L2 and L3 caches were introduced. This cache hierarchy has a high overhead due to the principle of containment. It also has a complex design to maintain cache coherence across all levels. Furthermore, this cache hierarchy is not suitable for future large-scale SMT processors, which will demand high bandwidth instruction and data caches with a large number of ports.This paper suggests the elimination of the cache hierarchy and replacing it with one-level caches for instruction and data. Multiple instruction caches can be used in parallel to scale the instruction fetch bandwidth and the overall cache capacity. A one-level data cache can be split into a number of block-interleaved cache banks to serve multiple memory requests in parallel. An interconnect is used to connect the data cache ports to the different cache banks, thus increasing the data cache access time. This paper shows that large-scale SMTs can tolerate long data cache hit times. It also shows that small line buffers can enhance the performance and reduce the required number of ports to the banked data cache memory.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
大规模SMT架构的可扩展高速缓存设计
现有SMT和超标量处理器中的缓存层次结构设计针对延迟进行了优化,但没有针对带宽进行优化。L1数据缓存的大小在过去十年中没有扩展。取而代之的是引入了更大的统一L2和L3缓存。由于包含原则,此缓存层次结构具有很高的开销。它也有一个复杂的设计,以保持所有级别的缓存一致性。此外,这种缓存层次结构不适合未来的大规模SMT处理器,这将需要具有大量端口的高带宽指令和数据缓存。本文建议取消缓存层次结构,代之以指令和数据的一级缓存。多个指令缓存可以并行使用,以扩展指令获取带宽和总体缓存容量。一级数据缓存可以被分割成多个块交错缓存库,以并行地服务多个内存请求。通过互连将数据缓存端口与不同的缓存银行连接起来,从而增加数据缓存访问时间。本文表明大规模smt可以容忍较长的数据缓存命中时间。它还表明,较小的行缓冲区可以提高性能并减少到存储数据缓存存储器所需的端口数量。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Compiler-optimized usage of partitioned memories A case for multi-level main memory On the effectiveness of prefetching and reuse in reducing L1 data cache traffic: a case study of Snort SCIMA-SMP: on-chip memory processor architecture for SMP Evaluating kilo-instruction multiprocessors
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1