基于重用距离分析的多核处理器扩展研究

Meng-Ju Wu, Minshu Zhao, D. Yeung
{"title":"基于重用距离分析的多核处理器扩展研究","authors":"Meng-Ju Wu, Minshu Zhao, D. Yeung","doi":"10.1145/2485922.2485965","DOIUrl":null,"url":null,"abstract":"The trend for multicore processors is towards increasing numbers of cores, with 100s of cores--i.e. large-scale chip multiprocessors (LCMPs)--possible in the future. The key to realizing the potential of LCMPs is the cache hierarchy, so studying how memory performance will scale is crucial. Reuse distance (RD) analysis can help architects do this. In particular, recent work has developed concurrent reuse distance (CRD) and private reuse distance (PRD) profiles to enable analysis of shared and private caches. Also, techniques have been developed to predict profiles across problem size and core count, enabling the analysis of configurations that are too large to simulate. This paper applies RD analysis to study the scalability of multicore cache hierarchies. We present a framework based on CRD and PRD profiles for reasoning about the locality impact of core count and problem scaling. We find interference-based locality degradation is more significant than sharing-based locality degradation. For 256 cores running small problems, the former occurs at small cache sizes, allowing moderate capacity scaling of multicore caches to achieve the same cache performance (MPKI) as a single-core cache. At very large problems, interference-based locality degradation increases significantly in many of our benchmarks. For shared caches, this prevents most of our benchmarks from achieving constant-MPKI scaling within a 256 MB capacity budget; for private caches, all benchmarks cannot achieve constant-MPKI scaling within 256 MB.","PeriodicalId":20555,"journal":{"name":"Proceedings of the 40th Annual International Symposium on Computer Architecture","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"36","resultStr":"{\"title\":\"Studying multicore processor scaling via reuse distance analysis\",\"authors\":\"Meng-Ju Wu, Minshu Zhao, D. Yeung\",\"doi\":\"10.1145/2485922.2485965\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The trend for multicore processors is towards increasing numbers of cores, with 100s of cores--i.e. large-scale chip multiprocessors (LCMPs)--possible in the future. The key to realizing the potential of LCMPs is the cache hierarchy, so studying how memory performance will scale is crucial. Reuse distance (RD) analysis can help architects do this. In particular, recent work has developed concurrent reuse distance (CRD) and private reuse distance (PRD) profiles to enable analysis of shared and private caches. Also, techniques have been developed to predict profiles across problem size and core count, enabling the analysis of configurations that are too large to simulate. This paper applies RD analysis to study the scalability of multicore cache hierarchies. We present a framework based on CRD and PRD profiles for reasoning about the locality impact of core count and problem scaling. We find interference-based locality degradation is more significant than sharing-based locality degradation. For 256 cores running small problems, the former occurs at small cache sizes, allowing moderate capacity scaling of multicore caches to achieve the same cache performance (MPKI) as a single-core cache. At very large problems, interference-based locality degradation increases significantly in many of our benchmarks. For shared caches, this prevents most of our benchmarks from achieving constant-MPKI scaling within a 256 MB capacity budget; for private caches, all benchmarks cannot achieve constant-MPKI scaling within 256 MB.\",\"PeriodicalId\":20555,\"journal\":{\"name\":\"Proceedings of the 40th Annual International Symposium on Computer Architecture\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-06-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"36\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 40th Annual International Symposium on Computer Architecture\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2485922.2485965\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 40th Annual International Symposium on Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2485922.2485965","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 36

摘要

多核处理器的趋势是增加核数,有100个核。大规模芯片多处理器(LCMPs)——未来可能实现。实现lcmp潜力的关键是缓存层次结构,因此研究内存性能如何扩展是至关重要的。重用距离(RD)分析可以帮助架构师做到这一点。特别是,最近的工作开发了并发重用距离(CRD)和私有重用距离(PRD)概要文件,以支持对共享缓存和私有缓存的分析。此外,还开发了一些技术来预测跨问题大小和核心数量的配置文件,从而能够分析太大而无法模拟的配置。本文应用RD分析方法研究了多核缓存层次结构的可扩展性。我们提出了一个基于CRD和PRD概况的框架,用于推理核心计数和问题缩放的局部性影响。我们发现基于干扰的局部退化比基于共享的局部退化更为显著。对于256核运行的小问题,前者发生在较小的缓存大小,允许适度扩展多核缓存的容量,以实现与单核缓存相同的缓存性能(MPKI)。在非常大的问题中,基于干扰的局部退化在我们的许多基准中显着增加。对于共享缓存,这使得我们的大多数基准测试无法在256 MB容量预算内实现恒定的mpki扩展;对于私有缓存,所有基准测试都无法在256 MB内实现恒定的mpki缩放。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Studying multicore processor scaling via reuse distance analysis
The trend for multicore processors is towards increasing numbers of cores, with 100s of cores--i.e. large-scale chip multiprocessors (LCMPs)--possible in the future. The key to realizing the potential of LCMPs is the cache hierarchy, so studying how memory performance will scale is crucial. Reuse distance (RD) analysis can help architects do this. In particular, recent work has developed concurrent reuse distance (CRD) and private reuse distance (PRD) profiles to enable analysis of shared and private caches. Also, techniques have been developed to predict profiles across problem size and core count, enabling the analysis of configurations that are too large to simulate. This paper applies RD analysis to study the scalability of multicore cache hierarchies. We present a framework based on CRD and PRD profiles for reasoning about the locality impact of core count and problem scaling. We find interference-based locality degradation is more significant than sharing-based locality degradation. For 256 cores running small problems, the former occurs at small cache sizes, allowing moderate capacity scaling of multicore caches to achieve the same cache performance (MPKI) as a single-core cache. At very large problems, interference-based locality degradation increases significantly in many of our benchmarks. For shared caches, this prevents most of our benchmarks from achieving constant-MPKI scaling within a 256 MB capacity budget; for private caches, all benchmarks cannot achieve constant-MPKI scaling within 256 MB.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
AC-DIMM: associative computing with STT-MRAM Deconfigurable microprocessor architectures for silicon debug acceleration Thin servers with smart pipes: designing SoC accelerators for memcached An experimental study of data retention behavior in modern DRAM devices: implications for retention time profiling mechanisms Dynamic reduction of voltage margins by leveraging on-chip ECC in Itanium II processors
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1