Re-NUCA: Boosting CMP Performance Through Block Replication

P. Foglia, C. Prete, M. Solinas, Giovanna Monni
Published in: 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools
Publication date: 2010-09-01
DOI: 10.1109/DSD.2010.41 (https://doi.org/10.1109/DSD.2010.41)
Citations: 12

Abstract

Chip Multiprocessor (CMP) systems have become the reference architecture for microprocessor design, thanks to improvements in semiconductor nanotechnology that have continuously provided a growing number of faster and smaller transistors per chip. Interest in CMPs has grown because classical techniques for boosting performance, e.g. increasing the clock frequency and the amount of work performed in each clock cycle, can no longer deliver significant improvements due to energy constraints and wire-delay effects. CMP systems generally adopt a large last-level cache (LLC) (typically L2 or L3) shared among all cores, together with private L1 caches. Because the miss resolution time of the private caches depends on the response time of the LLC, which is wire-delay dominated, overall performance is affected by wire delay. NUCA caches have been proposed for single-core and multi-core systems as a mechanism for tolerating the effects of wire delay on overall performance. In this paper, we introduce a novel NUCA architecture, called Re-NUCA, specifically suited for (but not limited to) CMPs in which cores are placed on different sides of the shared cache. The idea is to allow shared blocks to be replicated inside the shared cache, in order to avoid the limits on performance improvement that arise in classical D-NUCA caches due to the conflict hit problem. Our results show that Re-NUCA outperforms D-NUCA by more than 5% on average, and for applications that suffer strongly from the conflict hit problem we observe performance improvements of up to 15%.
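
The conflict hit problem mentioned in the abstract can be illustrated with a small toy model. The sketch below is not the paper's simulator or its replication policy; it is a minimal illustration under stated assumptions: a single column of 8 banks, one core at each end, a latency of one unit per hop plus one unit for the bank access, a migration rule that moves a block one bank toward the requester on every hit, and read-only accesses (so coherence between replicas is ignored). All function names and parameters are hypothetical.

```python
def dnuca_latency(accesses, n_banks=8):
    """Toy D-NUCA: one copy of the block, promoted one bank toward the requester per hit.

    `accesses` is a sequence of core ids: 0 sits next to bank 0,
    1 sits next to bank n_banks - 1 (an assumed layout).
    """
    pos = n_banks // 2  # assumed starting bank of the shared block
    total = 0
    for core in accesses:
        home = 0 if core == 0 else n_banks - 1
        total += abs(pos - home) + 1  # hops to the block plus one bank access
        if pos != home:
            pos += 1 if home > pos else -1  # migrate one bank toward the requester
    return total


def replicated_latency(accesses, n_banks=8):
    """Toy replication: each core has its own copy, which migrates only toward that core."""
    pos = {0: n_banks // 2, 1: n_banks // 2}  # both replicas start mid-column (assumption)
    total = 0
    for core in accesses:
        home = 0 if core == 0 else n_banks - 1
        total += abs(pos[core] - home) + 1
        if pos[core] != home:
            pos[core] += 1 if home > pos[core] else -1
    return total


if __name__ == "__main__":
    shared = [0, 1] * 8  # two cores alternately reading the same block
    print("single copy:", dnuca_latency(shared))
    print("replicated: ", replicated_latency(shared))
```

In this toy, alternating requests from opposite sides keep the single copy oscillating around the middle of the column, so neither core ever sees it nearby, while each replica settles next to its own core. The absolute numbers are artifacts of the assumed latency model and say nothing about the speedups reported in the paper.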