Proximity coherence for chip multiprocessors

Nick Barrow-Williams
{"title":"Proximity coherence for chip multiprocessors","authors":"Nick Barrow-Williams","doi":"10.1145/1854273.1854293","DOIUrl":null,"url":null,"abstract":"Many-core architectures provide an efficient way of harnessing the increasing numbers of transistors available in modern fabrication processes. While they are similar to multi-node systems, they exhibit different communication latency and storage characteristics, providing new design opportunities that were previously not feasible. Traditional cache coherence protocols, although often used in many-core designs, have been developed in the context of multi-node systems. As such, they seldom take advantage of the new possibilities that many-core architectures offer. We propose Proximity Coherence, a scheme in which L1 load misses are optimistically forwarded to nearby caches via new dedicated links rather than always being indirected via a directory structure. Such an optimization is made possible by the comparable cost of local cache accesses with the use of on-chip network resources. Coherency is maintained using lightweight graph structures embedded in the L1 caches. We compare our Proximity Coherence protocol to an existing directory-based MESI protocol using full-system simulations of a 32 core system. Our extension lowers the latency of L1 cache load misses by up to 32% while reducing the bytes transferred on the global on-chip interconnect by up to 19% for a range of parallel benchmarks. Employing Proximity Coherence provides execution time improvements of up to 13%, reduces cache hierarchy energy consumption by up to 30% and delivers a more efficient solution to the challenge of coherence in chip multiprocessors.","PeriodicalId":422461,"journal":{"name":"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"53","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1854273.1854293","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 53

Abstract

Many-core architectures provide an efficient way of harnessing the increasing numbers of transistors available in modern fabrication processes. While they are similar to multi-node systems, they exhibit different communication latency and storage characteristics, providing new design opportunities that were previously not feasible. Traditional cache coherence protocols, although often used in many-core designs, have been developed in the context of multi-node systems. As such, they seldom take advantage of the new possibilities that many-core architectures offer. We propose Proximity Coherence, a scheme in which L1 load misses are optimistically forwarded to nearby caches via new dedicated links rather than always being indirected via a directory structure. Such an optimization is made possible by the comparable cost of local cache accesses with the use of on-chip network resources. Coherency is maintained using lightweight graph structures embedded in the L1 caches. We compare our Proximity Coherence protocol to an existing directory-based MESI protocol using full-system simulations of a 32 core system. Our extension lowers the latency of L1 cache load misses by up to 32% while reducing the bytes transferred on the global on-chip interconnect by up to 19% for a range of parallel benchmarks. Employing Proximity Coherence provides execution time improvements of up to 13%, reduces cache hierarchy energy consumption by up to 30% and delivers a more efficient solution to the challenge of coherence in chip multiprocessors.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
芯片多处理器的邻近相干性
多核架构为利用现代制造工艺中可用的越来越多的晶体管提供了一种有效的方法。虽然它们与多节点系统相似,但它们表现出不同的通信延迟和存储特性,从而提供了以前不可行的新设计机会。传统的缓存一致性协议虽然经常用于多核设计,但已经在多节点系统的背景下发展起来。因此,它们很少利用多核体系结构提供的新可能性。我们提出了邻近相干(Proximity Coherence)方案,在该方案中,L1负载丢失通过新的专用链接乐观地转发到附近的缓存,而不是总是通过目录结构间接转发。这种优化是通过使用片上网络资源访问本地缓存的可比较成本实现的。一致性是通过嵌入在L1缓存中的轻量级图结构来维持的。我们将我们的邻近相干协议与现有的基于目录的MESI协议进行比较,使用32核系统的全系统模拟。我们的扩展将L1缓存负载丢失的延迟降低了32%,同时在一系列并行基准测试中,将全局片上互连上传输的字节减少了19%。采用邻近一致性可将执行时间提高13%,将缓存层次能耗降低30%,并为芯片多处理器中的一致性挑战提供更有效的解决方案。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Reducing task creation and termination overhead in explicitly parallel programs An intra-tile cache set balancing scheme NUcache: A multicore cache organization based on Next-Use distance Towards a science of parallel programming Discovering and understanding performance bottlenecks in transactional applications
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1