Leveraging on-chip networks for data cache migration in chip multiprocessors

2008 International Conference on Parallel Architectures and Compilation Techniques (PACT) Pub Date : 2008-10-25 DOI:10.1145/1454115.1454144

Noel Eisley, L. Peh, L. Shang


Citations: 37

Abstract

Recently, chip multiprocessors (CMPs) have arisen as the de facto design for modern high-performance processors, with increasing core counts. An important property of CMPs is that remote, but on-chip, L2 cache accesses are less costly than off-chip accesses; this is in contrast to earlier chip-to-chip or board-to-board multiprocessors, where an access to a remote node is just as costly as, if not more costly than, a main memory access. This motivates on-chip cache migration as a means to retain more data on-chip. However, previously proposed techniques do not scale to high core counts: they neither leverage the on-chip caches of all cores nor provide a scalable migration mechanism. In this paper we propose a scalable in-network migration technique which uses hints embedded within the router microarchitecture to steer L2 cache evictions towards free/invalid cache slots in any on-chip core cache, rather than evicting them off-chip. We show that our technique can provide an average 19% reduction in the number of off-chip memory accesses over the state-of-the-art, beating the performance of a pseudo-optimal migration technique. This can be done with negligible area overhead and a manageable traffic overhead of 13.4%.
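The abstract does not say how the router hints are encoded or kept up to date, so the following is only an illustrative sketch of the general idea, not the authors' mechanism. It assumes a 2D mesh, models each router's hint as the hop count to the nearest free L2 slot through each output port, and recomputes all hints globally as a stand-in for whatever incremental update the hardware would use. The names `compute_hints`, `migrate_eviction`, and `DIRS` are hypothetical.

```python
from collections import deque

# Output ports of a mesh router and their (dx, dy) offsets (assumed topology).
DIRS = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}


def compute_hints(free, width, height):
    """Return hints[(x, y)][port] = hops to the nearest free slot via that port.

    `free` maps a core's (x, y) position to its number of free/invalid slots.
    A multi-source BFS from every core with a free slot gives each core its
    distance to the nearest one; a port's hint is one hop plus the distance
    of the neighbour behind that port.
    """
    dist = {node: 0 for node, count in free.items() if count > 0}
    queue = deque(dist)
    while queue:
        x, y = queue.popleft()
        for dx, dy in DIRS.values():
            nb = (x + dx, y + dy)
            if 0 <= nb[0] < width and 0 <= nb[1] < height and nb not in dist:
                dist[nb] = dist[(x, y)] + 1
                queue.append(nb)
    hints = {}
    for x in range(width):
        for y in range(height):
            hints[(x, y)] = {
                port: 1 + dist[(x + dx, y + dy)]
                for port, (dx, dy) in DIRS.items()
                if (x + dx, y + dy) in dist
            }
    return hints


def migrate_eviction(src, free, width, height):
    """Steer one evicted line from core `src` towards a free slot.

    Returns the core that absorbs the line, or None when no on-chip slot is
    free and the line would have to be evicted off-chip.
    """
    hints = compute_hints(free, width, height)
    node = src
    while free.get(node, 0) == 0:
        if not hints[node]:
            return None  # no hint on any port: fall back to off-chip eviction
        port = min(hints[node], key=hints[node].get)  # follow the closest hint
        dx, dy = DIRS[port]
        node = (node[0] + dx, node[1] + dy)
    free[node] -= 1  # the slot is now occupied by the migrated line
    return node
```

In this model, each hop follows the port with the smallest hint, so the line gets strictly closer to a free slot and the walk always terminates. With `free = {(1, 0): 1, (3, 3): 2}` on a 4x4 mesh, successive evictions from core `(0, 0)` fill the nearby slot first, then the two distant ones, and only then go off-chip.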


