Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures

2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA) Pub Date : 2015-06-13 DOI:10.1145/2872887.2750411

Lluc Alvarez, L. Vilanova, Miquel Moretó, Marc Casas, Marc González, X. Martorell, N. Navarro, E. Ayguadé, M. Valero

{"title":"Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures","authors":"Lluc Alvarez, L. Vilanova, Miquel Moretó, Marc Casas, Marc González, X. Martorell, N. Navarro, E. Ayguadé, M. Valero","doi":"10.1145/2872887.2750411","DOIUrl":null,"url":null,"abstract":"The increasing number of cores in manycore architectures causes important power and scalability problems in the memory subsystem. One solution is to introduce scratchpad memories alongside the cache hierarchy, forming a hybrid memory system. Scratchpad memories are more power-efficient than caches and they do not generate coherence traffic, but they suffer from poor programmability. A good way to hide the programmability difficulties to the programmer is to give the compiler the responsibility of generating code to manage the scratchpad memories. Unfortunately, compilers do not succeed in generating this code in the presence of random memory accesses with unknown aliasing hazards. This paper proposes a coherence protocol for the hybrid memory system that allows the compiler to always generate code to manage the scratchpad memories. In coordination with the compiler, memory accesses that may access stale copies of data are identified and diverted to the valid copy of the data. The proposal allows the architecture to be exposed to the programmer as a shared memory manycore, maintaining the programming simplicity of shared memory models and preserving backwards compatibility. In a 64-core manycore, the coherence protocol adds overheads of 4% in performance, 8% in network traffic and 9% in energy consumption to enable the usage of the hybrid memory system that, compared to a cache-based system, achieves a speedup of 1.14x and reduces on-chip network traffic and energy consumption by 29% and 17%, respectively.","PeriodicalId":6878,"journal":{"name":"2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)","volume":"125 1","pages":"720-732"},"PeriodicalIF":0.0000,"publicationDate":"2015-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"32","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2872887.2750411","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 32

Abstract

The increasing number of cores in manycore architectures causes important power and scalability problems in the memory subsystem. One solution is to introduce scratchpad memories alongside the cache hierarchy, forming a hybrid memory system. Scratchpad memories are more power-efficient than caches and they do not generate coherence traffic, but they suffer from poor programmability. A good way to hide the programmability difficulties to the programmer is to give the compiler the responsibility of generating code to manage the scratchpad memories. Unfortunately, compilers do not succeed in generating this code in the presence of random memory accesses with unknown aliasing hazards. This paper proposes a coherence protocol for the hybrid memory system that allows the compiler to always generate code to manage the scratchpad memories. In coordination with the compiler, memory accesses that may access stale copies of data are identified and diverted to the valid copy of the data. The proposal allows the architecture to be exposed to the programmer as a shared memory manycore, maintaining the programming simplicity of shared memory models and preserving backwards compatibility. In a 64-core manycore, the coherence protocol adds overheads of 4% in performance, 8% in network traffic and 9% in energy consumption to enable the usage of the hybrid memory system that, compared to a cache-based system, achieves a speedup of 1.14x and reduces on-chip network traffic and energy consumption by 29% and 17%, respectively.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

共享内存多核架构中刮本存储器透明管理的一致性协议

多核体系结构中内核数量的增加会导致内存子系统出现严重的功耗和可伸缩性问题。一种解决方案是在高速缓存层次结构旁边引入刮板存储器，形成混合存储器系统。Scratchpad存储器比缓存更节能，它们不会产生相干流量，但它们的可编程性很差。向程序员隐藏可编程性困难的一个好方法是让编译器负责生成代码来管理临时存储器。不幸的是，在存在未知混叠风险的随机内存访问时，编译器无法成功生成此代码。本文提出了一种用于混合存储系统的一致性协议，该协议允许编译器始终生成代码来管理临时存储器。在与编译器的协调下，识别可能访问过期数据副本的内存访问，并将其转移到数据的有效副本。该建议允许将体系结构作为共享内存多核公开给程序员，维护共享内存模型的编程简单性并保持向后兼容性。在64核多核中，相干协议增加了4%的性能开销、8%的网络流量和9%的能耗，从而使混合内存系统的使用与基于缓存的系统相比，实现了1.14倍的加速，并将片上网络流量和能耗分别降低了29%和17%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)

自引率

0.00%

发文量

期刊最新文献

Redundant Memory Mappings for fast access to large memories Multiple Clone Row DRAM: A low latency and area optimized DRAM Manycore Network Interfaces for in-memory rack-scale computing Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures ShiDianNao: Shifting vision processing closer to the sensor