DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism

2011 International Conference on Parallel Architectures and Compilation Techniques. Pub Date: 2011-10-10. DOI: 10.1109/PACT.2011.21

Byn Choi, Rakesh Komuravelli, Hyojin Sung, Robert Smolinski, N. Honarmand, S. Adve, Vikram S. Adve, N. Carter, Ching-Tsun Chou


Citations: 173

Abstract

For parallelism to become tractable for mass programmers, shared-memory languages and environments must evolve to enforce disciplined practices that ban "wild shared-memory behaviors," e.g., unstructured parallelism, arbitrary data races, and ubiquitous non-determinism. This software evolution is a rare opportunity for hardware designers to rethink hardware from the ground up to exploit opportunities exposed by such disciplined software models. Such a co-designed effort is more likely to achieve many-core scalability than a software-oblivious hardware evolution. This paper presents DeNovo, a hardware architecture motivated by these observations. We show how a disciplined parallel programming model greatly simplifies cache coherence and consistency, while enabling a more efficient communication and cache architecture. The DeNovo coherence protocol is simple because it eliminates transient states: verification using model checking shows 15X fewer reachable states than a state-of-the-art implementation of the conventional MESI protocol. The DeNovo protocol is also more extensible. Adding two sophisticated optimizations, flexible communication granularity and direct cache-to-cache transfers, did not introduce additional protocol states (unlike MESI). Finally, DeNovo shows better cache hit rates and lower network traffic, translating to better performance and energy efficiency. Overall, a disciplined shared-memory programming model allows DeNovo to seamlessly integrate message-passing-like interactions within a global address space, reducing design complexity while improving performance and efficiency.
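
To make the "reachable states" claim concrete, here is a minimal sketch of explicit-state reachability counting, the core step of model checking. Everything in it is an assumption for illustration: the two toy protocols (`atomic_successors`, `transient_successors`), the state names, and the two-cache setup are hypothetical and are not the DeNovo protocol, the MESI implementation, or the verification setup used in the paper. It only shows why splitting a request into "issued" and "completed" steps (transient states) enlarges the state space a verifier has to explore.

```python
from collections import deque


def reachable(initial, successors):
    """Breadth-first enumeration of every global state reachable from `initial`."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def atomic_successors(state):
    """Toy protocol with stable states only (I, S, M); each request is one atomic step."""
    for i, s in enumerate(state):
        if s == "I":
            # Read: requester becomes S, a remote M copy is downgraded to S.
            yield tuple("S" if j == i or t == "M" else t for j, t in enumerate(state))
        if s != "M":
            # Write: requester becomes M, every other copy is invalidated.
            yield tuple("M" if j == i else "I" for j in range(len(state)))
        if s != "I":
            # Evict: requester drops its copy.
            yield tuple("I" if j == i else t for j, t in enumerate(state))


def transient_successors(state):
    """Same toy protocol, but requests are issued and completed in separate steps,
    which adds the hypothetical transient states IS, IM and SM."""
    for i, s in enumerate(state):
        def step(new_local, remote=lambda t: t):
            return tuple(new_local if j == i else remote(t) for j, t in enumerate(state))

        if s == "I":
            yield step("IS")  # issue a read
            yield step("IM")  # issue a write
        elif s == "S":
            yield step("SM")  # issue an upgrade
            yield step("I")   # evict
        elif s == "M":
            yield step("I")   # evict
        elif s == "IS":
            # Read completes: a remote M copy is downgraded to S.
            yield step("S", lambda t: "S" if t == "M" else t)
        else:
            # IM or SM completes: remote copies are invalidated, and a remote
            # pending upgrade loses its shared copy (SM becomes IM).
            yield step("M", lambda t: {"S": "I", "M": "I", "SM": "IM"}.get(t, t))


start = ("I", "I")  # two caches, one cache line, both initially invalid
atomic_count = len(reachable(start, atomic_successors))
transient_count = len(reachable(start, transient_successors))
print(atomic_count, transient_count)
```

In this toy model the transient variant reaches several times as many global states as the atomic one even with only two caches. The numbers say nothing about the 15X figure reported in the abstract; they only illustrate the direction of the effect.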




Source journal

2011 International Conference on Parallel Architectures and Compilation Techniques

Self-citation rate

0.00%

Publication count