A memory system design framework: creating smart memories

Proceedings. International Symposium on Computer Architecture. Pub Date: 2009-06-15. DOI: 10.1145/1555754.1555805

A. Firoozshahian, A. Solomatnikov, Ofer Shacham, Zain Asgar, S. Richardson, C. Kozyrakis, M. Horowitz

Pages: 406-417

Citations: 23

Abstract

As CPU cores become building blocks, we see a great expansion in the types of on-chip memory systems proposed for CMPs. Unfortunately, designing the cache and protocol controllers to support these memory systems is complex, and their concurrency and latency characteristics significantly affect the performance of any CMP. To address this problem, this paper presents a microarchitecture framework for cache and protocol controllers, which can aid in generating the RTL for new memory systems. The framework consists of three pipelined engines (request-tracking, state-manipulation, and data-movement), which are programmed to implement a higher-level memory model. This approach simplifies the design and verification of CMP systems by decomposing the memory model into sequences of state and data manipulations. Moreover, implementing the framework itself produces a polymorphic memory system.

To validate the approach, we implemented a scalable, flexible CMP in silicon. The memory system was then programmed to support three disparate memory models: cache-coherent shared memory, streams, and transactional memory. Measured overheads of this approach seem promising. Our system generates controllers with performance overheads of less than 20% compared to an ideal controller with zero internal latency. Even the overhead of directly implementing a fully programmable controller was modest. While it did double the controller's area, the amortized effective area in the system grew by roughly 7%.
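The two area figures in the abstract can be related by a short back-of-the-envelope calculation. This is only a sketch: the symbols $A_{\text{sys}}$ (effective system area before the change) and $A_{\text{ctrl}}$ (the controller's amortized share of it) are introduced here for illustration and do not come from the paper, and the calculation assumes that nothing other than the controller changes size.

```latex
% Assumption: doubling the controller adds exactly one extra A_ctrl to the system.
\begin{align*}
  A_{\text{sys}} + A_{\text{ctrl}} &\approx 1.07\, A_{\text{sys}} \\
  \frac{A_{\text{ctrl}}}{A_{\text{sys}}} &\approx 0.07
\end{align*}
```

Under these assumptions, the non-programmable controller would account for roughly 7% of the amortized effective area, which is why doubling it has only a modest effect at the system level.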
