Organizing the last line of defense before hitting the memory wall for CMPs

Chun Liu, A. Sivasubramaniam, M. Kandemir
{"title":"在击中cmp的记忆墙之前组织最后一道防线","authors":"Chun Liu, A. Sivasubramaniam, M. Kandemir","doi":"10.1109/HPCA.2004.10017","DOIUrl":null,"url":null,"abstract":"The last line of defense in the cache hierarchy before going to off-chip memory is very critical in chip multiprocessors (CMPs) from both the performance and power perspectives. We investigate different organizations for this last line of defense (assumed to be L2 in this article) towards reducing off-chip memory accesses. We evaluate the trade-offs between private L2 and address-interleaved shared L2 designs, noting their individual benefits and drawbacks. The possible imbalance between the L2 demands across the CPUs favors a shared L2 organization, while the interference between these demands can favor a private L2 organization. We propose a new architecture, called Shared Processor-Based Split L2, that captures the benefits of these two organizations, while avoiding many of their drawbacks. Using several applications from the SPEC OMP suite and a commercial benchmark, Specjbb, on a complete system simulator, we demonstrate the benefits of this shared processor-based L2 organization. Our results show as much as 42.50% improvement in IPC over the private organization (with 11.52% on the average), and as much as 42.22% improvement over the shared interleaved organization (with 9.76% on the average).","PeriodicalId":145009,"journal":{"name":"10th International Symposium on High Performance Computer Architecture (HPCA'04)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"183","resultStr":"{\"title\":\"Organizing the last line of defense before hitting the memory wall for CMPs\",\"authors\":\"Chun Liu, A. Sivasubramaniam, M. Kandemir\",\"doi\":\"10.1109/HPCA.2004.10017\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The last line of defense in the cache hierarchy before going to off-chip memory is very critical in chip multiprocessors (CMPs) from both the performance and power perspectives. We investigate different organizations for this last line of defense (assumed to be L2 in this article) towards reducing off-chip memory accesses. We evaluate the trade-offs between private L2 and address-interleaved shared L2 designs, noting their individual benefits and drawbacks. The possible imbalance between the L2 demands across the CPUs favors a shared L2 organization, while the interference between these demands can favor a private L2 organization. We propose a new architecture, called Shared Processor-Based Split L2, that captures the benefits of these two organizations, while avoiding many of their drawbacks. Using several applications from the SPEC OMP suite and a commercial benchmark, Specjbb, on a complete system simulator, we demonstrate the benefits of this shared processor-based L2 organization. 
Our results show as much as 42.50% improvement in IPC over the private organization (with 11.52% on the average), and as much as 42.22% improvement over the shared interleaved organization (with 9.76% on the average).\",\"PeriodicalId\":145009,\"journal\":{\"name\":\"10th International Symposium on High Performance Computer Architecture (HPCA'04)\",\"volume\":\"12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-02-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"183\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"10th International Symposium on High Performance Computer Architecture (HPCA'04)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.2004.10017\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"10th International Symposium on High Performance Computer Architecture (HPCA'04)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2004.10017","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 183

Abstract

The last line of defense in the cache hierarchy before going to off-chip memory is very critical in chip multiprocessors (CMPs) from both the performance and power perspectives. We investigate different organizations for this last line of defense (assumed to be L2 in this article) towards reducing off-chip memory accesses. We evaluate the trade-offs between private L2 and address-interleaved shared L2 designs, noting their individual benefits and drawbacks. The possible imbalance between the L2 demands across the CPUs favors a shared L2 organization, while the interference between these demands can favor a private L2 organization. We propose a new architecture, called Shared Processor-Based Split L2, that captures the benefits of these two organizations, while avoiding many of their drawbacks. Using several applications from the SPEC OMP suite and a commercial benchmark, Specjbb, on a complete system simulator, we demonstrate the benefits of this shared processor-based L2 organization. Our results show as much as 42.50% improvement in IPC over the private organization (with 11.52% on the average), and as much as 42.22% improvement over the shared interleaved organization (with 9.76% on the average).
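To make the trade-off concrete, the sketch below illustrates how a lookup might select an L2 slice under each of the three organizations the abstract compares. This is not code from the paper: the core count, block size, and the table-based slice assignment used for the processor-based split design are illustrative assumptions.

```c
/*
 * Minimal sketch (not code from the paper): how a lookup might pick an L2
 * slice under the three organizations compared in the abstract. All names,
 * sizes, and the table-based allocation below are illustrative assumptions.
 */
#include <stdint.h>
#include <stdio.h>

#define NUM_CPUS   8   /* assumed number of cores / L2 slices */
#define BLOCK_BITS 6   /* assumed 64-byte cache blocks */

/* Private L2: CPU i may only use slice i, so there is no interference,
 * but a core with a large working set cannot borrow idle capacity. */
static int private_slice(int cpu) {
    return cpu;
}

/* Address-interleaved shared L2: block-address bits pick the slice, so all
 * capacity is pooled, but a demanding core can evict everyone else's data. */
static int interleaved_slice(uint64_t paddr) {
    return (int)((paddr >> BLOCK_BITS) % NUM_CPUS);
}

/* Processor-based split L2 (sketch): each CPU is assigned a set of slices
 * sized to its demand; its lookups interleave only over that set, bounding
 * interference while still letting a heavy core use extra capacity. */
static int owned[NUM_CPUS][NUM_CPUS]; /* slice ids assigned to each CPU */
static int nowned[NUM_CPUS];          /* how many slices each CPU holds  */

static int split_slice(int cpu, uint64_t paddr) {
    return owned[cpu][(paddr >> BLOCK_BITS) % nowned[cpu]];
}

int main(void) {
    /* Example (assumed) allocation: CPU 0 gets slices 0-2, others get one. */
    nowned[0] = 3; owned[0][0] = 0; owned[0][1] = 1; owned[0][2] = 2;
    for (int c = 1; c < NUM_CPUS; c++) { nowned[c] = 1; owned[c][0] = c; }

    uint64_t addr = 0x12345680ULL;
    printf("private:     CPU 0 -> slice %d\n", private_slice(0));
    printf("interleaved: addr  -> slice %d\n", interleaved_slice(addr));
    printf("split:       CPU 0 -> slice %d\n", split_slice(0, addr));
    return 0;
}
```

Read this way, the private mapping eliminates inter-core interference but strands capacity, the interleaved mapping pools capacity but lets one demanding core displace others' data, and the split mapping limits each core's reach to a tunable subset of slices that can be sized to its demand.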