采用新颖的65nm CMOS开关分配器的4.6Tbits/s 3.6GHz单周期NoC路由器

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI:10.1109/ICCD.2007.4601881

A. Kumary, Partha Kunduz, A.P. Singhx, Li-Shiuan Pehy, N. K. Jhay

{"title":"采用新颖的65nm CMOS开关分配器的4.6Tbits/s 3.6GHz单周期NoC路由器","authors":"A. Kumary, Partha Kunduz, A.P. Singhx, Li-Shiuan Pehy, N. K. Jhay","doi":"10.1109/ICCD.2007.4601881","DOIUrl":null,"url":null,"abstract":"As chip multiprocessors (CMPs) become the only viable way to scale up and utilize the abundant transistors made available in current microprocessors, the design of on-chip networks is becoming critically important. These networks face unique design constraints and are required to provide extremely fast and high bandwidth communication, yet meet tight power and area budgets. In this paper, we present a detailed design of our on-chip network router targeted at a 36-core shared-memory CMP system in 65 nm technology. Our design targets an aggressive clock frequency of 3.6 GHz, thus posing tough design challenges that led to several unique circuit and microarchitectural innovations and design choices, including a novel high throughput and low latency switch allocation mechanism, a non-speculative single-cycle router pipeline which uses advanced bundles to remove control setup overhead, a low-complexity virtual channel allocator and a dynamically-managed shared buffer design which uses prefetching to minimize critical path delay. Our router takes up 1.19 mm2 area and expends 551 mW power at 10% activity, delivering a single-cycle no-load latency at 3.6 GHz clock frequency while achieving apeak switching data rate in excess of 4.6 Tbits/sper router node.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"163 1","pages":"63-70"},"PeriodicalIF":0.0000,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"212","resultStr":"{\"title\":\"A 4.6Tbits/s 3.6GHz single-cycle NoC router with a novel switch allocator in 65nm CMOS\",\"authors\":\"A. Kumary, Partha Kunduz, A.P. Singhx, Li-Shiuan Pehy, N. K. Jhay\",\"doi\":\"10.1109/ICCD.2007.4601881\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As chip multiprocessors (CMPs) become the only viable way to scale up and utilize the abundant transistors made available in current microprocessors, the design of on-chip networks is becoming critically important. These networks face unique design constraints and are required to provide extremely fast and high bandwidth communication, yet meet tight power and area budgets. In this paper, we present a detailed design of our on-chip network router targeted at a 36-core shared-memory CMP system in 65 nm technology. Our design targets an aggressive clock frequency of 3.6 GHz, thus posing tough design challenges that led to several unique circuit and microarchitectural innovations and design choices, including a novel high throughput and low latency switch allocation mechanism, a non-speculative single-cycle router pipeline which uses advanced bundles to remove control setup overhead, a low-complexity virtual channel allocator and a dynamically-managed shared buffer design which uses prefetching to minimize critical path delay. Our router takes up 1.19 mm2 area and expends 551 mW power at 10% activity, delivering a single-cycle no-load latency at 3.6 GHz clock frequency while achieving apeak switching data rate in excess of 4.6 Tbits/sper router node.\",\"PeriodicalId\":6306,\"journal\":{\"name\":\"2007 25th International Conference on Computer Design\",\"volume\":\"163 1\",\"pages\":\"63-70\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"212\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2007 25th International Conference on Computer Design\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCD.2007.4601881\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2007 25th International Conference on Computer Design","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD.2007.4601881","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 212

摘要

随着芯片多处理器(cmp)成为扩展和利用当前微处理器中可用的大量晶体管的唯一可行方法，片上网络的设计变得至关重要。这些网络面临着独特的设计限制，需要提供极快和高带宽的通信，同时满足紧张的功率和面积预算。在本文中，我们提出了针对65纳米技术的36核共享内存CMP系统的片上网络路由器的详细设计。我们的设计目标是3.6 GHz的激进时钟频率，因此提出了严峻的设计挑战，导致了一些独特的电路和微架构创新和设计选择，包括新颖的高吞吐量和低延迟交换机分配机制，非投机单周期路由器管道，使用先进的束来消除控制设置开销，一种低复杂度的虚拟通道分配器和一种动态管理的共享缓冲区设计，该设计使用预取来最小化关键路径延迟。我们的路由器占用1.19 mm2的面积，在10%的活动下消耗551 mW的功率，在3.6 GHz时钟频率下提供单周期空载延迟，同时实现超过4.6 Tbits/ per路由器节点的峰值交换数据速率。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A 4.6Tbits/s 3.6GHz single-cycle NoC router with a novel switch allocator in 65nm CMOS

As chip multiprocessors (CMPs) become the only viable way to scale up and utilize the abundant transistors made available in current microprocessors, the design of on-chip networks is becoming critically important. These networks face unique design constraints and are required to provide extremely fast and high bandwidth communication, yet meet tight power and area budgets. In this paper, we present a detailed design of our on-chip network router targeted at a 36-core shared-memory CMP system in 65 nm technology. Our design targets an aggressive clock frequency of 3.6 GHz, thus posing tough design challenges that led to several unique circuit and microarchitectural innovations and design choices, including a novel high throughput and low latency switch allocation mechanism, a non-speculative single-cycle router pipeline which uses advanced bundles to remove control setup overhead, a low-complexity virtual channel allocator and a dynamically-managed shared buffer design which uses prefetching to minimize critical path delay. Our router takes up 1.19 mm2 area and expends 551 mW power at 10% activity, delivering a single-cycle no-load latency at 3.6 GHz clock frequency while achieving apeak switching data rate in excess of 4.6 Tbits/sper router node.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2007 25th International Conference on Computer Design

自引率

0.00%

发文量