Sudhir K. Satpathy, Korey Sewell, Thomas Manville, Yen-Po Chen, R. Dreslinski, D. Sylvester, T. Mudge, D. Blaauw
{"title":"A 4.5Tb/s 3.4Tb/s/W 64×64 switch fabric with self-updating least-recently-granted priority and quality-of-service arbitration in 45nm CMOS","authors":"Sudhir K. Satpathy, Korey Sewell, Thomas Manville, Yen-Po Chen, R. Dreslinski, D. Sylvester, T. Mudge, D. Blaauw","doi":"10.1109/ISSCC.2012.6177098","DOIUrl":null,"url":null,"abstract":"High-speed and low-power routers form the basic building blocks of on-die interconnect fabrics that are critical to overall throughput and energy efficiency of high performance systems. Conventional routers use distinct logic blocks for routing data and handling arbitration. At higher radices, connections between these blocks become a bottleneck, limiting router scalability and degrading performance. Recently, two switch topologies merged the data routing fabric with arbitration control, avoiding this bottleneck. However, relies on centralized control for channel allocation, limiting performance, while restricted to a small set of fixed priorities, rendering input ports prone to starvation. In addition, ever larger CMPs will require continued increases in bandwidth over previous designs. To address these issues, we present a 64x64 single-stage swizzle-switch network (SSN) with 128b data buses (8192 total input/output wires). The SSN can connect any input to any output, including multicast. It has a peak measured throughput of 4.5Tb/s at 1.1V in 45nm SOI CMOS at 25°C. The SSN's key features are: 1) a single-cycle least-recently granted (LRG) priority arbitration technique that reuses the already present input and output data buses and their drivers and sense amps; 2) an additional 4-level message-based priority arbitration for quality of service (QoS) with 2% logic and 3% wiring overhead; 3) a bidirectional bitline repeater that allows the router to scale to >;8000 wires. These features result in a compact fabric (4.06mm2) with throughput gain of 2.1 x over at 3.4Tb/s/W efficiency, which improves to 7.4Tb/s/W at 600mV.","PeriodicalId":255282,"journal":{"name":"2012 IEEE International Solid-State Circuits Conference","volume":"398 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE International Solid-State Circuits Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSCC.2012.6177098","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 26
Abstract
High-speed and low-power routers form the basic building blocks of on-die interconnect fabrics that are critical to overall throughput and energy efficiency of high performance systems. Conventional routers use distinct logic blocks for routing data and handling arbitration. At higher radices, connections between these blocks become a bottleneck, limiting router scalability and degrading performance. Recently, two switch topologies merged the data routing fabric with arbitration control, avoiding this bottleneck. However, relies on centralized control for channel allocation, limiting performance, while restricted to a small set of fixed priorities, rendering input ports prone to starvation. In addition, ever larger CMPs will require continued increases in bandwidth over previous designs. To address these issues, we present a 64x64 single-stage swizzle-switch network (SSN) with 128b data buses (8192 total input/output wires). The SSN can connect any input to any output, including multicast. It has a peak measured throughput of 4.5Tb/s at 1.1V in 45nm SOI CMOS at 25°C. The SSN's key features are: 1) a single-cycle least-recently granted (LRG) priority arbitration technique that reuses the already present input and output data buses and their drivers and sense amps; 2) an additional 4-level message-based priority arbitration for quality of service (QoS) with 2% logic and 3% wiring overhead; 3) a bidirectional bitline repeater that allows the router to scale to >;8000 wires. These features result in a compact fabric (4.06mm2) with throughput gain of 2.1 x over at 3.4Tb/s/W efficiency, which improves to 7.4Tb/s/W at 600mV.
高速和低功耗路由器构成片上互连结构的基本组成部分,对高性能系统的整体吞吐量和能源效率至关重要。传统路由器使用不同的逻辑块来路由数据和处理仲裁。在更高的基数下,这些块之间的连接成为瓶颈,限制了路由器的可扩展性并降低了性能。最近,有两种交换机拓扑将数据路由结构与仲裁控制合并,从而避免了这一瓶颈。然而,它依赖于通道分配的集中控制,限制了性能,同时限制了一小部分固定优先级,使得输入端口容易出现饥饿。此外,更大的cmp将需要比以前的设计持续增加带宽。为了解决这些问题,我们提出了一个带有128b数据总线(总共8192根输入/输出线)的64x64单级搅拌开关网络(SSN)。SSN可以将任何输入连接到任何输出,包括组播。它在25°C下,在45nm SOI CMOS中,在1.1V下的峰值测量吞吐量为4.5Tb/s。SSN的主要特点是:1)单周期最近最少授予(LRG)优先仲裁技术,重用已经存在的输入和输出数据总线及其驱动器和感测放大器;2)额外的4级基于消息的服务质量(QoS)优先级仲裁,具有2%的逻辑和3%的布线开销;3)一个双向位线中继器,允许路由器扩展到8000根线。这些特性导致结构紧凑(4.06mm2),在3.4Tb/s/W效率下吞吐量增益为2.1倍,在600mV效率下提高到7.4Tb/s/W。