A low-power network-on-chip architecture for tile-based chip multi-processors

2016 International Great Lakes Symposium on VLSI (GLSVLSI). Pub Date: 2016-05-18. DOI: 10.1145/2902961.2903010

Anastasios Psarras, Junghee Lee, Pavlos M. Mattheakis, C. Nicopoulos, G. Dimitrakopoulos


Citations: 18

Abstract

Technology scaling of tile-based CMPs reduces the physical size of each tile and increases the number of tiles per die. This trend directly impacts the on-chip interconnect: even though the tile population increases, the inter-tile link distances scale down proportionally to the tile dimensions. The decreasing inter-tile wire lengths can be exploited to enable swift link traversal between neighboring tiles, after appropriate wire engineering. Building on this premise, we propose a technique to rapidly transfer flits between adjacent routers in half a clock cycle, by utilizing both edges of the clock during the sending and receiving operations. Half-cycle link traversal enables, for the first time, substantial reductions in (a) link power, irrespective of the data switching profile, and (b) buffer power (through buffer-size reduction), without incurring any latency/throughput loss. In fact, the proposed architecture also yields some latency improvements over a baseline NoC. Detailed hardware analysis using placed-and-routed designs and cycle-accurate full-system simulations corroborate the significant power and latency improvements.
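The buffer-size reduction claimed in (b) can be illustrated with the standard credit-based flow-control argument. To sustain one flit per cycle without stalling, a buffer needs at least as many slots as the credit round-trip delay in cycles. Shortening both the flit and the credit link traversal from one cycle to half a cycle therefore shortens that round trip. The sketch below is a minimal illustration under that assumption only. The function names and all latency values are hypothetical and are not taken from the paper's router design.

```python
import math
from fractions import Fraction


def credit_round_trip(flit_link, credit_link, downstream, upstream):
    """Credit round-trip delay, in clock cycles, for one buffer slot.

    flit_link:   cycles for a flit to cross the link (upstream -> downstream)
    downstream:  cycles from flit arrival until its slot frees and a credit is sent
    credit_link: cycles for the credit to cross the link back upstream
    upstream:    cycles for the upstream router to consume the credit
    """
    # Fraction keeps half-cycle values exact (no floating-point rounding).
    return Fraction(flit_link) + downstream + credit_link + upstream


def min_buffer_depth(flit_link, credit_link, downstream, upstream):
    """Slots needed so the sender never stalls at one flit per cycle:
    each slot stays occupied for one full credit round trip."""
    return math.ceil(credit_round_trip(flit_link, credit_link, downstream, upstream))


# Hypothetical latencies (assumed, not from the paper): 1-cycle router stages.
full_cycle = min_buffer_depth(1, 1, 1, 1)
half_cycle = min_buffer_depth(Fraction(1, 2), Fraction(1, 2), 1, 1)
print(full_cycle, half_cycle)  # 4 3
```

With these assumed numbers, halving both link traversals removes one cycle from the round trip, so one fewer buffer slot is needed per input buffer. A real router's savings depend on its actual pipeline and on how signals arriving on the falling edge are consumed.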


