A heterogeneous low-cost and low-latency Ring-Chain network for GPGPUs

Xia Zhao, Sheng Ma, Chen Li, L. Eeckhout, Zhiying Wang
2016 IEEE 34th International Conference on Computer Design (ICCD), October 2016
DOI: 10.1109/ICCD.2016.7753329
Citations: 15

Abstract

To achieve high throughput, compute accelerators such as General-Purpose Graphics Processing Units (GPGPUs) integrate an ever-increasing number of cores. The communication demand of these cores in turn calls for a low-latency packet-switched network. Because packet latency consists mainly of three components (per-hop latency, contention latency, and serialization latency), a good Network-on-Chip (NoC) design should reduce all three to meet the communication demand while keeping hardware cost low. In this paper, we first make two observations about how NoC requirements differ between chip multiprocessors (CMPs) and GPGPUs, and then design a Heterogeneous Ring-Chain network (HRCnet) for the GPGPU reply network. HRCnet eliminates conflicts in the network through a ring-like topology, a novel node placement, and unidirectional channels. Eliminating conflicts reduces the per-hop latency and removes the contention latency, while the ring-like topology reduces the serialization latency. Experimental results confirm the benefits of this low-cost, low-latency design. Compared with a baseline mesh of the same bisection bandwidth, HRCnet yields a 45% performance improvement while reducing area by 42% and energy consumption by 60%. Compared with two state-of-the-art GPGPU NoCs, BENoC and DA2mesh, HRCnet achieves more than 42% performance gain at reduced hardware cost. HRCnet also achieves the highest power and area efficiency among the evaluated designs.
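The abstract's argument rests on splitting packet latency into three additive terms. The Python sketch below illustrates that decomposition using the common textbook form (hops × per-hop delay + contention delay + serialization delay, with serialization taken as the number of flits, ceil(packet bits / channel width)). It is not HRCnet's own latency model. The class name `PacketLatency` and all parameter values are hypothetical and serve only to show how each term moves. The second configuration assumes that a conflict-free design shortens the per-hop pipeline, removes contention, and, for the same bisection bandwidth, can afford wider channels. The abstract attributes the serialization gain to the ring-like topology but does not give this exact mechanism, so treat it as an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class PacketLatency:
    """Textbook packet-latency decomposition (illustrative sketch only).

    latency = hops * per_hop_cycles + contention_cycles + serialization_cycles
    """
    hops: int                # routers traversed from source to destination
    per_hop_cycles: int      # router pipeline + link traversal per hop
    contention_cycles: int   # cycles lost waiting for contended resources
    packet_bits: int         # header + payload size in bits
    channel_width_bits: int  # channel width; one flit crosses per cycle

    def serialization_cycles(self) -> int:
        # Cycles needed to stream every flit of the packet through one channel.
        return math.ceil(self.packet_bits / self.channel_width_bits)

    def total_cycles(self) -> int:
        return (self.hops * self.per_hop_cycles
                + self.contention_cycles
                + self.serialization_cycles())


# Hypothetical parameters, chosen only to show how each term changes.
mesh_like = PacketLatency(hops=6, per_hop_cycles=3, contention_cycles=4,
                          packet_bits=1024, channel_width_bits=128)
conflict_free = PacketLatency(hops=6, per_hop_cycles=2, contention_cycles=0,
                              packet_bits=1024, channel_width_bits=256)

for name, p in [("mesh-like", mesh_like), ("conflict-free", conflict_free)]:
    print(f"{name}: {p.total_cycles()} cycles "
          f"(serialization {p.serialization_cycles()})")
```

With these made-up numbers the script reports 30 cycles for the mesh-like case (18 + 4 + 8) and 16 cycles for the conflict-free case (12 + 0 + 4). All three terms shrink, which mirrors the qualitative claim in the abstract, though the real magnitudes depend on the paper's configuration.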