基于片上网络虚拟输出排队的低成本单周期路由器

S. Nguyen, S. Oyanagi
{"title":"基于片上网络虚拟输出排队的低成本单周期路由器","authors":"S. Nguyen, S. Oyanagi","doi":"10.1109/DSD.2010.15","DOIUrl":null,"url":null,"abstract":"The communication latency of Network-on-Chip (NoC) is one of the factors that significantly impacts on the application performance on System-on-Chips. To reduce the NoC latency, we propose a low latency architecture of router, which utilizes virtual output queuing (VOQ) to shorten the processing time of a packet transfer. Based on taking advantage of VOQ in buffering, the number of pipeline stages of a packet transfer can be reduced to two stages of switch allocation and switch traversal. By speculatively implementing these stages in a parallel fashion, the router can perform a packet transfer in only one clock cycle. In addition, a multiple VOQ architecture that each input port maintains more than one queue for each output channel is also proposed for improving the throughput of router. We have implemented the proposed router on FPGA and evaluated in terms of communication latency, throughput and hardware amount. The experimental results show that in a 4x4 two-dimensional mesh network, the proposed router reduces the communication latency by 25% and cost of area by 67.3% as compared to the look-ahead speculative virtual channel router.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":"{\"title\":\"A Low Cost Single-Cycle Router Based on Virtual Output Queuing for On-chip Networks\",\"authors\":\"S. Nguyen, S. Oyanagi\",\"doi\":\"10.1109/DSD.2010.15\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The communication latency of Network-on-Chip (NoC) is one of the factors that significantly impacts on the application performance on System-on-Chips. To reduce the NoC latency, we propose a low latency architecture of router, which utilizes virtual output queuing (VOQ) to shorten the processing time of a packet transfer. Based on taking advantage of VOQ in buffering, the number of pipeline stages of a packet transfer can be reduced to two stages of switch allocation and switch traversal. By speculatively implementing these stages in a parallel fashion, the router can perform a packet transfer in only one clock cycle. In addition, a multiple VOQ architecture that each input port maintains more than one queue for each output channel is also proposed for improving the throughput of router. We have implemented the proposed router on FPGA and evaluated in terms of communication latency, throughput and hardware amount. The experimental results show that in a 4x4 two-dimensional mesh network, the proposed router reduces the communication latency by 25% and cost of area by 67.3% as compared to the look-ahead speculative virtual channel router.\",\"PeriodicalId\":356885,\"journal\":{\"name\":\"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"16\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DSD.2010.15\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSD.2010.15","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 16

摘要

片上网络(NoC)的通信延迟是影响片上系统应用性能的重要因素之一。为了减少NoC延迟,我们提出了一种低延迟路由器架构,该架构利用虚拟输出排队(VOQ)来缩短数据包传输的处理时间。在利用VOQ缓冲的基础上,可以将数据包传输的管道阶段数减少到交换机分配和交换机遍历两个阶段。通过推测地以并行方式实现这些阶段,路由器可以仅在一个时钟周期内执行数据包传输。此外,为了提高路由器的吞吐量,还提出了一种每个输入端口为每个输出通道维护多个队列的多VOQ架构。我们已经在FPGA上实现了所提出的路由器,并在通信延迟、吞吐量和硬件数量方面进行了评估。实验结果表明,在4x4二维网状网络中,与前瞻性推测型虚拟通道路由器相比,该路由器的通信延迟降低了25%,面积成本降低了67.3%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
A Low Cost Single-Cycle Router Based on Virtual Output Queuing for On-chip Networks
The communication latency of Network-on-Chip (NoC) is one of the factors that significantly impacts on the application performance on System-on-Chips. To reduce the NoC latency, we propose a low latency architecture of router, which utilizes virtual output queuing (VOQ) to shorten the processing time of a packet transfer. Based on taking advantage of VOQ in buffering, the number of pipeline stages of a packet transfer can be reduced to two stages of switch allocation and switch traversal. By speculatively implementing these stages in a parallel fashion, the router can perform a packet transfer in only one clock cycle. In addition, a multiple VOQ architecture that each input port maintains more than one queue for each output channel is also proposed for improving the throughput of router. We have implemented the proposed router on FPGA and evaluated in terms of communication latency, throughput and hardware amount. The experimental results show that in a 4x4 two-dimensional mesh network, the proposed router reduces the communication latency by 25% and cost of area by 67.3% as compared to the look-ahead speculative virtual channel router.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A Multicore SDR Architecture for Reconfigurable WiMAX Downlink Design of Testable Universal Logic Gate Targeting Minimum Wire-Crossings in QCA Logic Circuit Low Latency Recovery from Transient Faults for Pipelined Processor Architectures System Level Hardening by Computing with Matrices Reconfigurable Grid Alu Processor: Optimization and Design Space Exploration
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1