FastTrack: Exploiting Fast FPGA Wiring for Implementing NoC Shortcuts (Abstract Only)

Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. Pub Date: 2018-02-15. DOI: 10.1145/3174243.3174962

Nachiket Kapre, T. Krishna


Citations: 0

Abstract




The latency of packet-switched FPGA overlay Networks-on-Chip (NoCs) grows linearly with the NoC dimensions, since packets typically spend a cycle in each dynamic router along the path. High-performance FPGA NoCs have to aggressively pipeline interconnects, thereby adding extra latency overhead to the NoC. The use of FPGA-friendly deflection routing schemes further exacerbates latency. Fortunately, FPGAs provide segmented interconnects with different lengths (speeds). Faster FPGA tracks can be used to reduce the number of switchbox hops along the packet path. We introduce FastTrack, an adaptation of the NoC organization that inserts express bypass links in the NoC to skip multiple router stages in a single clock cycle. Our FastTrack design can be tuned to support different express link lengths for performance, and depopulation strategies for controlling cost. For the Xilinx Virtex-7 485T FPGA, an 8×8 FastTrack NoC is 2× larger than a base Hoplite NoC, but operates at between 1.2× and 0.8× its clock frequency when using express links of length 2 to 4. FastTrack delivers throughput and latency improvements across a range of statistical workloads (2-2.5×) and on traces extracted from FPGA accelerator case studies such as Sparse Matrix-Vector Multiplication (2.5×), Graph Analytics (2.8×), and Multi-processor overlay applications (2×). FastTrack also improves energy efficiency by up to 2× over baseline Hoplite, owing to higher sustained rates and the high-speed operation of express links made possible by fast FPGA interconnect.
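To make the hop-count argument concrete, the following is a minimal sketch and not the authors' model. It assumes a unidirectional torus (as in Hoplite), an express link of the given length available at every router (no depopulation), greedy use of express links, one cycle per router traversed, and no deflections. The function names `hops_1d` and `mean_hops` are hypothetical, introduced only for illustration.

```python
def hops_1d(distance: int, express_len: int = 1) -> int:
    """Routers traversed to cover `distance` positions along one ring.

    Assumption: the packet takes as many express links as fit, then
    finishes on ordinary single-hop links. express_len=1 models a NoC
    with no express links.
    """
    if distance < 0 or express_len < 1:
        raise ValueError("distance must be >= 0 and express_len >= 1")
    express_hops, local_hops = divmod(distance, express_len)
    return express_hops + local_hops


def mean_hops(n: int, express_len: int = 1) -> float:
    """Average hop count over all (dx, dy) offsets on an n x n torus.

    Assumption: traffic is uniform and packets route one dimension
    after the other, so per-dimension hop counts simply add.
    """
    total = sum(
        hops_1d(dx, express_len) + hops_1d(dy, express_len)
        for dx in range(n)
        for dy in range(n)
    )
    return total / (n * n)


if __name__ == "__main__":
    for length in (1, 2, 4):
        print(f"express length {length}: mean hops = {mean_hops(8, length)}")
```

Under these assumptions, an 8×8 NoC averages 7.0 hops without express links and 4.0 hops with express links of length 2 or 4. This counts cycles only; it ignores the clock-frequency change, deflection behaviour, and depopulation that the abstract says also shape the measured results.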
