Latency-Optimized Networks for Clustering FPGAs

2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines Pub Date : 2013-04-28 DOI:10.1109/FCCM.2013.49

Trevor Bunker, S. Swanson

{"title":"Latency-Optimized Networks for Clustering FPGAs","authors":"Trevor Bunker, S. Swanson","doi":"10.1109/FCCM.2013.49","DOIUrl":null,"url":null,"abstract":"The data-intensive applications that will shape computing in the coming decades require scalable architectures that incorporate scalable data and compute resources and can support random requests to unstructured (e.g., logs) and semi-structured (e.g., large graph, XML) data sets. To explore the suitability of FPGAs for these computations, we are constructing an FPGAbased system with a memory capacity of 512 GB from a collection of 32 Virtex-5 FPGAs spread across 8 enclosures. This paper describes our work in exploring alternative interconnect technologies and network topologies for FPGA-based clusters. The diverse interconnects combine inter-enclosure high-speed serial links and wide, single-ended intra-enclosure on-board traces with network topologies that balance network diameter, network throughput, and FPGA resource usage. We discuss the architecture of high-radix routers in FPGAs that optimize for the asymmetry between the interand intra-enclosure links. We analyze the various interconnects that aim to efficiently utilize the prototype's total switching capacity of 2.43 Tb/s. The networks we present have aggregate throughputs up to 51.4 GB/s for random traffic, diameters as low as 845 nanoseconds, and consume less than 12% of the FPGAs' logic resources.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":"75 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FCCM.2013.49","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 17

Abstract

The data-intensive applications that will shape computing in the coming decades require scalable architectures that incorporate scalable data and compute resources and can support random requests to unstructured (e.g., logs) and semi-structured (e.g., large graph, XML) data sets. To explore the suitability of FPGAs for these computations, we are constructing an FPGAbased system with a memory capacity of 512 GB from a collection of 32 Virtex-5 FPGAs spread across 8 enclosures. This paper describes our work in exploring alternative interconnect technologies and network topologies for FPGA-based clusters. The diverse interconnects combine inter-enclosure high-speed serial links and wide, single-ended intra-enclosure on-board traces with network topologies that balance network diameter, network throughput, and FPGA resource usage. We discuss the architecture of high-radix routers in FPGAs that optimize for the asymmetry between the interand intra-enclosure links. We analyze the various interconnects that aim to efficiently utilize the prototype's total switching capacity of 2.43 Tb/s. The networks we present have aggregate throughputs up to 51.4 GB/s for random traffic, diameters as low as 845 nanoseconds, and consume less than 12% of the FPGAs' logic resources.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

集群fpga的延迟优化网络

将在未来几十年塑造计算的数据密集型应用程序需要可扩展的体系结构，这些体系结构包含可扩展的数据和计算资源，并且可以支持对非结构化(例如，日志)和半结构化(例如，大图、XML)数据集的随机请求。为了探索fpga对这些计算的适用性，我们正在构建一个基于fpga的系统，其内存容量为512 GB，由32个Virtex-5 fpga组成，分布在8个机箱中。本文描述了我们在探索基于fpga的集群的替代互连技术和网络拓扑方面的工作。多样化的互连结合了机箱间高速串行链路和宽单端机箱内板上走线，以及平衡网络直径、网络吞吐量和FPGA资源使用的网络拓扑结构。我们讨论了fpga中高基数路由器的结构，该结构优化了机箱间和机箱内链路之间的不对称性。我们分析了各种互连，旨在有效利用原型的总交换容量为2.43 Tb/s。我们提出的网络具有高达51.4 GB/s的随机流量的总吞吐量，直径低至845纳秒，并且消耗的fpga逻辑资源不到12%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines

自引率

0.00%

发文量