Cygnus - World First Multihybrid Accelerated Cluster with GPU and FPGA Coupling

T. Boku, N. Fujita, Ryohei Kobayashi, O. Tatebe
Workshop Proceedings of the 51st International Conference on Parallel Processing, published 2022-08-29
DOI: 10.1145/3547276.3548629 (https://doi.org/10.1145/3547276.3548629)
Citations: 3

Abstract

In this paper, we describe the concept, system architecture, supporting system software, and applications of Cygnus, our world-first supercomputer with multihybrid accelerators using GPU and FPGA coupling, which runs at the Center for Computational Sciences, University of Tsukuba. Although Cygnus as a whole is built as a GPU-accelerated PC cluster with over 80 computation nodes, a special group of 32 nodes, named the Albireo part, is configured as a multihybrid accelerated computing system. Each node of the Albireo part is equipped with four NVIDIA V100 GPU cards and two Intel Stratix10 FPGA cards in addition to two sockets of Intel Xeon Gold CPU, and all nodes are connected by four lanes of InfiniBand HDR100 HCA interconnection with full bisection bandwidth through NVIDIA HDR200 switches. Besides this ordinary interconnection network, all FPGA cards in the Albireo part are connected by a special 2-dimensional torus network with direct optical links on each FPGA, forming a very high-throughput, low-latency FPGA-centric interconnection network. To the best of our knowledge, Cygnus is the world’s first production-level PC cluster to realize multihybrid acceleration with the GPU and FPGA combination. Unlike on other GPU-accelerated clusters, users can write parallel codes in which each process exploits the GPU, the FPGA, or both, based on the characteristics of their applications. Because the programming method for such a complicated system has not been standardized, we developed various supporting system software, such as an inter-FPGA network routing system, a DMA engine for GPU-FPGA direct communication managed by the FPGA, and a multihybrid accelerated programming framework. Further, we developed the first real application on Cygnus, for fundamental astrophysics simulation, that fully utilizes GPU and FPGA together for very efficient acceleration. We describe the overall concept and construction of the Cygnus cluster, with a brief introduction to the several underlying hardware and software research studies that have already been published. We summarize how such a concept of GPU/FPGA coworking will usher in a new era of accelerated supercomputing.
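
As a rough illustration of the FPGA-centric 2-dimensional torus mentioned in the abstract, the following Python sketch computes the wraparound neighbors of an FPGA and the minimal hop count between two FPGAs. The abstract gives 32 Albireo nodes with two FPGA cards each, that is, 64 FPGAs; the 8 x 8 layout used here is an assumption made for the example, since the abstract does not state the torus dimensions. The function names `torus_neighbors` and `torus_hops` are hypothetical and are not part of the Cygnus system software.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def torus_neighbors(x: int, y: int, width: int, height: int) -> List[Coord]:
    """Return the four neighbors of (x, y) in the order +x, -x, +y, -y.

    Coordinates wrap around at the edges, which is what makes the
    grid a torus rather than a plain mesh.
    """
    return [
        ((x + 1) % width, y),
        ((x - 1) % width, y),
        (x, (y + 1) % height),
        (x, (y - 1) % height),
    ]


def torus_hops(a: Coord, b: Coord, width: int, height: int) -> int:
    """Minimal hop count from a to b, taking the shorter way around each ring."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, width - dx) + min(dy, height - dy)


# Figures taken from the abstract.
NODES = 32
FPGAS_PER_NODE = 2
total_fpgas = NODES * FPGAS_PER_NODE

# Assumption for illustration only: a square 8 x 8 arrangement of the 64 FPGAs.
WIDTH, HEIGHT = 8, 8
assert total_fpgas == WIDTH * HEIGHT

if __name__ == "__main__":
    print(torus_neighbors(0, 0, WIDTH, HEIGHT))
    print(torus_hops((0, 0), (7, 7), WIDTH, HEIGHT))
```

Under this assumed layout, opposite corners of the grid are only two hops apart because of the wraparound links, and the longest shortest path between any two FPGAs is eight hops.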