Coarse-Grained Floorplanning for streaming CNN applications on Multi-Die FPGAs

2022 21st International Symposium on Parallel and Distributed Computing (ISPDC) Pub Date : 2022-07-01 DOI:10.1109/ISPDC55340.2022.00014

Danielle Tchuinkou Kwadjo, Erman Nghonda Tchinda, C. Bobda



Abstract

With the wide adoption of FPGAs in the cloud, it has become necessary to investigate architectures and mechanisms for efficiently deploying CNNs on multi-FPGA cloud infrastructure. However, the growing size and complexity of neural networks, coupled with communication and off-chip memory bottlenecks, make it increasingly difficult for multi-FPGA designs to achieve high resource utilization. In this work, we introduce a scalable framework that supports the efficient integration of CNN applications into a cloud infrastructure that exposes multi-die FPGAs to cloud developers. Our framework provides two mechanisms to facilitate the deployment of CNN inference on FPGAs. First, we propose a model that finds the parameters maximizing parallelism within the resource budget while keeping the processing rates of the layers balanced. Second, we propose an efficient coarse-grained graph partitioning algorithm for high-quality, scalable, routability-driven placement of the CNN's components on the FPGAs. Prototyping results show an overall 37% higher frequency and lower resource usage than a baseline implementation on the same number of FPGAs.
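The first mechanism, choosing per-layer parallelism under a resource budget while keeping layer rates balanced, can be illustrated with a small sketch. This is not the model from the paper, which the abstract does not specify. It is a simple greedy heuristic under loudly stated assumptions: each layer has a known workload in cycles per frame, every extra parallel unit costs the same amount of resources, and a layer's latency scales as workload divided by parallelism. The names `balance_parallelism`, `workloads`, `budget`, and `cost_per_unit` are hypothetical.

```python
import heapq


def balance_parallelism(workloads, budget, cost_per_unit=1):
    """Assign an integer parallelism factor to each layer of a pipeline.

    Greedy heuristic (an illustration, not the paper's model): start every
    layer at parallelism 1, then repeatedly give one more parallel unit to
    the current bottleneck layer until the budget is exhausted.

    workloads     -- assumed cycles per frame for each layer at parallelism 1
    budget        -- assumed total number of resource units available
    cost_per_unit -- assumed uniform resource cost of one parallel unit
    """
    n = len(workloads)
    if budget < n * cost_per_unit:
        raise ValueError("budget too small to give every layer one unit")

    par = [1] * n
    remaining = budget - n * cost_per_unit

    # heapq is a min-heap, so latencies are negated to pop the slowest layer.
    heap = [(-float(w), i) for i, w in enumerate(workloads)]
    heapq.heapify(heap)

    while remaining >= cost_per_unit:
        _, i = heapq.heappop(heap)
        par[i] += 1
        remaining -= cost_per_unit
        # Assumed latency model: workload divided by parallelism.
        heapq.heappush(heap, (-workloads[i] / par[i], i))

    return par


if __name__ == "__main__":
    layers = [100, 200, 100]          # hypothetical per-layer workloads
    factors = balance_parallelism(layers, budget=8)
    latencies = [w / p for w, p in zip(layers, factors)]
    print(factors, latencies)
```

In a streaming pipeline the throughput is set by the slowest stage, so spending each additional unit on the current bottleneck is a natural way to keep rates balanced. A real design would also have to account for non-uniform costs per layer and for memory and bandwidth limits, which this sketch ignores.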



