Energy-Efficient Deep Neural Networks Implementation on a Scalable Heterogeneous FPGA Cluster

2021 IEEE 15th International Conference on Anti-counterfeiting, Security, and Identification (ASID) Pub Date : 2021-10-29 DOI:10.1109/asid52932.2021.9651719

Yanbu Hu, C. Shao, Huiyun Li



Abstract

In recent years, with the rapid development of deep neural networks (DNNs), algorithm complexity in fields such as computer vision and natural language processing has increased rapidly. FPGA-based DNN accelerators have demonstrated superior flexibility and performance, with higher energy efficiency than high-performance devices such as GPUs. However, the computing resources of a single FPGA are limited, which makes it difficult to flexibly meet the high-throughput and high-energy-efficiency requirements of workloads at different computing scales. This paper therefore proposes a DNN implementation method based on a scalable heterogeneous FPGA cluster that adapts to different tasks and achieves high throughput and energy efficiency. First, the method divides a single large task into multiple modules and runs each module on a different FPGA, forming a pipeline across multiple boards. Second, a dichotomy-based task deployment method is proposed to balance the task execution time of the pipeline stages as evenly as possible, improving throughput and energy efficiency. Third, the DNN computing module is optimized according to the relationship between computing power and bandwidth, improving energy efficiency by reducing wasted resources and raising resource utilization. Experimental results on AlexNet and VGG-16 show that a Zynq 7035 cluster achieves up to 25.23× the energy efficiency of an optimized AMD AIO processor. Compared with previous single-FPGA and FPGA-cluster works, energy efficiency is improved by 59.5% and 18.8%, respectively.
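
The abstract does not spell out the deployment algorithm, so the following is only an illustrative sketch of how a dichotomy (binary search) could balance pipeline stages. It assumes that per-layer execution times are already known as integers (for example, microseconds), that layers must stay in order, and that all boards are equally fast, which simplifies away the heterogeneity of the actual cluster. The names `stages_needed`, `min_bottleneck`, and `partition` are hypothetical and do not come from the paper.

```python
from typing import List


def stages_needed(layer_times: List[int], limit: int) -> int:
    """Count the stages used when consecutive layers are packed greedily
    so that no stage's total time exceeds `limit`.
    Assumes limit >= max(layer_times)."""
    stages, current = 1, 0
    for t in layer_times:
        if current + t > limit:
            stages += 1      # start a new pipeline stage (next board)
            current = t
        else:
            current += t
    return stages


def min_bottleneck(layer_times: List[int], num_boards: int) -> int:
    """Binary-search the smallest per-stage time limit that still fits
    the network onto `num_boards` pipeline stages."""
    lo, hi = max(layer_times), sum(layer_times)
    while lo < hi:
        mid = (lo + hi) // 2
        if stages_needed(layer_times, mid) <= num_boards:
            hi = mid         # feasible: try a tighter limit
        else:
            lo = mid + 1     # infeasible: relax the limit
    return lo


def partition(layer_times: List[int], num_boards: int) -> List[List[int]]:
    """Return the layer groups (one per stage) for the best limit found."""
    limit = min_bottleneck(layer_times, num_boards)
    stages: List[List[int]] = []
    current: List[int] = []
    for t in layer_times:
        if current and sum(current) + t > limit:
            stages.append(current)
            current = [t]
        else:
            current.append(t)
    stages.append(current)
    return stages


if __name__ == "__main__":
    # Hypothetical per-layer times, not measurements from the paper.
    times = [4, 3, 5, 2, 6]
    print(min_bottleneck(times, 3))
    print(partition(times, 3))
```

In a pipeline, throughput is set by the slowest stage, so minimizing the largest stage time is one natural way to read "maximize the balance of task execution time". Extending the sketch to boards of different speeds would require scaling each stage's time by the capability of the board it is assigned to.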



