Yufan Lu, X. Zhai, S. Saha, Shoaib Ehsan, K. Mcdonald-Maier
{"title":"FPGA based Adaptive Hardware Acceleration for Multiple Deep Learning Tasks","authors":"Yufan Lu, X. Zhai, S. Saha, Shoaib Ehsan, K. Mcdonald-Maier","doi":"10.1109/MCSoC51149.2021.00038","DOIUrl":null,"url":null,"abstract":"Machine learning, and in particular deep learning (DL), has seen strong success in a wide variety of applications, e.g. object detection, image classification and self-driving. However, due to the limitations on hardware resources and power consumption, there are many challenges to deploy deep learning algorithms on resource-constrained mobile and embedded systems, especially for systems running multiple DL algorithms for a variety of tasks. In this paper, an adaptive hardware resource management system, implemented on field-programmable gate arrays (FPGAs), is proposed to dynamically manage the on-chip hardware resources (e.g. LUTs, BRAMs and DSPs) to adapt to a variety of tasks. Using dynamic function exchange (DFX) technology, the system can dynamically allocate hardware resources to deploy deep learning units (DPUs) so as to balance the requirements, performance and power consumption of the deep learning applications. The prototype is implemented on the Xilinx Zynq UltraScale+ series chips. The experiment results indicate that the proposed scheme significantly improves the computing efficiency of the resource-constrained systems under various experimental scenarios. Compared to the baseline, the proposed strategy consumes 38% and 82% of power in low working load cases and high working load cases, respectively. Typically, the proposed system can save approximately 75.8% of energy.","PeriodicalId":166811,"journal":{"name":"2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MCSoC51149.2021.00038","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Machine learning, and in particular deep learning (DL), has achieved strong success in a wide variety of applications, e.g. object detection, image classification and autonomous driving. However, due to limits on hardware resources and power consumption, there are many challenges in deploying deep learning algorithms on resource-constrained mobile and embedded systems, especially for systems running multiple DL algorithms for a variety of tasks. In this paper, an adaptive hardware resource management system, implemented on field-programmable gate arrays (FPGAs), is proposed to dynamically manage the on-chip hardware resources (e.g. LUTs, BRAMs and DSPs) and adapt to a variety of tasks. Using dynamic function exchange (DFX) technology, the system dynamically allocates hardware resources to deploy deep learning processing units (DPUs), balancing the requirements, performance and power consumption of the deep learning applications. The prototype is implemented on Xilinx Zynq UltraScale+ series chips. The experimental results indicate that the proposed scheme significantly improves the computing efficiency of resource-constrained systems under various experimental scenarios. Compared to the baseline, the proposed strategy consumes 38% and 82% of the baseline power in low-workload and high-workload cases, respectively. In typical cases, the proposed system saves approximately 75.8% of energy.
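To make the adaptive allocation idea concrete, below is a minimal Python sketch of the kind of workload-driven reconfiguration loop the abstract describes: monitor the task load, pick the smallest DPU configuration that keeps up, and swap in the corresponding DFX partial bitstream when the choice changes. All names, paths, thresholds and power figures here are illustrative assumptions, not values or APIs from the paper or from Xilinx tooling.

```python
# Hypothetical sketch of an adaptive DPU allocation loop (not the paper's
# implementation). Configuration names, bitstream paths, thresholds and
# power numbers are placeholders chosen for illustration only.

from dataclasses import dataclass
import time


@dataclass
class DpuConfig:
    name: str            # identifier of the partial bitstream
    bitstream: str       # path to the DFX partial bitstream file (assumed)
    num_cores: int       # number of DPU cores instantiated
    est_power_w: float   # rough estimated power draw of this configuration


# Example configuration library: a small low-power DPU and a larger
# high-throughput one (values are illustrative).
CONFIGS = [
    DpuConfig("dpu_small", "partials/dpu_1core.bit", 1, 2.5),
    DpuConfig("dpu_large", "partials/dpu_3core.bit", 3, 7.0),
]


def choose_config(pending_tasks: int) -> DpuConfig:
    """Pick the smallest configuration expected to keep up with the load."""
    return CONFIGS[1] if pending_tasks > 4 else CONFIGS[0]


def load_partial_bitstream(path: str) -> None:
    """Placeholder for triggering DFX partial reconfiguration
    (e.g. via the platform's FPGA manager); not implemented here."""
    print(f"reconfiguring with {path}")


def manage(queue_depth_fn, poll_s: float = 1.0) -> None:
    """Poll the task queue and reconfigure only when the target changes."""
    current = None
    while True:
        target = choose_config(queue_depth_fn())
        if target is not current:
            load_partial_bitstream(target.bitstream)
            current = target
        time.sleep(poll_s)
```

Calling, for example, `manage(lambda: 6)` would load the larger configuration once and then idle; the point of the sketch is only that reconfiguration happens on load transitions, so the low-power DPU serves light workloads and the larger one is swapped in when the queue grows, which is the balance of performance and power the abstract claims.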