A Design Framework for Generating Energy-Efficient Accelerator on FPGA Toward Low-Level Vision

IF 3.1 2区工程技术 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-06-17 DOI:10.1109/TVLSI.2024.3409649

Zikang Zhou;Xuyang Duan;Jun Han

{"title":"A Design Framework for Generating Energy-Efficient Accelerator on FPGA Toward Low-Level Vision","authors":"Zikang Zhou;Xuyang Duan;Jun Han","doi":"10.1109/TVLSI.2024.3409649","DOIUrl":null,"url":null,"abstract":"Low-level vision algorithms play an increasingly crucial role in a wide range of applications, such as biomedical, security, and autopilot. The low-level vision accelerators have also been extensively researched. As low-level vision is often deployed in embedded devices, its accelerators need to achieve high energy efficiency. Meanwhile, the broad application scenarios of low-level vision contribute to its rapid iteration. Designing energy-efficient accelerators for quickly evolving low-level vision algorithms demands substantial effort. Therefore, a design framework specifically tailored for the generation of low-level vision accelerators is urgently needed. In this article, we propose an end-to-end algorithm-hardware generation framework, EffiVision, on field-programmable gate array (FPGA), aimed at generating highly energy-efficient dedicated accelerators for low-level vision neural networks. EffiVision proposes a hardware template that features multiple parallelisms and large architecture exploration spaces specifically designed to accommodate the characteristics of low-level vision networks. Then, it employs activation-weight aware mixed-precision quantization and FPGA-aware NNLUTs to search the suitable hardware parameters within the hardware template, generating highly energy-efficient accelerators tailored for low-level vision networks. We used EffiVision to perform hardware generation for three low-level vision neural networks fast super-resolution convolutional neural network (FSRCNN), denoising convolutional neural network (DnCNN), and demosaicing convolutional neural network (DMCNN) on Xilinx FPGA development boards, achieving the best energy efficiencies of 174.9, 97.8, and 92.7 GOPS/W, respectively. The generated accelerators of FSRCNN and DnCNN are \n<inline-formula> <tex-math>$1.11\\times $ </tex-math></inline-formula>\n and \n<inline-formula> <tex-math>$3.37\\times $ </tex-math></inline-formula>\n more efficient than previous works.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 8","pages":"1485-1497"},"PeriodicalIF":3.1000,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10559268/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

Abstract

Low-level vision algorithms play an increasingly crucial role in a wide range of applications, such as biomedical, security, and autopilot. The low-level vision accelerators have also been extensively researched. As low-level vision is often deployed in embedded devices, its accelerators need to achieve high energy efficiency. Meanwhile, the broad application scenarios of low-level vision contribute to its rapid iteration. Designing energy-efficient accelerators for quickly evolving low-level vision algorithms demands substantial effort. Therefore, a design framework specifically tailored for the generation of low-level vision accelerators is urgently needed. In this article, we propose an end-to-end algorithm-hardware generation framework, EffiVision, on field-programmable gate array (FPGA), aimed at generating highly energy-efficient dedicated accelerators for low-level vision neural networks. EffiVision proposes a hardware template that features multiple parallelisms and large architecture exploration spaces specifically designed to accommodate the characteristics of low-level vision networks. Then, it employs activation-weight aware mixed-precision quantization and FPGA-aware NNLUTs to search the suitable hardware parameters within the hardware template, generating highly energy-efficient accelerators tailored for low-level vision networks. We used EffiVision to perform hardware generation for three low-level vision neural networks fast super-resolution convolutional neural network (FSRCNN), denoising convolutional neural network (DnCNN), and demosaicing convolutional neural network (DMCNN) on Xilinx FPGA development boards, achieving the best energy efficiencies of 174.9, 97.8, and 92.7 GOPS/W, respectively. The generated accelerators of FSRCNN and DnCNN are

$1.11\times $

and

$3.37\times $

more efficient than previous works.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

在 FPGA 上生成高能效加速器以实现低级视觉的设计框架

低级视觉算法在生物医学、安全和自动驾驶等广泛应用中发挥着越来越重要的作用。低级视觉加速器也得到了广泛的研究。由于低级视觉通常部署在嵌入式设备中，因此其加速器需要实现高能效。同时，低级视觉的广泛应用场景也促使其快速迭代。为快速演进的低级视觉算法设计高能效加速器需要投入大量精力。因此，我们迫切需要一个专门用于生成低级视觉加速器的设计框架。在本文中，我们在现场可编程门阵列（FPGA）上提出了一个端到端算法-硬件生成框架 EffiVision，旨在为低级视觉神经网络生成高能效的专用加速器。EffiVision 提出的硬件模板具有多种并行性和大型架构探索空间，专门设计用于适应低级视觉网络的特性。然后，它采用激活权值感知混合精度量化和 FPGA 感知 NNLUT，在硬件模板内搜索合适的硬件参数，生成专为低级视觉网络定制的高能效加速器。我们使用 EffiVision 在赛灵思 FPGA 开发板上为三个低级视觉神经网络快速超分辨率卷积神经网络 (FSRCNN)、去噪卷积神经网络 (DnCNN) 和去马赛克卷积神经网络 (DMCNN) 生成了硬件，分别实现了 174.9、97.8 和 92.7 GOPS/W 的最佳能效。生成的 FSRCNN 和 DnCNN 的加速器比以前的工作效率分别高出 1.11 美元和 3.37 美元。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

IEEE Transactions on Very Large Scale Integration (VLSI) Systems 工程技术-工程：电子与电气

CiteScore

6.40

自引率

7.10%

发文量

187

审稿时长

3.6 months

期刊介绍： The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society. Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels. To address this critical area through a common forum, the IEEE Transactions on VLSI Systems have been founded. The editorial board, consisting of international experts, invites original papers which emphasize and merit the novel systems integration aspects of microelectronic systems including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and systems level qualification. Thus, the coverage of these Transactions will focus on VLSI/ULSI microelectronic systems integration.