A Design Framework for Generating Energy-Efficient Accelerator on FPGA Toward Low-Level Vision

IF 2.8 | Zone 2 (Engineering & Technology) | JCR Q2 (Computer Science, Hardware & Architecture) | IEEE Transactions on Very Large Scale Integration (VLSI) Systems | Pub Date: 2024-06-17 | DOI: 10.1109/TVLSI.2024.3409649
Zikang Zhou;Xuyang Duan;Jun Han
Citation count: 0

Abstract

Low-level vision algorithms play an increasingly crucial role in a wide range of application areas, such as biomedicine, security, and autonomous driving. Accelerators for low-level vision have likewise been studied extensively. Because low-level vision is often deployed on embedded devices, its accelerators need to achieve high energy efficiency. Meanwhile, the broad range of application scenarios drives rapid iteration of the algorithms. Designing energy-efficient accelerators for quickly evolving low-level vision algorithms demands substantial effort, so a design framework tailored to generating low-level vision accelerators is urgently needed. In this article, we propose EffiVision, an end-to-end algorithm-hardware generation framework on field-programmable gate arrays (FPGAs), aimed at generating highly energy-efficient dedicated accelerators for low-level vision neural networks. EffiVision provides a hardware template that features multiple forms of parallelism and a large architecture exploration space, designed specifically to accommodate the characteristics of low-level vision networks. It then employs activation-weight-aware mixed-precision quantization and FPGA-aware NNLUTs to search for suitable hardware parameters within the hardware template, generating highly energy-efficient accelerators tailored to low-level vision networks. We used EffiVision to generate hardware for three low-level vision neural networks, the fast super-resolution convolutional neural network (FSRCNN), the denoising convolutional neural network (DnCNN), and the demosaicing convolutional neural network (DMCNN), on Xilinx FPGA development boards, achieving best energy efficiencies of 174.9, 97.8, and 92.7 GOPS/W, respectively. The generated FSRCNN and DnCNN accelerators are $1.11\times$ and $3.37\times$ more efficient than previous works.
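
The abstract names mixed-precision quantization but gives no algorithmic detail, so the following is only a minimal sketch of the general idea, not EffiVision's method. It assumes symmetric uniform quantization and a simple rule that gives each layer the smallest bit-width whose quantization error stays under a tolerance. The function names (`quantize`, `pick_bits`), the candidate bit-widths, the tolerance, and the synthetic "layers" are all hypothetical.

```python
import random


def quantize(values, bits):
    """Symmetric uniform quantization to `bits` bits (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1                 # largest signed integer level
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0   # real-valued step size
    out = []
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))  # round, then clip
        out.append(q * scale)                        # map back to real values
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pick_bits(values, candidates, tolerance):
    """Smallest candidate bit-width whose quantization MSE is within tolerance.

    Falls back to the widest candidate if none qualifies.
    """
    for bits in sorted(candidates):
        if mse(values, quantize(values, bits)) <= tolerance:
            return bits
    return max(candidates)


# Synthetic stand-ins for two layers with different value ranges (assumption).
rng = random.Random(0)
layers = {
    "layer_a": [rng.gauss(0.0, 0.05) for _ in range(256)],
    "layer_b": [rng.gauss(0.0, 1.0) for _ in range(256)],
}
chosen = {
    name: pick_bits(w, candidates=(4, 6, 8), tolerance=1e-4)
    for name, w in layers.items()
}
print(chosen)
```

Layers with a narrow value range tend to tolerate fewer bits under such a rule, which is the intuition behind assigning precision per layer rather than globally. A real framework would couple this choice to task accuracy and hardware cost instead of a fixed MSE threshold.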