High-Throughput and Energy-Efficient FPGA-Based Accelerator for All Adder Neural Networks

IF 8.9 | Zone 1 (Computer Science) | JCR Q1 (Computer Science, Information Systems) | IEEE Internet of Things Journal | Pub Date: 2025-02-19 | DOI: 10.1109/JIOT.2025.3543213
Ning Zhang;Shuo Ni;Liang Chen;Tong Wang;He Chen
IEEE Internet of Things Journal, vol. 12, no. 12, pp. 20357-20376. https://ieeexplore.ieee.org/document/10896587/

Abstract

Neural networks have been extensively applied across various Internet of Things (IoT) applications, such as drone- and satellite-based remote sensing and autonomous driving. With the increasing resolution and amount of data captured by sensors, the demand for real-time response in IoT applications is markedly increasing. However, it is difficult for existing convolutional neural network (CNN) accelerators for IoT applications on field-programmable gate array (FPGA) platforms to achieve high throughput because of the inherent dense multiplication operations of CNNs, memory bandwidth limitations, and inefficient mapping mechanisms. In this article, a high-throughput and energy-efficient all adder neural network (A2NN) accelerator for IoT applications on an FPGA platform is proposed to solve this problem. First, a series of hardware-oriented algorithm optimization methods is proposed to simplify the processing flow of A2NN and further minimize its deployment overhead. Second, a novel hardware architecture based on the idea of near-memory computation (NMC) is proposed to eliminate off-chip memory access completely and accelerate the reconstructed A2NN in the pipeline. Third, a set of quantitative analysis methods for the proposed accelerator is presented to balance throughput and energy consumption, allowing the accelerator to adapt to the varying demands of different IoT application scenarios. Extensive experimental results on the AMD-Xilinx VC709 board demonstrate that the proposed accelerator achieves state-of-the-art performance in terms of throughput, energy efficiency, and throughput efficiency. Moreover, experiments on the AMD-Xilinx KV260 board highlight the architecture’s exceptional scalability and energy efficiency, enabling a balance between speed and power consumption tailored to the specific requirements of IoT application scenarios.
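
The abstract does not spell out the A2NN operator itself. As a rough illustration only, the sketch below assumes an AdderNet-style layer, in which each output is the negative L1 distance between an input patch and a filter, so the inner loop needs only subtraction, absolute value, and accumulation. It is contrasted with an ordinary multiply-accumulate filter. The function names `adder_conv2d` and `mult_conv2d` are hypothetical and are not taken from the paper, and the code says nothing about the paper's hardware mapping.

```python
from typing import List

Matrix = List[List[int]]


def adder_conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Assumed adder-style filter (stride 1, no padding): out = -sum(|x - w|)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for u in range(kh):
                for v in range(kw):
                    # Subtract, take the absolute value, accumulate: no multiplier.
                    acc -= abs(image[i + u][j + v] - kernel[u][v])
            row.append(acc)
        out.append(row)
    return out


def mult_conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Conventional CNN filter (cross-correlation) for comparison: out = sum(x * w)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for u in range(kh):
                for v in range(kw):
                    # One multiplication per weight: the cost the abstract refers to.
                    acc += image[i + u][j + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, 1]]
    print(adder_conv2d(image, kernel))  # [[-10, -14], [-22, -26]]
    print(mult_conv2d(image, kernel))   # [[6, 8], [12, 14]]
```

Both functions visit the same input patches. Only the per-weight operation differs, which is why an adder-style layer can trade hardware multipliers for adders.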
Source journal: IEEE Internet of Things Journal (Computer Science, Information Systems)
CiteScore: 17.60
Self-citation rate: 13.20%
Articles published: 1982
About the journal: The IEEE Internet of Things (IoT) Journal publishes articles and review articles covering various aspects of IoT, including IoT system architecture, IoT enabling technologies, IoT communication and networking protocols such as network coding, and IoT services and applications. Topics encompass IoT's impacts on sensor technologies, big data management, and future Internet design for applications like smart cities and smart homes. Fields of interest include IoT architecture, such as things-centric, data-centric, and service-oriented IoT architecture; IoT enabling technologies and systematic integration, such as sensor technologies, big sensor data management, and future Internet design for IoT; IoT services, applications, and test-beds, such as IoT service middleware, IoT application programming interfaces (APIs), IoT application design, and IoT trials/experiments; and IoT standardization activities and technology development in different standards development organizations (SDOs) such as IEEE, IETF, ITU, 3GPP, and ETSI.