High-Throughput and Energy-Efficient FPGA-Based Accelerator for All Adder Neural Networks
Ning Zhang; Shuo Ni; Liang Chen; Tong Wang; He Chen
IEEE Internet of Things Journal, vol. 12, no. 12, pp. 20357-20376
DOI: 10.1109/JIOT.2025.3543213
Published: 2025-02-19
Citation count: 0
Abstract
Neural networks have been extensively applied across various Internet of Things (IoT) applications, such as drone- and satellite-based remote sensing and autonomous driving. With the increasing resolution and volume of data captured by sensors, the demand for real-time response in IoT applications is growing markedly. However, existing convolutional neural network (CNN) accelerators for IoT applications on field-programmable gate array (FPGA) platforms struggle to achieve high throughput because of the inherently dense multiplication operations of CNNs, memory bandwidth limitations, and inefficient mapping mechanisms. In this article, a high-throughput and energy-efficient all adder neural network (A2NN) accelerator for IoT applications on an FPGA platform is proposed to solve this problem. First, a series of hardware-oriented algorithm optimization methods is proposed to simplify the processing flow of the A2NN and further minimize its deployment overhead. Second, a novel hardware architecture based on the idea of near-memory computation (NMC) is proposed to eliminate off-chip memory access completely and accelerate the reconstructed A2NN in a pipelined manner. Third, a set of quantitative analysis methods for the proposed accelerator is presented to balance throughput and energy consumption, allowing the accelerator to adapt to the varying demands of different IoT application scenarios. Extensive experimental results on the AMD-Xilinx VC709 board demonstrate that the proposed accelerator achieves state-of-the-art performance in terms of throughput, energy efficiency, and throughput efficiency. Moreover, experiments on the AMD-Xilinx KV260 board highlight the architecture's exceptional scalability and energy efficiency, enabling a balance between speed and power consumption tailored to the specific requirements of IoT application scenarios.
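The accelerator targets adder neural networks, which avoid the dense multiplications of a standard convolution by computing each output as a (negated) sum of absolute differences between the input patch and the filter. The following minimal NumPy sketch illustrates that idea only; it is not the paper's A2NN formulation or hardware mapping, and the function name and "valid" padding are illustrative assumptions:

```python
import numpy as np

def adder_conv2d(x, w):
    """Multiplication-free 'adder' layer sketch: each output element is the
    negative sum of absolute differences (SAD) between an input patch and a
    filter, replacing the multiply-accumulate of ordinary convolution.

    x: input feature map, shape (H, W, C_in)
    w: filters, shape (K, K, C_in, C_out)
    Returns an output of shape (H-K+1, W-K+1, C_out) ("valid" padding).
    """
    K = w.shape[0]
    H, W, C_in = x.shape
    C_out = w.shape[3]
    out = np.empty((H - K + 1, W - K + 1, C_out))
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            patch = x[i:i + K, j:j + K, :]  # (K, K, C_in) window
            for c in range(C_out):
                # Only subtractions, absolute values, and additions are
                # used -- no multiplications -- which is what makes adder
                # networks attractive for FPGA implementation.
                out[i, j, c] = -np.abs(patch - w[:, :, :, c]).sum()
    return out
```

Because every operation is an add, subtract, or absolute value, such layers map to cheap adder logic on an FPGA instead of DSP multipliers, which is the property the proposed accelerator exploits.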
Journal Introduction:
The IEEE Internet of Things (IoT) Journal publishes articles and review articles covering various aspects of IoT, including IoT system architecture, IoT enabling technologies, IoT communication and networking protocols such as network coding, and IoT services and applications. Topics encompass IoT's impact on sensor technologies, big data management, and future internet design for applications like smart cities and smart homes. Fields of interest include IoT architecture, such as things-centric, data-centric, and service-oriented IoT architecture; IoT enabling technologies and systematic integration, such as sensor technologies, big sensor data management, and future Internet design for IoT; IoT services, applications, and test-beds, such as IoT service middleware, IoT application programming interfaces (APIs), IoT application design, and IoT trials/experiments; and IoT standardization activities and technology development in different standards development organizations (SDOs), such as IEEE, IETF, ITU, 3GPP, and ETSI.