High-Throughput and Energy-Efficient FPGA-Based Accelerator for All Adder Neural Networks
Ning Zhang; Shuo Ni; Liang Chen; Tong Wang; He Chen
IEEE Internet of Things Journal, vol. 12, no. 12, pp. 20357-20376
DOI: 10.1109/JIOT.2025.3543213
Published: 2025-02-19
Citation count: 0
Abstract
Neural networks have been extensively applied across various Internet of Things (IoT) applications, such as drone- and satellite-based remote sensing and autonomous driving. With the increasing resolution and volume of data captured by sensors, the demand for real-time response in IoT applications is growing markedly. However, existing convolutional neural network (CNN) accelerators for IoT applications on field-programmable gate array (FPGA) platforms struggle to achieve high throughput because of the inherently dense multiplication operations of CNNs, memory bandwidth limitations, and inefficient mapping mechanisms. In this article, a high-throughput and energy-efficient all adder neural network (A2NN) accelerator for IoT applications on an FPGA platform is proposed to solve this problem. First, a series of hardware-oriented algorithm optimization methods is proposed to simplify the processing flow of the A2NN and further minimize its deployment overhead. Second, a novel hardware architecture based on the idea of near-memory computation (NMC) is proposed to eliminate off-chip memory access completely and accelerate the reconstructed A2NN in a pipelined manner. Third, a set of quantitative analysis methods for the proposed accelerator is presented to balance throughput and energy consumption, allowing the accelerator to adapt to the varying demands of different IoT application scenarios. Extensive experimental results on the AMD-Xilinx VC709 board demonstrate that the proposed accelerator achieves state-of-the-art performance in terms of throughput, energy efficiency, and throughput efficiency. Moreover, experiments on the AMD-Xilinx KV260 board highlight the architecture's exceptional scalability and energy efficiency, enabling a balance between speed and power consumption tailored to the specific requirements of IoT application scenarios.
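The accelerator targets adder neural networks, which avoid the dense multiplications of a standard convolution by computing each output as a (negated) sum of absolute differences between the input patch and the filter. The following minimal NumPy sketch illustrates that idea only; it is not the paper's A2NN formulation or hardware mapping, and the function name and "valid" padding are illustrative assumptions:

```python
import numpy as np

def adder_conv2d(x, w):
    """Multiplication-free 'adder' layer sketch: each output element is the
    negative sum of absolute differences (SAD) between an input patch and a
    filter, replacing the multiply-accumulate of ordinary convolution.

    x: input feature map, shape (H, W, C_in)
    w: filters, shape (K, K, C_in, C_out)
    Returns an output of shape (H-K+1, W-K+1, C_out) ("valid" padding).
    """
    K = w.shape[0]
    H, W, C_in = x.shape
    C_out = w.shape[3]
    out = np.empty((H - K + 1, W - K + 1, C_out))
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            patch = x[i:i + K, j:j + K, :]  # (K, K, C_in) window
            for c in range(C_out):
                # Only subtractions, absolute values, and additions are
                # used -- no multiplications -- which is what makes adder
                # networks attractive for FPGA implementation.
                out[i, j, c] = -np.abs(patch - w[:, :, :, c]).sum()
    return out
```

Because every operation is an add, subtract, or absolute value, such layers map to cheap adder logic on an FPGA instead of DSP multipliers, which is the property the proposed accelerator exploits.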
Journal Introduction:
The IEEE Internet of Things (IoT) Journal publishes articles and review articles covering various aspects of IoT, including IoT system architecture, IoT enabling technologies, IoT communication and networking protocols such as network coding, and IoT services and applications. Topics encompass IoT's impact on sensor technologies, big data management, and future internet design for applications like smart cities and smart homes. Fields of interest include IoT architecture, such as things-centric, data-centric, and service-oriented IoT architecture; IoT enabling technologies and systematic integration, such as sensor technologies, big sensor data management, and future Internet design for IoT; IoT services, applications, and test-beds, such as IoT service middleware, IoT application programming interfaces (APIs), IoT application design, and IoT trials/experiments; and IoT standardization activities and technology development in different standards development organizations (SDOs), such as IEEE, IETF, ITU, 3GPP, and ETSI.