Yingzhao Shao, Jincheng Shang, Yunsong Li, Yueli Ding, Mingming Zhang, Ke Ren, Yang Liu
IET Computers and Digital Techniques, Volume 2024, Issue 1. Journal article, published 2024-06-20. DOI: 10.1049/2024/4415342. https://onlinelibrary.wiley.com/doi/10.1049/2024/4415342
A Configurable Accelerator for CNN-Based Remote Sensing Object Detection on FPGAs
Convolutional neural networks (CNNs) are widely used in satellite remote sensing. However, in-orbit satellites have limited resources and power budgets and cannot meet the storage and computing requirements of current million-parameter-scale artificial intelligence models. This paper proposes a highly flexible, intelligent CNN hardware accelerator for satellite remote sensing that makes the computing platform more lightweight and efficient. A data quantization scheme for INT16 or INT8, based on dynamic fixed-point numbers, is designed and applied to different scenarios. The systolic array operates on channel blocks, and the calculation method is optimized to increase the utilization of on-chip computing resources and improve computational efficiency. An RTL-level CNN accelerator for field-programmable gate arrays (FPGAs), with a data flow scheduled by microinstruction sequences, is then designed. The hardware framework is built on the Xilinx VC709. The results show that the system achieves high throughput in most convolutional layers of the network, with an average performance of 153.14 giga operations per second (GOPS) at INT16 precision and 301.52 GOPS at INT8 precision, which is close to the system's peak performance and takes full advantage of the platform's parallel computing capabilities.
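The abstract does not spell out the quantization scheme, so the following is only a minimal sketch of the general idea behind dynamic fixed-point numbers, not the authors' implementation. It assumes one shared fractional length per tensor, chosen so that the largest magnitude fits in the signed word, with saturation on overflow. The function names `choose_frac_bits`, `quantize`, and `dequantize` are hypothetical and introduced here for illustration.

```python
import math


def choose_frac_bits(values, total_bits):
    """Pick a per-tensor fractional length (assumed policy, not from the paper).

    One bit is reserved for the sign; the integer part gets just enough
    bits to hold the largest magnitude, and the rest become fraction bits.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    int_bits = math.floor(math.log2(max_abs)) + 1
    return total_bits - 1 - int_bits


def quantize(values, total_bits, frac_bits):
    """Scale by 2**frac_bits, round, and saturate to the signed integer range."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    # Python's round() rounds exact halves to the nearest even integer.
    return [max(lo, min(hi, round(v * scale))) for v in values]


def dequantize(codes, frac_bits):
    """Map integer codes back to real values for error inspection."""
    scale = 2.0 ** frac_bits
    return [c / scale for c in codes]


if __name__ == "__main__":
    weights = [0.75, -0.5, 0.1, -0.03]  # made-up sample values
    for bits in (8, 16):
        fl = choose_frac_bits(weights, bits)
        codes = quantize(weights, bits, fl)
        print(bits, fl, codes, dequantize(codes, fl))
```

Running the same tensor through both word widths shows the trade-off the abstract alludes to: INT8 halves the storage per value, while INT16 keeps more fraction bits and therefore a smaller rounding error.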
About the journal:
IET Computers & Digital Techniques publishes technical papers describing recent research and development work in all aspects of digital system-on-chip design and test of electronic and embedded systems, including the development of design automation tools (methodologies, algorithms and architectures). Papers based on the problems associated with the scaling down of CMOS technology are particularly welcome. It is aimed at researchers, engineers and educators in the fields of computer and digital systems design and test.
The key subject areas of interest are:
Design Methods and Tools: CAD/EDA tools, hardware description languages, high-level and architectural synthesis, hardware/software co-design, platform-based design, 3D stacking and circuit design, system-on-chip architectures and IP cores, embedded systems, logic synthesis, low-power design and power optimisation.
Simulation, Test and Validation: electrical and timing simulation, simulation-based verification, hardware/software co-simulation and validation, mixed-domain technology modelling and simulation, post-silicon validation, power analysis and estimation, interconnect modelling and signal integrity analysis, hardware trust and security, design-for-testability, embedded core testing, system-on-chip testing, on-line testing, automatic test generation and delay testing, low-power testing, reliability, fault modelling and fault tolerance.
Processor and System Architectures: many-core systems, general-purpose and application specific processors, computational arithmetic for DSP applications, arithmetic and logic units, cache memories, memory management, co-processors and accelerators, systems and networks on chip, embedded cores, platforms, multiprocessors, distributed systems, communication protocols and low-power issues.
Configurable Computing: embedded cores, FPGAs, rapid prototyping, adaptive computing, evolvable and statically and dynamically reconfigurable and reprogrammable systems, reconfigurable hardware.
Design for Variability, Power and Aging: design methods for variability, power- and aging-aware design, memories, FPGAs, IP components, 3D stacking, energy harvesting.
Case Studies: emerging applications, applications in industrial designs, and design frameworks.