A High-Throughput and Flexible CNN Accelerator Based on Mixed-Radix FFT Method
Authors: Yishuo Meng; Junfeng Wu; Siwei Xiang; Jianfei Wang; Jia Hou; Zhijie Lin; Chen Yang
Journal: IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 72, no. 2, pp. 816-829 (impact factor 5.2; JCR Q1, Engineering, Electrical & Electronic)
Published: 2024-10-04
DOI: 10.1109/TCSI.2024.3466563
URL: https://ieeexplore.ieee.org/document/10705360/
Citations: 0
Abstract
CNN acceleration algorithms, including Winograd, Fast Fourier Transform (FFT) and Number Theoretic Transform (NTT), have demonstrated their potential for running current Convolutional Neural Networks (CNNs) efficiently. However, deploying the FFT algorithm for CNN acceleration introduces a significant number of invalid elements, unnecessary computations and unacceptable transformation overhead. To address these issues, this paper proposes a series of improved methods along with an FFT-based architecture for efficient and simplified CNN acceleration. First, a novel mixed-radix FFT algorithm is proposed to reduce the number of invalid elements. Second, Hermitian symmetry is exploited to further reduce the scale of the FFT transformation and the number of multiplications. Third, an efficient FFT-based CNN accelerator is designed with a resource-efficient transformation component and a multiplication-reduced PE array. The proposed accelerator is implemented on a Xilinx XCVU440 running at 238 MHz, achieving an actual performance of 2109-2797 GOPS and a DSP efficiency of 1.37-1.82 GOPS/DSP. Compared to previous works based on Winograd, FFT and NTT, the proposed accelerator achieves up to $9.42\times$ speedup in actual performance and $1.11\times$ to $6.41\times$ speedup in DSP efficiency.
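The two algorithmic ideas named in the abstract can be illustrated with a small software sketch. This is not the authors' algorithm or hardware design: it is a generic textbook mixed-radix Cooley-Tukey FFT in plain Python, and the function names (`smallest_factor`, `fft`, `conv_fft`) are hypothetical. It assumes that "invalid elements" refers to zero padding, which the abstract does not spell out. Under that assumption, a transform length that is not restricted to powers of two (here 6 = 2 x 3 instead of 8) needs less padding, and the Hermitian symmetry of real-input spectra means only about half of the frequency-domain products have to be computed.

```python
import cmath


def smallest_factor(n):
    """Return the smallest prime factor of n (n >= 2)."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n


def fft(x, inverse=False):
    """Unnormalized mixed-radix Cooley-Tukey DFT (decimation in time).

    Works for any length; each recursion level uses the smallest prime
    factor of the current length as its radix.
    """
    n = len(x)
    if n == 1:
        return list(x)
    r = smallest_factor(n)          # radix at this level
    m = n // r                      # length of each sub-transform
    sign = 1 if inverse else -1
    # r interleaved sub-sequences, each transformed recursively
    subs = [fft(x[j::r], inverse) for j in range(r)]
    out = [0j] * n
    for k in range(n):
        acc = 0j
        for j in range(r):
            twiddle = cmath.exp(sign * 2j * cmath.pi * j * k / n)
            acc += twiddle * subs[j][k % m]
        out[k] = acc
    return out


def conv_fft(signal, kernel, n):
    """Linear convolution of two real sequences via an n-point FFT.

    n must be at least len(signal) + len(kernel) - 1.
    """
    out_len = len(signal) + len(kernel) - 1
    if n < out_len:
        raise ValueError("n is too small for a linear convolution")
    a = list(signal) + [0.0] * (n - len(signal))   # zero padding
    b = list(kernel) + [0.0] * (n - len(kernel))
    fa, fb = fft(a), fft(b)
    # Real inputs give Hermitian-symmetric spectra, X[n-k] == conj(X[k]),
    # so only bins 0..n//2 are multiplied; the rest are mirrored.
    half = n // 2 + 1
    prod = [fa[k] * fb[k] for k in range(half)]
    prod += [prod[n - k].conjugate() for k in range(half, n)]
    y = fft(prod, inverse=True)
    return [v.real / n for v in y[:out_len]]
```

For example, `conv_fft([1, 2, 3, 4], [1, 0, -1], 6)` convolves a 4-element input with a 3-element kernel using a 6-point transform, whereas a radix-2-only FFT would have to pad to 8 points. Only 4 of the 6 frequency-domain products are computed explicitly; the other 2 follow from conjugate symmetry.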
About the journal:
TCAS I publishes regular papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest covered includes:
- Circuits: Analog, Digital and Mixed Signal Circuits and Systems
- Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems
- Software for Analog-and-Logic Circuits and Systems
- Control aspects of Circuits and Systems