Mobile-X: Dedicated FPGA Implementation of the MobileNet Accelerator Optimizing Depthwise Separable Convolution
Authors: Hyeonseok Hong, Dahun Choi, Namjoon Kim, Hyun Kim
Journal: IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 71, no. 11, pp. 4668-4672
Published: 2024-08-07
DOI: 10.1109/TCSII.2024.3440884
URL: https://ieeexplore.ieee.org/document/10630707/
Citations: 0
Abstract
MobileNet proposed depthwise separable convolution (DSC) as a replacement for standard convolution (SC), significantly reducing parameters and computational complexity compared with traditional convolutional neural network (CNN) models. MobileNet is increasingly deployed on edge devices through dedicated hardware accelerators. However, the distinct computational patterns of depthwise convolution (DWC) and pointwise convolution (PWC) in MobileNet pose challenges for FPGA and ASIC accelerator implementations. In this brief, we propose DSC-dedicated processing engine (PE) designs specialized for DWC and PWC operations, together with an SC reordering module used only for the first convolution layer. In addition, we introduce pipelining separable convolution (PSC), a pipelined DSC processing scheme, and tiled convolution (TC), techniques that take the computational load of PWC into account. Our proposed 8-bit quantization causes only a negligible accuracy drop (0.68%) compared with full precision, yet it enables hardware-friendly operation that requires only a single fixed-point multiplication. On the ZCU-102 platform, the proposed accelerator achieves 190.9 FPS and 108.3 GOPS using minimal hardware resources. Consequently, we achieve 18.20 GOPS/W, which is $3.7\times$ the power efficiency of the A100 GPU.
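The cost gap between SC and DSC, and the different data-access patterns of DWC and PWC, can be illustrated with a small reference model. The Python sketch below is a hypothetical illustration, not the Mobile-X hardware or the authors' code: the function names `depthwise_conv`, `pointwise_conv`, and `conv_costs`, the nested-list [channel][row][col] layout, stride 1, and 'valid' padding are all assumptions chosen for brevity.

```python
"""Hypothetical reference model of depthwise separable convolution (DSC).

Tensors are nested Python lists in [channel][row][col] order; stride 1 and
'valid' padding are assumed for simplicity.
"""


def depthwise_conv(x, filt):
    """DWC: x is [C][H][W], filt is [C][K][K].

    Each channel is filtered by its own K x K kernel, so there is no
    accumulation across channels.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    k = len(filt[0])
    out_h, out_w = height - k + 1, width - k + 1
    return [[[sum(x[c][i + u][j + v] * filt[c][u][v]
                  for u in range(k) for v in range(k))
              for j in range(out_w)]
             for i in range(out_h)]
            for c in range(channels)]


def pointwise_conv(x, weight):
    """PWC: x is [C_in][H][W], weight is [C_out][C_in] (a 1 x 1 convolution).

    Each output pixel is a dot product across input channels, with no
    spatial window.
    """
    c_in, height, width = len(x), len(x[0]), len(x[0][0])
    return [[[sum(weight[o][c] * x[c][i][j] for c in range(c_in))
              for j in range(width)]
             for i in range(height)]
            for o in range(len(weight))]


def conv_costs(h, w, c_in, c_out, k):
    """Weight counts and multiply-accumulates (MACs) for a layer with an h x w output."""
    sc_params = k * k * c_in * c_out
    dwc_params = k * k * c_in   # one K x K filter per input channel
    pwc_params = c_in * c_out   # 1 x 1 filters mixing channels
    return {
        "sc_params": sc_params,
        "dsc_params": dwc_params + pwc_params,
        "sc_macs": h * w * sc_params,
        "dwc_macs": h * w * dwc_params,
        "pwc_macs": h * w * pwc_params,
    }


if __name__ == "__main__":
    costs = conv_costs(h=14, w=14, c_in=512, c_out=512, k=3)
    dsc_macs = costs["dwc_macs"] + costs["pwc_macs"]
    print("DSC / SC MACs:", round(dsc_macs / costs["sc_macs"], 4))
    print("PWC share of DSC MACs:", round(costs["pwc_macs"] / dsc_macs, 4))
```

For an illustrative layer with a 14 × 14 output, 512 input and output channels, and 3 × 3 kernels, DSC needs about 11.3% of the SC multiply-accumulates (the ratio is 1/C_out + 1/K^2), and PWC accounts for roughly 98% of the DSC work. That imbalance is consistent with the brief's emphasis on the computational load of PWC.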
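A common way to requantize with a single fixed-point multiplication in integer-only inference is to fold the input, weight, and output scales into one real multiplier M = S_in * S_w / S_out and approximate it as an integer m0 times 2^-shift. The sketch below shows that general technique under stated assumptions; it is not necessarily the quantization scheme used in Mobile-X. The helper names `quantize_multiplier` and `requantize`, the 31-bit multiplier width, round-half-up rounding, the scale range (0, 1), and the int8 range [-128, 127] are hypothetical choices.

```python
import math


def quantize_multiplier(real_multiplier, bits=31):
    """Approximate a real scale in (0, 1) as m0 * 2**(-shift).

    m0 is an integer with `bits` bits of precision, so rescaling can be
    done with one integer multiply followed by a right shift.
    """
    if not 0.0 < real_multiplier < 1.0:
        raise ValueError("expected a scale in (0, 1)")
    # real_multiplier = mantissa * 2**exponent, with mantissa in [0.5, 1)
    mantissa, exponent = math.frexp(real_multiplier)
    m0 = round(mantissa * (1 << bits))
    if m0 == 1 << bits:  # rounding carried into the next power of two
        m0 //= 2
        exponent += 1
    shift = bits - exponent
    return m0, shift


def requantize(acc, m0, shift, zero_point=0, qmin=-128, qmax=127):
    """Map a wide integer accumulator to int8 with one multiply, a rounding shift, and a clamp."""
    prod = acc * m0                                 # the single fixed-point multiplication
    rounded = (prod + (1 << (shift - 1))) >> shift  # round half up; >> floors in Python
    return max(qmin, min(qmax, rounded + zero_point))


if __name__ == "__main__":
    # Hypothetical combined scale S_in * S_w / S_out.
    m0, shift = quantize_multiplier(0.0123)
    print(m0, shift, requantize(1000, m0, shift))
```

Python integers do not overflow, so `acc * m0` is exact here; a hardware datapath would typically need a 64-bit product for an int32 accumulator and a 31-bit multiplier.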
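The reported figures can be cross-checked with simple arithmetic. The derivation below only rearranges the numbers stated in the abstract, assuming the throughput and efficiency figures were measured under the same operating conditions; the derived power, per-frame operation count, and A100 efficiency are estimates, not values reported by the authors.

```latex
P \approx \frac{108.3\ \text{GOPS}}{18.20\ \text{GOPS/W}} \approx 5.95\ \text{W}, \qquad
\frac{108.3\ \text{GOPS}}{190.9\ \text{FPS}} \approx 0.567\ \text{GOP/frame}, \qquad
\eta_{\text{A100}} \approx \frac{18.20\ \text{GOPS/W}}{3.7} \approx 4.92\ \text{GOPS/W}
```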
Journal description:
TCAS II publishes brief papers on the theory, analysis, design, and practical implementation of circuits, and on the application of circuit techniques to systems and signal processing, spanning the whole spectrum from basic scientific theory to industrial applications. The fields of interest include:
Circuits: Analog, Digital and Mixed Signal Circuits and Systems
Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems
Software for Analog-and-Logic Circuits and Systems
Control aspects of Circuits and Systems.