{"title":"Automatic Optimising CNN with Depthwise Separable Convolution on FPGA: (Abstact Only)","authors":"Ruizhe Zhao, Xinyu Niu, W. Luk","doi":"10.1145/3174243.3174959","DOIUrl":null,"url":null,"abstract":"Convolution layers in Convolutional Neural Networks (CNNs) are effective in vision feature extraction but quite inefficient in computational resource usage. Depthwise separable convolution layer has been proposed in recent publications to enhance the efficiency without reducing the effectiveness by separately computing the spatial and cross-channel correlations from input images and has proven successful in state-of-the-art networks such as MobileNets [1] and Xception [2]. Based on the facts that depthwise separable convolution is highly structured and uses limited resources, we argue that it can well fit reconfigurable platforms like FPGA. To benefit FPGA platforms with this new layer, in this paper, we present a novel framework that can automatically generate and optimise hardware designs for depthwise separable CNNs. Besides, in our framework, existing conventional CNNs can be systematically converted to ones whose standard convolution layers are selectively replaced with functionally identical depthwise separable convolution layers, by carefully balancing the trade-off among speed, accuracy, and resource usage through resource usage modelling and network fine-tuning. Results show that hardware designs generated by our framework can reach at most 231.7 frames per second regarding MobileNets, and for VGG-16 [3], we gain 3.43 times speed-up and 3.54% accuracy decrease on the ImageNet [4] dataset comparing the original model and a layer replaced one.","PeriodicalId":164936,"journal":{"name":"Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3174243.3174959","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 20
Abstract
Convolution layers in Convolutional Neural Networks (CNNs) are effective for vision feature extraction but quite inefficient in their use of computational resources. The depthwise separable convolution layer has been proposed in recent publications to enhance efficiency without reducing effectiveness, by computing the spatial and cross-channel correlations of input images separately, and it has proven successful in state-of-the-art networks such as MobileNets [1] and Xception [2]. Because depthwise separable convolution is highly structured and uses limited resources, we argue that it is well suited to reconfigurable platforms such as FPGAs. To bring this new layer to FPGA platforms, in this paper we present a novel framework that can automatically generate and optimise hardware designs for depthwise separable CNNs. Moreover, within our framework, existing conventional CNNs can be systematically converted into networks whose standard convolution layers are selectively replaced with functionally identical depthwise separable convolution layers, carefully balancing the trade-off among speed, accuracy, and resource usage through resource usage modelling and network fine-tuning. Results show that hardware designs generated by our framework reach up to 231.7 frames per second for MobileNets, and for VGG-16 [3] we gain a 3.43x speed-up with a 3.54% accuracy decrease on the ImageNet [4] dataset when comparing the original model with a layer-replaced one.
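As a minimal sketch of the layer replacement the abstract describes (this is an illustrative PyTorch snippet, not the authors' hardware-generation framework; the class and variable names are my own), a standard KxK convolution can be swapped for a depthwise convolution that captures spatial correlations followed by a 1x1 pointwise convolution that captures cross-channel correlations:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution in the style of MobileNets [1]."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        # Depthwise: one KxK filter per input channel (groups=in_ch),
        # modelling only spatial correlations.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride=stride,
                                   padding=kernel_size // 2, groups=in_ch,
                                   bias=False)
        # Pointwise: 1x1 convolution mixing channels,
        # modelling only cross-channel correlations.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Usage: drop-in replacement for a standard nn.Conv2d(64, 128, 3, padding=1).
x = torch.randn(1, 64, 56, 56)
y = DepthwiseSeparableConv(64, 128)(x)
print(y.shape)  # torch.Size([1, 128, 56, 56])
```

The efficiency gain follows from the MobileNets analysis [1]: for KxK kernels, M input channels, N output channels, and a D_F x D_F feature map, a standard convolution costs about K^2 * M * N * D_F^2 multiply-accumulates, while the separable pair costs K^2 * M * D_F^2 + M * N * D_F^2, a reduction by a factor of 1/N + 1/K^2 (roughly 8-9x for K = 3). This structured, resource-light computation pattern is what makes the layer attractive for FPGA implementation.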