8-bit Convolutional Neural Network Accelerator for Face Recognition
Wei Pang, Yufeng Li, Shengli Lu
2020 11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), published 2020-10-28
DOI: 10.1109/UEMCON51285.2020.9298114 (https://doi.org/10.1109/UEMCON51285.2020.9298114)
Citations: 0
Abstract
Convolutional neural networks (CNNs) have greatly improved the accuracy of face recognition, but their large number of weights and computations hinders deployment on portable devices. A dedicated hardware accelerator is an effective solution to this problem. In this paper, a face recognition algorithm is designed based on depthwise separable convolution. The weights and activations are quantized to 8 bits, reducing data access and bandwidth requirements. In addition, a generic CNN accelerator based on a systolic array is designed and validated on a Xilinx Zynq-XC7Z035 FPGA. The face recognition algorithm achieves an accuracy of 94.4% on the LFW dataset. At 100 MHz, the accelerator delivers a performance of 52.9 GOPS and a power efficiency of 9.71 GOPS/W, and it can process 160×160 face images at 25 FPS.
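
The two algorithmic ideas in the abstract, 8-bit quantization and depthwise separable convolution, can be illustrated with a minimal Python sketch. The abstract does not describe the paper's actual quantization scheme or layer shapes, so everything here is an assumption for illustration: the symmetric per-tensor scheme in `quantize_symmetric`, the hypothetical 40×40 layer with 64 input and 128 output channels, and all function names. The last helper only divides the two reported figures (52.9 GOPS and 9.71 GOPS/W) to show the power draw they would imply.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one shared scale.

    Assumed scheme (symmetric, per-tensor); the paper may use another.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [x * scale for x in q]


def standard_conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates for a standard k x k convolution (h x w output)."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates for a depthwise k x k plus a pointwise 1 x 1 conv."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


def implied_power_watts(gops, gops_per_watt):
    """Power implied by dividing throughput by power efficiency."""
    return gops / gops_per_watt


# Hypothetical weights and layer shape, chosen only for illustration.
weights = [-1.0, -0.3, 0.0, 0.5, 1.0]
codes, scale = quantize_symmetric(weights)
print("int8 codes:", codes)
print("reconstructed:", dequantize(codes, scale))

std = standard_conv_macs(40, 40, 64, 128, 3)
sep = depthwise_separable_macs(40, 40, 64, 128, 3)
print("standard MACs:", std, "separable MACs:", sep, "ratio:", std / sep)

print("implied power (W):", implied_power_watts(52.9, 9.71))
```

For the hypothetical layer, the separable form needs roughly one eighth of the multiply-accumulates of a standard convolution, which is the kind of saving that makes such networks attractive for small FPGAs. Dividing the reported throughput by the reported efficiency suggests a power draw of about 5.4 W; this is an inference from the abstract's two figures, not a number the authors state.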