轻量级YOLOv2:基于FPGA的并行支持向量回归的二值化CNN

Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2018-02-15 DOI:10.1145/3174243.3174266

Hiroki Nakahara, H. Yonekawa, Tomoya Fujii, Shimpei Sato

{"title":"轻量级YOLOv2:基于FPGA的并行支持向量回归的二值化CNN","authors":"Hiroki Nakahara, H. Yonekawa, Tomoya Fujii, Shimpei Sato","doi":"10.1145/3174243.3174266","DOIUrl":null,"url":null,"abstract":"A frame object detection problem consists of two problems: one is a regression problem to spatially separated bounding boxes, the second is the associated classification of the objects within realtime frame rate. It is widely used in the embedded systems, such as robotics, autonomous driving, security, and drones - all of which require high-performance and low-power consumption. This paper implements the YOLO (You only look once) object detector on an FPGA, which is faster and has a higher accuracy. It is based on the convolutional deep neural network (CNN), and it is a dominant part both the performance and the area. However, the object detector based on the CNN consists of a bounding box prediction (regression) and a class estimation (classification). Thus, the conventional all binarized CNN fails to recognize in most cases. In the paper, we propose a lightweight YOLOv2, which consists of the binarized CNN for a feature extraction and the parallel support vector regression (SVR) for both a classification and a localization. To our knowledge, this is the first time binarized CNN»s have been successfully used in object detection. We implement a pipelined based architecture for the lightweight YOLOv2 on the Xilinx Inc. zcu102 board, which has the Xilinx Inc. Zynq Ultrascale+ MPSoC. The implemented object detector archived 40.81 frames per second (FPS). Compared with the ARM Cortex-A57, it was 177.4 times faster, it dissipated 1.1 times more power, and its performance per power efficiency was 158.9 times better. Also, compared with the nVidia Pascall embedded GPU, it was 27.5 times faster, it dissipated 1.5 times lower power, and its performance per power efficiency was 42.9 times better. Thus, our method is suitable for the frame object detector for an embedded vision system.","PeriodicalId":164936,"journal":{"name":"Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"114","resultStr":"{\"title\":\"A Lightweight YOLOv2: A Binarized CNN with A Parallel Support Vector Regression for an FPGA\",\"authors\":\"Hiroki Nakahara, H. Yonekawa, Tomoya Fujii, Shimpei Sato\",\"doi\":\"10.1145/3174243.3174266\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A frame object detection problem consists of two problems: one is a regression problem to spatially separated bounding boxes, the second is the associated classification of the objects within realtime frame rate. It is widely used in the embedded systems, such as robotics, autonomous driving, security, and drones - all of which require high-performance and low-power consumption. This paper implements the YOLO (You only look once) object detector on an FPGA, which is faster and has a higher accuracy. It is based on the convolutional deep neural network (CNN), and it is a dominant part both the performance and the area. However, the object detector based on the CNN consists of a bounding box prediction (regression) and a class estimation (classification). Thus, the conventional all binarized CNN fails to recognize in most cases. In the paper, we propose a lightweight YOLOv2, which consists of the binarized CNN for a feature extraction and the parallel support vector regression (SVR) for both a classification and a localization. To our knowledge, this is the first time binarized CNN»s have been successfully used in object detection. We implement a pipelined based architecture for the lightweight YOLOv2 on the Xilinx Inc. zcu102 board, which has the Xilinx Inc. Zynq Ultrascale+ MPSoC. The implemented object detector archived 40.81 frames per second (FPS). Compared with the ARM Cortex-A57, it was 177.4 times faster, it dissipated 1.1 times more power, and its performance per power efficiency was 158.9 times better. Also, compared with the nVidia Pascall embedded GPU, it was 27.5 times faster, it dissipated 1.5 times lower power, and its performance per power efficiency was 42.9 times better. Thus, our method is suitable for the frame object detector for an embedded vision system.\",\"PeriodicalId\":164936,\"journal\":{\"name\":\"Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-02-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"114\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3174243.3174266\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3174243.3174266","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 114

摘要

帧目标检测问题包括两个问题:一是对空间分离边界框的回归问题，二是在实时帧率下对目标进行相关分类。它广泛应用于嵌入式系统，如机器人，自动驾驶，安全和无人机-所有这些都需要高性能和低功耗。本文在FPGA上实现了YOLO (You only look once)目标检测器，该检测器速度更快，精度更高。它基于卷积深度神经网络(CNN)，在性能和面积上都占主导地位。然而，基于CNN的目标检测器由边界框预测(回归)和类估计(分类)组成。因此，传统的全二值化CNN在大多数情况下无法识别。在本文中，我们提出了一个轻量级的YOLOv2，它由用于特征提取的二值化CNN和用于分类和定位的并行支持向量回归(SVR)组成。据我们所知，这是首次成功地将二值化的CNN图像用于目标检测。我们在Xilinx Inc. zcu102板上为轻量级YOLOv2实现了基于流水线的架构，该板具有Xilinx Inc.。Zynq Ultrascale+ MPSoC。实现的目标检测器存档40.81帧每秒(FPS)。与ARM Cortex-A57相比，它的速度提高了177.4倍，功耗提高了1.1倍，单位功率效率提高了158.9倍。此外，与nVidia Pascall嵌入式GPU相比，它的速度快27.5倍，功耗低1.5倍，单位功率效率的性能提高了42.9倍。因此，我们的方法适用于嵌入式视觉系统的帧目标检测。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A Lightweight YOLOv2: A Binarized CNN with A Parallel Support Vector Regression for an FPGA

A frame object detection problem consists of two problems: one is a regression problem to spatially separated bounding boxes, the second is the associated classification of the objects within realtime frame rate. It is widely used in the embedded systems, such as robotics, autonomous driving, security, and drones - all of which require high-performance and low-power consumption. This paper implements the YOLO (You only look once) object detector on an FPGA, which is faster and has a higher accuracy. It is based on the convolutional deep neural network (CNN), and it is a dominant part both the performance and the area. However, the object detector based on the CNN consists of a bounding box prediction (regression) and a class estimation (classification). Thus, the conventional all binarized CNN fails to recognize in most cases. In the paper, we propose a lightweight YOLOv2, which consists of the binarized CNN for a feature extraction and the parallel support vector regression (SVR) for both a classification and a localization. To our knowledge, this is the first time binarized CNN»s have been successfully used in object detection. We implement a pipelined based architecture for the lightweight YOLOv2 on the Xilinx Inc. zcu102 board, which has the Xilinx Inc. Zynq Ultrascale+ MPSoC. The implemented object detector archived 40.81 frames per second (FPS). Compared with the ARM Cortex-A57, it was 177.4 times faster, it dissipated 1.1 times more power, and its performance per power efficiency was 158.9 times better. Also, compared with the nVidia Pascall embedded GPU, it was 27.5 times faster, it dissipated 1.5 times lower power, and its performance per power efficiency was 42.9 times better. Thus, our method is suitable for the frame object detector for an embedded vision system.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

自引率

0.00%

发文量

期刊最新文献

Architecture and Circuit Design of an All-Spintronic FPGA Session details: Session 6: High Level Synthesis 2 A FPGA Friendly Approximate Computing Framework with Hybrid Neural Networks: (Abstract Only) Software/Hardware Co-design for Multichannel Scheduling in IEEE 802.11p MLME: (Abstract Only) Session details: Special Session: Deep Learning