{"title":"Hardware Implementation of Yolov4-tiny for Object Detection","authors":"Omar Eid, M. M. A. E. Ghany","doi":"10.1109/ICM52667.2021.9664943","DOIUrl":null,"url":null,"abstract":"The high computational power of GPUs allowed for larger networks to be used in object detection applications. However, due to the huge power consumption and inefficiency when it comes to memory access and the number of bits used to represent the data, it is difficult to use them in embedded applications. Therefore, extensive research has been conducted to use FPGAs as a highly efficient substitute for GPUs to implement deep learning algorithms. As the scale and complexity of the algorithms keep increasing each year to improve their performance, it becomes even harder to implement such algorithms on an FPGA without reusing hardware resources. In this work, we implement Yolov4-tiny on a single FPGA by applying several resource sharing and optimization techniques. Our implementation shows a decrease in power consumption that ranges from 66% to 93.5% less power when compared to software. Moreover, less hardware resources and faster inference time is achieved. When comparing with the hardware implementation of networks with similar size, our design is 6.67 times faster and uses 62.5% less energy per image.","PeriodicalId":212613,"journal":{"name":"2021 International Conference on Microelectronics (ICM)","volume":"193 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Microelectronics (ICM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICM52667.2021.9664943","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
The high computational power of GPUs allowed for larger networks to be used in object detection applications. However, due to the huge power consumption and inefficiency when it comes to memory access and the number of bits used to represent the data, it is difficult to use them in embedded applications. Therefore, extensive research has been conducted to use FPGAs as a highly efficient substitute for GPUs to implement deep learning algorithms. As the scale and complexity of the algorithms keep increasing each year to improve their performance, it becomes even harder to implement such algorithms on an FPGA without reusing hardware resources. In this work, we implement Yolov4-tiny on a single FPGA by applying several resource sharing and optimization techniques. Our implementation shows a decrease in power consumption that ranges from 66% to 93.5% less power when compared to software. Moreover, less hardware resources and faster inference time is achieved. When comparing with the hardware implementation of networks with similar size, our design is 6.67 times faster and uses 62.5% less energy per image.