Efficient GEMM Implementation for Vision-Based Object Detection in Autonomous Driving Applications

IF 1.6 Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Journal of Low Power Electronics and Applications Pub Date : 2023-06-06 DOI:10.3390/jlpea13020040
Fatima Zahra Guerrouj, Sergio Rodríguez Flórez, Mohamed Abouzahir, Abdelhafid El Ouardi, Mustapha Ramzi
{"title":"Efficient GEMM Implementation for Vision-Based Object Detection in Autonomous Driving Applications","authors":"Fatima Zahra Guerrouj, Sergio Rodríguez Flórez, Mohamed Abouzahir, Abdelhafid El Ouardi, Mustapha Ramzi","doi":"10.3390/jlpea13020040","DOIUrl":null,"url":null,"abstract":"Convolutional Neural Networks (CNNs) have been incredibly effective for object detection tasks. YOLOv4 is a state-of-the-art object detection algorithm designed for embedded systems. It is based on YOLOv3 and has improved accuracy, speed, and robustness. However, deploying CNNs on embedded systems such as Field Programmable Gate Arrays (FPGAs) is difficult due to their limited resources. To address this issue, FPGA-based CNN architectures have been developed to improve the resource utilization of CNNs, resulting in improved accuracy and speed. This paper examines the use of General Matrix Multiplication Operations (GEMM) to accelerate the execution of YOLOv4 on embedded systems. It reviews the most recent GEMM implementations and evaluates their accuracy and robustness. It also discusses the challenges of deploying YOLOv4 on autonomous vehicle datasets. Finally, the paper presents a case study demonstrating the successful implementation of YOLOv4 on an Intel Arria 10 embedded system using GEMM.","PeriodicalId":38100,"journal":{"name":"Journal of Low Power Electronics and Applications","volume":"5 1","pages":"0"},"PeriodicalIF":1.6000,"publicationDate":"2023-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Low Power Electronics and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/jlpea13020040","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

Abstract

Convolutional Neural Networks (CNNs) have been incredibly effective for object detection tasks. YOLOv4 is a state-of-the-art object detection algorithm designed for embedded systems. It is based on YOLOv3 and has improved accuracy, speed, and robustness. However, deploying CNNs on embedded systems such as Field Programmable Gate Arrays (FPGAs) is difficult due to their limited resources. To address this issue, FPGA-based CNN architectures have been developed to improve the resource utilization of CNNs, resulting in improved accuracy and speed. This paper examines the use of General Matrix Multiplication Operations (GEMM) to accelerate the execution of YOLOv4 on embedded systems. It reviews the most recent GEMM implementations and evaluates their accuracy and robustness. It also discusses the challenges of deploying YOLOv4 on autonomous vehicle datasets. Finally, the paper presents a case study demonstrating the successful implementation of YOLOv4 on an Intel Arria 10 embedded system using GEMM.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
自动驾驶应用中基于视觉的目标检测的高效GEMM实现
卷积神经网络(cnn)在目标检测任务中非常有效。YOLOv4是为嵌入式系统设计的最先进的目标检测算法。它基于YOLOv3,提高了准确性、速度和鲁棒性。然而,由于资源有限,在诸如现场可编程门阵列(fpga)等嵌入式系统上部署cnn是困难的。为了解决这个问题,基于fpga的CNN架构已经被开发出来,以提高CNN的资源利用率,从而提高准确性和速度。本文研究了在嵌入式系统上使用通用矩阵乘法运算(GEMM)来加速YOLOv4的执行。它回顾了最新的GEMM实现,并评估了它们的准确性和健壮性。本文还讨论了在自动驾驶汽车数据集上部署YOLOv4所面临的挑战。最后,本文给出了一个使用GEMM在Intel Arria 10嵌入式系统上成功实现YOLOv4的案例研究。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Journal of Low Power Electronics and Applications
Journal of Low Power Electronics and Applications Engineering-Electrical and Electronic Engineering
CiteScore
3.60
自引率
14.30%
发文量
57
审稿时长
11 weeks
期刊最新文献
Understanding Timing Error Characteristics from Overclocked Systolic Multiply–Accumulate Arrays in FPGAs Design and Assessment of Hybrid MTJ/CMOS Circuits for In-Memory-Computation Speed, Power and Area Optimized Monotonic Asynchronous Array Multipliers An Ultra Low Power Integer-N PLL with a High-Gain Sampling Phase Detector for IOT Applications in 65 nm CMOS Design of a Low-Power Delay-Locked Loop-Based 8× Frequency Multiplier in 22 nm FDSOI
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1