People Detection System Using YOLOv3 Algorithm
N. I. Hassan, N. Tahir, F. Zaman, H. Hashim
2020 10th IEEE International Conference on Control System, Computing and Engineering (ICCSCE)
Published: 2020-08-01
DOI: 10.1109/ICCSCE50387.2020.9204925
https://doi.org/10.1109/ICCSCE50387.2020.9204925
Citations: 20
Abstract
In crowd security systems, precise real-time detection of people in images or videos is challenging, especially in complex, dense crowds where some individuals may be partly or entirely occluded for varying lengths of time. This paper therefore presents a large Convolutional Neural Network (CNN) trained with a single-step model, You Only Look Once version 3 (YOLOv3), on Google Colaboratory to process the images in a database and accurately locate the people in them. YOLOv3 splits each image into regions and predicts bounding boxes and probabilities for each region. The bounding boxes are weighted by the predicted probabilities, and the model makes its detections based on these final weights. The model uses a customised dataset of 500 high-resolution images from Google's Open Images. Once trained, the neural network was able to generate results on the test data, achieving a mean average precision (mAP) of 78.3% and a final average loss of 0.6 while confidently detecting the people within the images.
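
The probability weighting of bounding boxes described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes each candidate box comes with an objectness value and a person-class probability, scores the box by their product, discards low-scoring boxes, and then applies greedy non-maximum suppression, a post-processing step commonly used with YOLO-style detectors that the abstract itself does not mention. The function names, the tuple layout, and the thresholds 0.5 and 0.45 are illustrative assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detect_people(
    candidates: List[Tuple[Box, float, float]],
    score_thresh: float = 0.5,   # assumed value, not from the paper
    iou_thresh: float = 0.45,    # assumed value, not from the paper
) -> List[Tuple[Box, float]]:
    """Weight each box by objectness * person probability, then suppress overlaps.

    Each candidate is (box, objectness, person_probability).
    """
    # Weight every box by its predicted probabilities.
    scored = [(box, obj * prob) for box, obj, prob in candidates]
    # Drop boxes whose weighted score is too low.
    scored = [(box, s) for box, s in scored if s >= score_thresh]
    # Greedy non-maximum suppression: keep the best box, drop near-duplicates.
    scored.sort(key=lambda item: item[1], reverse=True)
    kept: List[Tuple[Box, float]] = []
    for box, score in scored:
        if all(iou(box, kept_box) < iou_thresh for kept_box, _ in kept):
            kept.append((box, score))
    return kept


if __name__ == "__main__":
    # Hypothetical candidates: two overlapping boxes on one person,
    # one separate person, and one low-confidence box.
    candidates = [
        ((10.0, 10.0, 50.0, 90.0), 0.9, 0.95),
        ((12.0, 12.0, 52.0, 92.0), 0.8, 0.90),
        ((100.0, 20.0, 140.0, 100.0), 0.7, 0.90),
        ((200.0, 200.0, 220.0, 240.0), 0.3, 0.50),
    ]
    for box, score in detect_people(candidates):
        print(box, round(score, 3))
```

With the hypothetical candidates above, the two overlapping boxes collapse into the higher-scoring one, the separate person is kept, and the low-confidence box is discarded, leaving two detections.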