Automating Bird Detection Based on Webcam Captured Images using Deep Learning

EPiC Series in Computing. Pub Date: 2022-01-01. DOI: 10.29007/9fr5

Alex Mirugwe, Juwa Nyirenda, Emmanuel Dufourq


Citations: 1

Abstract

One of the most challenging problems facing ecologists and other biological researchers today is analyzing the massive amounts of data collected by advanced monitoring systems such as camera traps, wireless sensor networks, high-frequency radio trackers, global positioning systems, and satellite tracking systems. Analyzing data at this scale with manual and traditional statistical techniques has become expensive, laborious, and time-consuming. Recent developments in deep learning are showing promising results towards automating the analysis of these extremely large datasets. The primary objective of this study was to test the capabilities of state-of-the-art deep learning architectures to detect birds in webcam-captured images. A total of 10,592 images were collected from Cornell Lab of Ornithology live stream feeds situated in six unique locations in the United States, Ecuador, New Zealand, and Panama. To achieve this objective, we evaluated two convolutional neural network object detection meta-architectures, single-shot detector (SSD) and Faster R-CNN, in combination with MobileNet-V2, ResNet50, ResNet101, ResNet152, and Inception ResNet-V2 feature extractors. Through transfer learning, all models were initialized with weights pre-trained on the MS COCO (Microsoft Common Objects in Context) dataset, as provided by the TensorFlow 2 Object Detection API. The Faster R-CNN model coupled with ResNet152 outperformed all other models with a mean average precision of 92.3%. However, the SSD model with the MobileNet-V2 feature extraction network achieved the lowest inference time (110 ms) and the smallest memory footprint (30.5 MB) of the models evaluated. These results confirm that deep learning-based algorithms are capable of detecting birds of different sizes in different environments, and the best model could potentially help ecologists in monitoring and identifying birds from other species.
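The mean average precision figure quoted above can be made concrete with a small sketch. The abstract does not say which evaluation protocol was used, so the code below is an illustration only: it assumes a single "bird" class, a fixed IoU threshold of 0.5, and all-point interpolation of the precision-recall curve. The function names `iou` and `average_precision` and the input layout are hypothetical and are not taken from the paper or from the TensorFlow 2 Object Detection API.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[str, float, Box]],  # (image_id, score, box)
    ground_truth: Dict[str, List[Box]],        # image_id -> annotated boxes
    iou_threshold: float = 0.5,                # assumed, not from the paper
) -> float:
    """Single-class AP with all-point interpolation."""
    total_gt = sum(len(boxes) for boxes in ground_truth.values())
    if total_gt == 0:
        return 0.0

    # Each annotated box may be claimed by at most one detection.
    matched = {img: [False] * len(b) for img, b in ground_truth.items()}
    tp_flags: List[bool] = []

    # Visit detections from most to least confident.
    for image_id, _score, box in sorted(detections, key=lambda d: -d[1]):
        best_iou, best_idx = 0.0, -1
        for idx, gt_box in enumerate(ground_truth.get(image_id, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        is_tp = (
            best_idx >= 0
            and best_iou >= iou_threshold
            and not matched[image_id][best_idx]
        )
        if is_tp:
            matched[image_id][best_idx] = True
        tp_flags.append(is_tp)

    # Precision and recall after each detection in rank order.
    precisions: List[float] = []
    recalls: List[float] = []
    true_positives = 0
    for rank, flag in enumerate(tp_flags, start=1):
        true_positives += int(flag)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / total_gt)

    # Replace each precision by the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

With one class, mean average precision reduces to this single AP value. COCO-style evaluation additionally averages AP over several IoU thresholds, which this sketch does not do.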

