Detection and tracking of belugas, kayaks and motorized boats in drone video using deep learning

Madison L. Harasyn, W. Chan, Emma L. Ausen, D. Barber

Journal of Unmanned Vehicle Systems, published 2022-01-04. DOI: 10.1139/juvs-2021-0024
Aerial imagery surveys are commonly used in marine mammal research to determine population size, distribution and habitat use. Analysis of aerial photos involves hours of manually identifying individuals present in each image and converting raw counts into usable biological statistics. Our research proposes the use of deep learning algorithms to increase the efficiency of the marine mammal research workflow. To test the feasibility of this proposal, the existing YOLOv4 convolutional neural network model was trained to detect belugas, kayaks and motorized boats in oblique drone imagery collected from a stationary tethered system. Automated computer-based object detection achieved the following precision and recall, respectively, for each class: beluga = 74%/72%; boat = 97%/99%; and kayak = 96%/96%. We then tested the performance of computer vision tracking of belugas and manned watercraft in drone videos using the DeepSORT tracking algorithm, which achieved a multiple object tracking accuracy (MOTA) ranging from 37%–88% and a multiple object tracking precision (MOTP) between 63%–86%. Results from this research indicate that deep learning technology can detect and track features more consistently than human annotators, allowing for larger datasets to be processed within a fraction of the time while avoiding discrepancies introduced by labeling fatigue or multiple human annotators.
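For readers unfamiliar with the reported metrics, the conventional definitions from the detection and CLEAR MOT tracking literature (the abstract does not restate them) are:

```latex
% Standard detection and CLEAR MOT metric definitions
\[
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN}
\]
\[
\text{MOTA} = 1 - \frac{\sum_t \left( FN_t + FP_t + IDSW_t \right)}{\sum_t GT_t},
\qquad
\text{MOTP} = \frac{\sum_{t,i} d_{t,i}}{\sum_t c_t}
\]
```

Here TP, FP and FN are true positives, false positives and false negatives; FN_t, FP_t and IDSW_t are the false negatives, false positives and identity switches in frame t; GT_t is the number of ground-truth objects in frame t; d_{t,i} is the overlap (or distance) score of matched pair i; and c_t is the number of matches in frame t.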
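To make the detection-plus-tracking workflow concrete, the sketch below shows one way to chain a trained YOLOv4 detector into DeepSORT on drone video. This is not the authors' code: the OpenCV DNN inference path, the third-party deep_sort_realtime package, the file names, thresholds and class order are all assumptions for illustration.

```python
# Minimal sketch of a YOLOv4 + DeepSORT video pipeline, assuming OpenCV's
# DNN module for inference and the deep_sort_realtime package for tracking.
# Weight/config/video file names and the class order are placeholders.
import cv2
from deep_sort_realtime.deepsort_tracker import DeepSort

CLASSES = ["beluga", "boat", "kayak"]  # assumed training class order

# Load a trained YOLOv4 network (Darknet format) into OpenCV.
net = cv2.dnn.readNetFromDarknet("yolov4-beluga.cfg", "yolov4-beluga.weights")
model = cv2.dnn.DetectionModel(net)
model.setInputParams(size=(608, 608), scale=1 / 255.0, swapRB=True)

tracker = DeepSort(max_age=30)  # drop tracks unseen for 30 frames

cap = cv2.VideoCapture("drone_survey.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Detect: returns class ids, confidences and [x, y, w, h] boxes.
    class_ids, scores, boxes = model.detect(
        frame, confThreshold=0.5, nmsThreshold=0.4
    )

    # DeepSORT expects ([left, top, w, h], confidence, class_name) tuples.
    detections = [
        (list(box), float(score), CLASSES[int(cid)])
        for cid, score, box in zip(class_ids, scores, boxes)
    ]
    tracks = tracker.update_tracks(detections, frame=frame)

    # Draw confirmed tracks with a persistent identity per animal/vessel.
    for track in tracks:
        if not track.is_confirmed():
            continue
        l, t, r, b = map(int, track.to_ltrb())
        cv2.rectangle(frame, (l, t), (r, b), (0, 255, 0), 2)
        cv2.putText(frame, f"{track.get_det_class()} #{track.track_id}",
                    (l, t - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```

Running the detector on every frame and delegating frame-to-frame association to DeepSORT is what allows identities to persist through the brief disappearances typical of surfacing and diving belugas, up to the chosen max_age.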