Title: SDMNet: Spatially dilated multi-scale network for object detection for drone aerial imagery
Authors: Neeraj Battish, Dapinder Kaur, Moksh Chugh, Shashi Poddar
Journal: Image and Vision Computing, vol. 150, Article 105232 (JCR Q2, Computer Science, Artificial Intelligence)
DOI: 10.1016/j.imavis.2024.105232
Published: 2024-08-13
URL: https://www.sciencedirect.com/science/article/pii/S0262885624003378
Cited by: 0
Abstract
Multi-scale object detection is a preeminent challenge in computer vision and image processing. Many deep learning models designed for general object detection fail to detect small objects reliably, which reduces their overall accuracy. To cover objects across scales, from extremely small to large, this work proposes a Spatially Dilated Multi-Scale Network (SDMNet) architecture for UAV-based ground object detection. The architecture introduces a Multi-scale Enhanced Effective Channel Attention mechanism to preserve object details in the images. Additionally, the proposed model incorporates dilated convolution, sub-pixel convolution, and additional prediction heads to enhance object detection performance specifically for aerial imaging. It has been evaluated on two popular aerial image datasets, VisDrone 2019 and UAVDT, which contain publicly available annotated images of ground objects captured from UAVs. Performance metrics such as precision, recall, mAP, and detection rate benchmark the proposed architecture against existing object detection approaches. The experimental results demonstrate the effectiveness of the proposed model for multi-scale object detection, with average precision scores of 54.2% and 98.4% on the VisDrone and UAVDT datasets, respectively.
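Two of the building blocks the abstract names, dilated convolution and sub-pixel convolution, can be illustrated in isolation. The sketch below is a minimal, framework-free 1-D analogue of both operations written for clarity; it is not the authors' implementation, and the function names and shapes are illustrative assumptions only. A dilated kernel enlarges the receptive field without extra parameters (useful for large objects), while sub-pixel (pixel-shuffle) upsampling trades channel depth for spatial resolution (useful for recovering small-object detail).

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D convolution with a dilation factor.

    A dilation of d inserts d-1 gaps between kernel taps, so a k-tap
    kernel covers (k-1)*d + 1 input positions -- a wider receptive
    field with the same parameter count. SDMNet applies this idea in 2-D.
    """
    span = (len(kernel) - 1) * dilation + 1  # effective kernel footprint
    out = []
    for i in range(len(signal) - span + 1):
        out.append(sum(kernel[j] * signal[i + j * dilation]
                       for j in range(len(kernel))))
    return out


def pixel_shuffle_1d(channels, upscale):
    """Sub-pixel (pixel-shuffle) upsampling, 1-D analogue.

    Interleaves `upscale` low-resolution channels into a single sequence
    that is `upscale` times longer: output position i is drawn from
    channel i mod upscale at index i // upscale.
    """
    length = len(channels[0])
    return [channels[i % upscale][i // upscale]
            for i in range(length * upscale)]


# A 2-tap kernel with dilation 2 sums inputs two positions apart:
print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2))  # [4, 6, 8]
# Two 2-sample channels interleave into one 4-sample sequence:
print(pixel_shuffle_1d([[1, 3], [2, 4]], upscale=2))  # [1, 2, 3, 4]
```

In the 2-D case used by detection networks, pixel shuffle rearranges an (r²·C, H, W) tensor into (C, r·H, r·W); the 1-D version above keeps the same index arithmetic in one spatial dimension.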
About the journal:
Image and Vision Computing has as its primary aim the provision of an effective medium of interchange for the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to deepen understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, and image databases.