SOD-YOLOv10: Small Object Detection in Remote Sensing Images Based on YOLOv10

IEEE Geoscience and Remote Sensing Letters (a publication of the IEEE Geoscience and Remote Sensing Society). Pub Date: 2025-01-27. DOI: 10.1109/LGRS.2025.3534786

Hui Sun;Guangzhen Yao;Sandong Zhu;Long Zhang;Hui Xu;Jun Kong

Volume 22, pages 1-5. Journal article. URL: https://ieeexplore.ieee.org/document/10855585/

Citation count: 0

Abstract

YOLOv10 is an efficient object detector that detects objects in images quickly and accurately. However, when detecting small objects in remote sensing imagery, traditional algorithms often encounter challenges such as background noise, missing information, and complex multiobject interactions, which can degrade detection performance. To address these issues, we propose an enhanced algorithm for detecting small objects, named SOD-YOLOv10. We design the Multidimensional Information Interaction for the Transformer Backbone (TransBone) Network, which enhances global perception and integrates local and global information, thereby improving the detection of small object features. We also propose an attention-based feature fusion method called aggregated attention in a gated feature pyramid network (AA-GFPN). It uses an efficient feature aggregation network and re-parameterization techniques to optimize information interaction between feature maps of different scales, and its aggregated attention (AA) mechanism identifies essential features of small objects. Moreover, we propose the adaptive focal powerful IoU (AFP-IoU) loss function, which prevents excessive expansion of the anchor box area and accelerates model convergence. To evaluate our method, we test it on the RSOD, NWPU VHR-10, VisDrone2019, and AI-TOD datasets. On these four datasets, respectively, our SOD-YOLOv10 model attains 95.90%, 92.46%, 55.61%, and 59.47% for mAP@0.5 and 73.42%, 66.84%, 39.03%, and 42.67% for mAP@0.5:0.95.
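The abstract reports mAP@0.5 and mAP@0.5:0.95, both of which depend on the intersection over union (IoU) between a predicted box and a ground-truth box. The sketch below is illustrative only: it computes plain IoU, not the AFP-IoU loss proposed in the paper, whose formula the abstract does not give. It assumes boxes in (x1, y1, x2, y2) corner format and the common COCO-style convention of ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The names `iou`, `MAP_50_95_THRESHOLDS`, and `matches_at_thresholds` are hypothetical helpers, not taken from the paper.

```python
def iou(box_a, box_b):
    """Plain intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width and height, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds behind mAP@0.5:0.95: 0.50, 0.55, ..., 0.95.
MAP_50_95_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matches_at_thresholds(pred_box, gt_box):
    """For each IoU threshold, report whether the prediction counts as a match."""
    value = iou(pred_box, gt_box)
    return {t: value >= t for t in MAP_50_95_THRESHOLDS}


if __name__ == "__main__":
    # A 10x10 pixel object with a prediction shifted by 2 pixels in x and y.
    gt = (0, 0, 10, 10)
    pred = (2, 2, 12, 12)
    print(f"IoU = {iou(pred, gt):.4f}")
    print(matches_at_thresholds(pred, gt))
```

For the 10x10 box in the example, a 2-pixel shift leaves an 8x8 overlap, so the IoU is 64/136 and the prediction fails even the 0.5 threshold. This is one way to see why small objects are hard to score well on under IoU-based metrics.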
