Doublem-net: multi-scale spatial pyramid pooling-fast and multi-path adaptive feature pyramid network for UAV detection

IF 3.1 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE International Journal of Machine Learning and Cybernetics Pub Date : 2024-07-26 DOI:10.1007/s13042-024-02278-1
Zhongxu Li, Qihan He, Hong Zhao, Wenyuan Yang
{"title":"Doublem-net: multi-scale spatial pyramid pooling-fast and multi-path adaptive feature pyramid network for UAV detection","authors":"Zhongxu Li, Qihan He, Hong Zhao, Wenyuan Yang","doi":"10.1007/s13042-024-02278-1","DOIUrl":null,"url":null,"abstract":"<p>Unmanned aerial vehicles (UAVs) are extensively applied in military, rescue operations, and traffic detection fields, resulting from their flexibility, low cost, and autonomous flight capabilities. However, due to the drone’s flight height and shooting angle, the objects in aerial images are smaller, denser, and more complex than those in general images, triggering an unsatisfactory target detection effect. In this paper, we propose a model for UAV detection called DoubleM-Net, which contains multi-scale spatial pyramid pooling-fast (MS-SPPF) and Multi-Path Adaptive Feature Pyramid Network (MPA-FPN). DoubleM-Net utilizes the MS-SPPF module to extract feature maps of multiple receptive field sizes. Then, the MPA-FPN module first fuses features from every two adjacent scales, followed by a level-by-level interactive fusion of features. First, using the backbone network as the feature extractor, multiple feature maps of different scale ranges are extracted from the input image. Second, the MS-SPPF uses different pooled kernels to repeat multiple pooled operations at various scales to achieve rich multi-perceptive field features. Finally, the MPA-FPN module first incorporates semantic information between each adjacent two-scale layer. The top-level features are then passed back to the bottom level-by-level, and the underlying features are enhanced, enabling interaction and integration of features at different scales. The experimental results show that the mAP50-95 ratio of DoubleM-Net on the VisDrone dataset is 27.5%, and that of Doublem-Net on the DroneVehicle dataset in RGB and Infrared mode is 55.0% and 60.4%, respectively. Our model demonstrates excellent performance in air-to-ground image detection tasks, with exceptional results in detecting small objects.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":"16 1","pages":""},"PeriodicalIF":3.1000,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Machine Learning and Cybernetics","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s13042-024-02278-1","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Unmanned aerial vehicles (UAVs) are extensively applied in military, rescue operations, and traffic detection fields, resulting from their flexibility, low cost, and autonomous flight capabilities. However, due to the drone’s flight height and shooting angle, the objects in aerial images are smaller, denser, and more complex than those in general images, triggering an unsatisfactory target detection effect. In this paper, we propose a model for UAV detection called DoubleM-Net, which contains multi-scale spatial pyramid pooling-fast (MS-SPPF) and Multi-Path Adaptive Feature Pyramid Network (MPA-FPN). DoubleM-Net utilizes the MS-SPPF module to extract feature maps of multiple receptive field sizes. Then, the MPA-FPN module first fuses features from every two adjacent scales, followed by a level-by-level interactive fusion of features. First, using the backbone network as the feature extractor, multiple feature maps of different scale ranges are extracted from the input image. Second, the MS-SPPF uses different pooled kernels to repeat multiple pooled operations at various scales to achieve rich multi-perceptive field features. Finally, the MPA-FPN module first incorporates semantic information between each adjacent two-scale layer. The top-level features are then passed back to the bottom level-by-level, and the underlying features are enhanced, enabling interaction and integration of features at different scales. The experimental results show that the mAP50-95 ratio of DoubleM-Net on the VisDrone dataset is 27.5%, and that of Doublem-Net on the DroneVehicle dataset in RGB and Infrared mode is 55.0% and 60.4%, respectively. Our model demonstrates excellent performance in air-to-ground image detection tasks, with exceptional results in detecting small objects.

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Doublem-net:用于无人机探测的多尺度空间金字塔集合--快速和多路径自适应特征金字塔网络
无人驾驶飞行器(UAV)因其灵活性、低成本和自主飞行能力,被广泛应用于军事、救援行动和交通探测等领域。然而,由于无人机的飞行高度和拍摄角度等原因,航拍图像中的物体比一般图像中的物体更小、更密集、更复杂,导致目标检测效果不理想。本文提出了一种无人机检测模型--DoubleM-Net,它包含多尺度空间金字塔池化-快速(MS-SPPF)和多路径自适应特征金字塔网络(MPA-FPN)。DoubleM-Net 利用 MS-SPPF 模块提取多种感受野大小的特征图。然后,MPA-FPN 模块首先融合每两个相邻尺度的特征,然后逐级进行交互式特征融合。首先,使用骨干网络作为特征提取器,从输入图像中提取不同尺度范围的多个特征图。其次,MS-SPPF 使用不同的池化核在不同尺度上重复多个池化操作,以实现丰富的多感知场特征。最后,MPA-FPN 模块首先在每个相邻的双尺度层之间整合语义信息。然后将顶层特征逐级传回底层,并增强底层特征,实现不同尺度特征的交互和整合。实验结果表明,DoubleM-Net 在 VisDrone 数据集上的 mAP50-95 比率为 27.5%,而在 RGB 和红外模式下,DoubleM-Net 在 DroneVehicle 数据集上的 mAP50-95 比率分别为 55.0% 和 60.4%。我们的模型在空对地图像检测任务中表现出了卓越的性能,在检测小型物体方面更是成绩斐然。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
International Journal of Machine Learning and Cybernetics
International Journal of Machine Learning and Cybernetics COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-
CiteScore
7.90
自引率
10.70%
发文量
225
期刊介绍: Cybernetics is concerned with describing complex interactions and interrelationships between systems which are omnipresent in our daily life. Machine Learning discovers fundamental functional relationships between variables and ensembles of variables in systems. The merging of the disciplines of Machine Learning and Cybernetics is aimed at the discovery of various forms of interaction between systems through diverse mechanisms of learning from data. The International Journal of Machine Learning and Cybernetics (IJMLC) focuses on the key research problems emerging at the junction of machine learning and cybernetics and serves as a broad forum for rapid dissemination of the latest advancements in the area. The emphasis of IJMLC is on the hybrid development of machine learning and cybernetics schemes inspired by different contributing disciplines such as engineering, mathematics, cognitive sciences, and applications. New ideas, design alternatives, implementations and case studies pertaining to all the aspects of machine learning and cybernetics fall within the scope of the IJMLC. Key research areas to be covered by the journal include: Machine Learning for modeling interactions between systems Pattern Recognition technology to support discovery of system-environment interaction Control of system-environment interactions Biochemical interaction in biological and biologically-inspired systems Learning for improvement of communication schemes between systems
期刊最新文献
LSSMSD: defending against black-box DNN model stealing based on localized stochastic sensitivity CHNSCDA: circRNA-disease association prediction based on strongly correlated heterogeneous neighbor sampling Contextual feature fusion and refinement network for camouflaged object detection Scnet: shape-aware convolution with KFNN for point clouds completion Self-refined variational transformer for image-conditioned layout generation
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1