基于自我注意引导和全局特征融合的无人机图像目标检测

IF 4.2 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Image and Vision Computing Pub Date : 2024-09-10 DOI:10.1016/j.imavis.2024.105262
{"title":"基于自我注意引导和全局特征融合的无人机图像目标检测","authors":"","doi":"10.1016/j.imavis.2024.105262","DOIUrl":null,"url":null,"abstract":"<div><p>Unmanned aerial vehicle (UAV) image object detection has garnered considerable attentions in fields such as Intelligent transportation, urban management and agricultural monitoring. However, it suffers from key challenges of the deficiency in multi-scale feature extraction and the inaccuracy when processing complex scenes and small-sized targets in practical applications. To address this challenge, we propose a novel UAV image object detection network based on self-attention guidance and global feature fusion, named SGGF-Net. First, in order to optimizing feature extraction in global perspective and enhancing target localization precision, the global feature extraction module (GFEM) is introduced by exploiting the self-attention mechanism to capture and integrate long-range dependencies within images. Second, a normal distribution-based prior assigner (NDPA) is developed by measuring the resemblance between ground truth and the priors, which improves the precision of target position matching and thus handle the problem of inaccurate localization of small targets. Furthermore, we design an attention-guided ROI pooling module (ARPM) via a deep fusion strategy of multilevel features for optimizing the integration of multi-scale features and improving the quality of feature representation. Finally, experimental results demonstrate the effectiveness of the proposed SGGF-Net approach.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":null,"pages":null},"PeriodicalIF":4.2000,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"UAV image object detection based on self-attention guidance and global feature fusion\",\"authors\":\"\",\"doi\":\"10.1016/j.imavis.2024.105262\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Unmanned aerial vehicle (UAV) image object detection has garnered considerable attentions in fields such as Intelligent transportation, urban management and agricultural monitoring. However, it suffers from key challenges of the deficiency in multi-scale feature extraction and the inaccuracy when processing complex scenes and small-sized targets in practical applications. To address this challenge, we propose a novel UAV image object detection network based on self-attention guidance and global feature fusion, named SGGF-Net. First, in order to optimizing feature extraction in global perspective and enhancing target localization precision, the global feature extraction module (GFEM) is introduced by exploiting the self-attention mechanism to capture and integrate long-range dependencies within images. Second, a normal distribution-based prior assigner (NDPA) is developed by measuring the resemblance between ground truth and the priors, which improves the precision of target position matching and thus handle the problem of inaccurate localization of small targets. Furthermore, we design an attention-guided ROI pooling module (ARPM) via a deep fusion strategy of multilevel features for optimizing the integration of multi-scale features and improving the quality of feature representation. Finally, experimental results demonstrate the effectiveness of the proposed SGGF-Net approach.</p></div>\",\"PeriodicalId\":50374,\"journal\":{\"name\":\"Image and Vision Computing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2024-09-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Image and Vision Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0262885624003676\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885624003676","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

无人机图像目标检测在智能交通、城市管理和农业监测等领域受到广泛关注。然而,在实际应用中,它面临着多尺度特征提取不足、处理复杂场景和小型目标时不准确等主要挑战。针对这一挑战,我们提出了一种基于自注意引导和全局特征融合的新型无人机图像目标检测网络,命名为 SGGF-Net。首先,为了优化全局视角下的特征提取并提高目标定位精度,我们引入了全局特征提取模块(GFEM),利用自注意机制捕捉并整合图像中的长距离依赖关系。其次,我们开发了基于正态分布的先验分配器(NDPA),通过测量地面实况与先验之间的相似度来提高目标位置匹配的精度,从而解决小目标定位不准的问题。此外,我们还通过多级特征的深度融合策略设计了注意力引导的 ROI 池模块(ARPM),以优化多尺度特征的整合,提高特征表示的质量。最后,实验结果证明了所提出的 SGGF-Net 方法的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
UAV image object detection based on self-attention guidance and global feature fusion

Unmanned aerial vehicle (UAV) image object detection has garnered considerable attentions in fields such as Intelligent transportation, urban management and agricultural monitoring. However, it suffers from key challenges of the deficiency in multi-scale feature extraction and the inaccuracy when processing complex scenes and small-sized targets in practical applications. To address this challenge, we propose a novel UAV image object detection network based on self-attention guidance and global feature fusion, named SGGF-Net. First, in order to optimizing feature extraction in global perspective and enhancing target localization precision, the global feature extraction module (GFEM) is introduced by exploiting the self-attention mechanism to capture and integrate long-range dependencies within images. Second, a normal distribution-based prior assigner (NDPA) is developed by measuring the resemblance between ground truth and the priors, which improves the precision of target position matching and thus handle the problem of inaccurate localization of small targets. Furthermore, we design an attention-guided ROI pooling module (ARPM) via a deep fusion strategy of multilevel features for optimizing the integration of multi-scale features and improving the quality of feature representation. Finally, experimental results demonstrate the effectiveness of the proposed SGGF-Net approach.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Image and Vision Computing
Image and Vision Computing 工程技术-工程:电子与电气
CiteScore
8.50
自引率
8.50%
发文量
143
审稿时长
7.8 months
期刊介绍: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.
期刊最新文献
A dictionary learning based unsupervised neural network for single image compressed sensing Unbiased scene graph generation via head-tail cooperative network with self-supervised learning UIR-ES: An unsupervised underwater image restoration framework with equivariance and stein unbiased risk estimator A new deepfake detection model for responding to perception attacks in embodied artificial intelligence Ground4Act: Leveraging visual-language model for collaborative pushing and grasping in clutter
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1