基于自我注意引导和全局特征融合的无人机图像目标检测

IF 4.2 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Image and Vision Computing Pub Date : 2024-11-01 Epub Date: 2024-09-10 DOI:10.1016/j.imavis.2024.105262

Jing Bai , Haiyang Hu , Xiaojing Liu , Shanna Zhuang , Zhengyou Wang

{"title":"基于自我注意引导和全局特征融合的无人机图像目标检测","authors":"Jing Bai , Haiyang Hu , Xiaojing Liu , Shanna Zhuang , Zhengyou Wang","doi":"10.1016/j.imavis.2024.105262","DOIUrl":null,"url":null,"abstract":"<div><p>Unmanned aerial vehicle (UAV) image object detection has garnered considerable attentions in fields such as Intelligent transportation, urban management and agricultural monitoring. However, it suffers from key challenges of the deficiency in multi-scale feature extraction and the inaccuracy when processing complex scenes and small-sized targets in practical applications. To address this challenge, we propose a novel UAV image object detection network based on self-attention guidance and global feature fusion, named SGGF-Net. First, in order to optimizing feature extraction in global perspective and enhancing target localization precision, the global feature extraction module (GFEM) is introduced by exploiting the self-attention mechanism to capture and integrate long-range dependencies within images. Second, a normal distribution-based prior assigner (NDPA) is developed by measuring the resemblance between ground truth and the priors, which improves the precision of target position matching and thus handle the problem of inaccurate localization of small targets. Furthermore, we design an attention-guided ROI pooling module (ARPM) via a deep fusion strategy of multilevel features for optimizing the integration of multi-scale features and improving the quality of feature representation. Finally, experimental results demonstrate the effectiveness of the proposed SGGF-Net approach.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"151 ","pages":"Article 105262"},"PeriodicalIF":4.2000,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"UAV image object detection based on self-attention guidance and global feature fusion\",\"authors\":\"Jing Bai , Haiyang Hu , Xiaojing Liu , Shanna Zhuang , Zhengyou Wang\",\"doi\":\"10.1016/j.imavis.2024.105262\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Unmanned aerial vehicle (UAV) image object detection has garnered considerable attentions in fields such as Intelligent transportation, urban management and agricultural monitoring. However, it suffers from key challenges of the deficiency in multi-scale feature extraction and the inaccuracy when processing complex scenes and small-sized targets in practical applications. To address this challenge, we propose a novel UAV image object detection network based on self-attention guidance and global feature fusion, named SGGF-Net. First, in order to optimizing feature extraction in global perspective and enhancing target localization precision, the global feature extraction module (GFEM) is introduced by exploiting the self-attention mechanism to capture and integrate long-range dependencies within images. Second, a normal distribution-based prior assigner (NDPA) is developed by measuring the resemblance between ground truth and the priors, which improves the precision of target position matching and thus handle the problem of inaccurate localization of small targets. Furthermore, we design an attention-guided ROI pooling module (ARPM) via a deep fusion strategy of multilevel features for optimizing the integration of multi-scale features and improving the quality of feature representation. Finally, experimental results demonstrate the effectiveness of the proposed SGGF-Net approach.</p></div>\",\"PeriodicalId\":50374,\"journal\":{\"name\":\"Image and Vision Computing\",\"volume\":\"151 \",\"pages\":\"Article 105262\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2024-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Image and Vision Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0262885624003676\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/9/10 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885624003676","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/9/10 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

无人机图像目标检测在智能交通、城市管理和农业监测等领域受到广泛关注。然而，在实际应用中，它面临着多尺度特征提取不足、处理复杂场景和小型目标时不准确等主要挑战。针对这一挑战，我们提出了一种基于自注意引导和全局特征融合的新型无人机图像目标检测网络，命名为 SGGF-Net。首先，为了优化全局视角下的特征提取并提高目标定位精度，我们引入了全局特征提取模块（GFEM），利用自注意机制捕捉并整合图像中的长距离依赖关系。其次，我们开发了基于正态分布的先验分配器（NDPA），通过测量地面实况与先验之间的相似度来提高目标位置匹配的精度，从而解决小目标定位不准的问题。此外，我们还通过多级特征的深度融合策略设计了注意力引导的 ROI 池模块（ARPM），以优化多尺度特征的整合，提高特征表示的质量。最后，实验结果证明了所提出的 SGGF-Net 方法的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

UAV image object detection based on self-attention guidance and global feature fusion

Unmanned aerial vehicle (UAV) image object detection has garnered considerable attentions in fields such as Intelligent transportation, urban management and agricultural monitoring. However, it suffers from key challenges of the deficiency in multi-scale feature extraction and the inaccuracy when processing complex scenes and small-sized targets in practical applications. To address this challenge, we propose a novel UAV image object detection network based on self-attention guidance and global feature fusion, named SGGF-Net. First, in order to optimizing feature extraction in global perspective and enhancing target localization precision, the global feature extraction module (GFEM) is introduced by exploiting the self-attention mechanism to capture and integrate long-range dependencies within images. Second, a normal distribution-based prior assigner (NDPA) is developed by measuring the resemblance between ground truth and the priors, which improves the precision of target position matching and thus handle the problem of inaccurate localization of small targets. Furthermore, we design an attention-guided ROI pooling module (ARPM) via a deep fusion strategy of multilevel features for optimizing the integration of multi-scale features and improving the quality of feature representation. Finally, experimental results demonstrate the effectiveness of the proposed SGGF-Net approach.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Image and Vision Computing 工程技术-工程：电子与电气

CiteScore

8.50

自引率

8.50%

发文量

143

审稿时长

7.8 months

期刊介绍： Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.