CAMCFormer: Cross-Attention and Multicorrelation Aided Transformer for Few-Shot Object Detection in Optical Remote Sensing Images

IEEE Transactions on Geoscience and Remote Sensing | IF 8.6 | Zone 1, Earth Sciences | JCR Q1, Engineering, Electrical & Electronic | Pub Date: 2025-02-19 | DOI: 10.1109/TGRS.2025.3543583
Lefan Wang; Shaohui Mei; Yi Wang; Jiawei Lian; Zonghao Han; Yan Feng
Volume 63, pp. 1-16 | Journal Article | https://ieeexplore.ieee.org/document/10892299/
Citations: 0

Abstract

Few-shot object detection (FSOD) enables the detection of novel-class objects in remote sensing images (RSIs) from a limited number of labeled samples. Although convolutional neural networks (CNNs) are commonly used for this task, they suffer from two inherent constraints. First, their limited local receptive field fails to capture both the global context within a single image and the relational dependencies between query and support images. Second, an additional feature alignment mechanism is typically required to bridge the gap between query and support images. To address these challenges, this work introduces a novel cross-attention and multicorrelation aided transformer (CAMCFormer) FSOD framework tailored for global feature representation and multicorrelation modeling in complex and large-scale RSIs. Specifically, a long-distance cross-attention module (LDCAM) is devised to capture dependencies between distant elements across query and support images at each feature extraction layer. This module facilitates the exchange of contextual information between images, resulting in more comprehensive feature representations and eliminating the need for separate feature alignment and fusion modules. To further enhance detection performance, multicorrelation aided heads (MAHs) are constructed to model various relational aspects, namely a channel-correlation detection head (CCDH), a spatial-correlation detection head (SCDH), and a cross-attention detection head (CADH). These aided heads contribute to more robust and accurate classification and localization. Comprehensive experiments demonstrate the superiority of the proposed framework over several state-of-the-art detectors, highlighting its potential as an effective solution for FSOD in remote sensing scenarios.
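The abstract does not give the internals of LDCAM, so the following is only a minimal sketch of generic scaled dot-product cross-attention, the general mechanism by which query-image tokens can draw on support-image tokens. It is not the authors' module. The function names `softmax` and `cross_attention`, the toy feature values, and the omission of learned projections and multiple heads are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_tokens, support_tokens):
    """Generic scaled dot-product cross-attention (hypothetical helper).

    Each query token (a list of floats) attends over all support tokens and
    receives a weighted average of them. A real transformer layer would add
    learned Q/K/V projections and multiple heads; both are omitted here.
    """
    dim = len(query_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    attended, weights = [], []
    for q in query_tokens:
        # Similarity of this query token to every support token.
        scores = [scale * sum(a * b for a, b in zip(q, s)) for s in support_tokens]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of support tokens, one value per channel.
        attended.append([
            sum(w[j] * support_tokens[j][d] for j in range(len(support_tokens)))
            for d in range(dim)
        ])
    return attended, weights


# Toy features (made up): 3 query-image tokens, 2 support-image tokens, 4 channels.
query_feats = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
support_feats = [[4.0, 0.0, 0.0, 0.0], [0.0, 4.0, 0.0, 0.0]]

attended, weights = cross_attention(query_feats, support_feats)
for row in weights:
    print([round(w, 3) for w in row])
```

Calling `cross_attention(support_feats, query_feats)` gives the opposite direction, in which support tokens attend over query tokens. Running both directions is one simple way to picture the two-way exchange of context the abstract describes, though how the paper actually combines them is not stated here.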
Source journal
IEEE Transactions on Geoscience and Remote Sensing (Engineering and Technology: Geochemistry and Geophysics)
CiteScore: 11.50
Self-citation rate: 28.00%
Articles published: 1912
Review time: 4.0 months
About the journal: IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.
Latest articles in this journal
Layout-Controlled Synthetic Data Generation for Remote Sensing Object Detection
Validation of the Surface Water and Ocean Topography (SWOT) KaRIn Sea Level Data
Graph-Modulated Attention and Centrality-Enhanced Network for Remote Sensing Object Counting
Enhancing Elastic Full-Waveform Inversion with Envelope Cosine Similarity: A Multiscale Frequency Strategy for Walkaway VSP Data
Radiative Constraint-Based Scale Inversion for High-Altitude Aircraft from Three-Band Thermal Infrared Imagery