DF2RQ: Dynamic Feature Fusion via Region-Wise Queries for Semantic Segmentation of Multimodal Remote Sensing Data

IF 8.6 · CAS Region 1 (Earth Science) · Q1 Engineering, Electrical & Electronic · IEEE Transactions on Geoscience and Remote Sensing · Pub Date: 2025-01-06 · DOI: 10.1109/TGRS.2025.3526247
Shiyang Feng;Zhaowei Li;Bo Zhang;Bin Wang
IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-15, 2025.
Citations: 0

Abstract

Although multimodal remote sensing (RS) data can significantly improve the accuracy of semantic segmentation, how to effectively extract multimodal information through multimodal feature fusion remains a challenging task. Specifically, existing methods for multimodal feature fusion still face two major challenges: 1) due to the diverse imaging mechanisms of multimodal RS data, the boundaries of the same foreground may vary across modalities, so unwanted background semantics can leak into the fused foreground features, and 2) RS data from different modalities exhibit varying discriminative ability for different foregrounds, making it difficult to determine the proportion of semantic information each modality should contribute to the fusion result. To address these issues, we propose DF2RQ, a dynamic feature fusion method based on region-wise queries for semantic segmentation (SS) of multimodal RS data. The method comprises two components: a spatial reconstruction (SR) module and a dynamic fusion (DF) module. In the SR module, we propose an SR scheme that samples foreground features from each modality and reconstructs the unimodal features independently, thereby alleviating the semantic mixing between foreground and background across modalities. In the DF module, a feature fusion scheme based on unimodal feature reference positions obtains fusion weights for each modality, enabling the dynamic fusion of complementary features from multiple modalities. The proposed method has been extensively evaluated on several multimodal RS datasets for SS, and the experimental results consistently show that it achieves state-of-the-art (SOTA) accuracy on multiple commonly used metrics. In addition, our code is available at https://github.com/I3ab/DF2RQ.
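The core idea of the DF module, per-modality, per-region fusion weights applied to complementary features, can be sketched as follows. This is an illustrative NumPy toy, not the paper's actual implementation: the `dynamic_fusion` function, the tensor shapes, and the softmax gating over modality scores are all assumptions made for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_fusion(features, scores):
    """Fuse multimodal region features with dynamic per-region weights.

    features: (M, R, C) array - M modalities, R regions, C channels.
    scores:   (M, R) array    - relevance score of each modality for
                                each region (e.g. from a learned gate).
    Returns a (R, C) array: each region's fused feature is a convex
    combination of the modality features, weighted by a softmax over
    the modality axis, so more discriminative modalities dominate.
    """
    w = softmax(scores, axis=0)              # (M, R), sums to 1 over modalities
    return np.einsum('mr,mrc->rc', w, features)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((2, 3, 4))   # 2 modalities, 3 regions, 4 channels
    gates = rng.standard_normal((2, 3))
    print(dynamic_fusion(feats, gates).shape)  # (3, 4)
```

With equal scores the softmax weights are uniform and the fusion reduces to a plain average across modalities; unequal scores let each region draw more heavily on the modality that discriminates it best, which is the behavior the abstract attributes to the DF module.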
Source Journal

IEEE Transactions on Geoscience and Remote Sensing (Engineering & Technology: Geochemistry & Geophysics)
CiteScore: 11.50
Self-citation rate: 28.00%
Articles per year: 1912
Review time: 4.0 months
Journal description: IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.
Latest articles from this journal

A Hierarchical Vision-Language Model-Guided Feature Fusion Framework for Referring Remote Sensing Image Segmentation
Efficient One-Step Orthogonal Consensus Framework for Multi-View Remote Sensing Clustering
Temporally-Similar Structure-Aware Spatiotemporal Fusion of Satellite Images
WCDMF-Net: Wavelet-based Cross-Domain Multistage Feature Fusion Network for Infrared Small Target Detection
Satellite Video Continuous Space-Time Super-Resolution via Mask-Based Temporal-Aware Warping and Cross-Level Frequency Integration