DF2RQ: Dynamic Feature Fusion via Region-Wise Queries for Semantic Segmentation of Multimodal Remote Sensing Data
Authors: Shiyang Feng, Zhaowei Li, Bo Zhang, Bin Wang
DOI: 10.1109/TGRS.2025.3526247
Journal: IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-15
Published: 2025-01-06
Article page: https://ieeexplore.ieee.org/document/10829634/
Citations: 0
Abstract
Although remote sensing (RS) data with multiple modalities can significantly improve the accuracy of semantic segmentation (SS), effectively extracting multimodal information through multimodal feature fusion remains challenging. Specifically, existing multimodal feature fusion methods face two major challenges: 1) owing to the diverse imaging mechanisms of multimodal RS data, the boundaries of the same foreground may vary across modalities, so unwanted background semantics are included in the fused foreground features; and 2) RS data from different modalities exhibit varying discriminative ability for different foregrounds, making it difficult to determine the proportion of semantic information each modality should contribute to the fusion result. To address these issues, we propose DF2RQ, a dynamic feature fusion method based on region-wise queries, for SS of multimodal RS data. The method comprises two components: a spatial reconstruction (SR) module and a dynamic fusion (DF) module. The SR module samples foreground features from each modality and reconstructs each unimodal feature independently, thereby alleviating the semantic mixing between foreground and background across modalities. The DF module uses a fusion scheme based on unimodal feature reference positions to obtain fusion weights for each modality, enabling dynamic fusion of complementary features from multiple modalities. The proposed method has been extensively evaluated on multiple multimodal RS datasets for SS, and the experimental results consistently show that it achieves state-of-the-art (SOTA) accuracy on several commonly used metrics. Our code is available at https://github.com/I3ab/DF2RQ.
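The abstract does not specify the internals of the DF module; for readers unfamiliar with dynamic fusion, the following is a minimal, purely illustrative sketch of the general idea — computing spatially varying, per-modality fusion weights and taking a weighted sum of the modality features. The scoring projection here is a hypothetical stand-in for whatever learned layer the paper actually uses.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_fusion(feats, score_weights):
    """Fuse per-modality feature maps with spatially varying weights.

    feats: list of M arrays, each (C, H, W) -- one per modality.
    score_weights: (M, C) projection that scores each modality at
        every spatial position (stand-in for a learned layer; the
        actual DF module conditions on reference positions).
    Returns a fused (C, H, W) feature map.
    """
    stacked = np.stack(feats)                       # (M, C, H, W)
    # One scalar score per modality per position: (M, H, W).
    scores = np.einsum('mc,mchw->mhw', score_weights, stacked)
    # Weights sum to 1 over the modality axis at each position.
    weights = softmax(scores, axis=0)               # (M, H, W)
    return (weights[:, None] * stacked).sum(axis=0)

rng = np.random.default_rng(0)
# Two modalities (e.g. optical + SAR), 8 channels, 4x4 spatial grid.
feats = [rng.normal(size=(8, 4, 4)) for _ in range(2)]
fused = dynamic_fusion(feats, rng.normal(size=(2, 8)))
print(fused.shape)  # (8, 4, 4)
```

Because the weights form a convex combination at each position, the fused value always lies between the corresponding modality features, so no single modality is discarded outright; its contribution is merely down-weighted where the other modality is more discriminative.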
Journal description:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.