SiMultiF: A Remote Sensing Multimodal Semantic Segmentation Network With Adaptive Allocation of Modal Weights for Siamese Structures in Multiscene

Authors: Shichao Cui; Wei Chen; Wenwu Xiong; Xin Xu; Xinyu Shi; Canhai Li

IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-17. Published: 2025-03-21. DOI: 10.1109/TGRS.2025.3553713. URL: https://ieeexplore.ieee.org/document/10937096/
Citations: 0
Abstract
Semantic segmentation of remote sensing images is crucial for resource exploration, precision agriculture, and environmental monitoring. However, conducting semantic segmentation on single-modality data for remote sensing images that contain various scenes, especially unique scenes, is highly challenging. To address this challenge, we propose SiMultiF, a Siamese architecture-based multimodal feature adaptive fusion semantic segmentation network. SiMultiF employs a dual-branch Siamese structure feature extractor. The adaptive feature weight adjustment module (AFWAM) and the multimodal fusion module (MFM) facilitate in-depth understanding and extraction of multimodal data. Specifically, the Siamese structure can extract features from multimodal data concurrently without adding to the number of parameters. The AFWAM module can adaptively identify the importance of different modal data and dynamically adjust the modal weight to enhance the network’s comprehension of complex scene data. Additionally, the cross-attention (CA)-based MFM module bridges modality gaps and achieves comprehensive multimodal feature fusion. Numerous experiments have demonstrated that the proposed SiMultiF outperforms other state-of-the-art semantic segmentation models (both multimodal and single modal) on the high-resolution ISPRS Potsdam dataset, ISPRS Vaihingen dataset, and special scene dataset (vegetation polarization dataset with extreme natural lighting contrast). Moreover, the robustness and generalizability of the network in multiscene and multimodal datasets are verified.
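The abstract describes two fusion steps: an adaptive module (AFWAM) that scores each modality's importance and reweights it, and a cross-attention (CA) module that lets one modality's features attend to the other's. The paper's exact formulation is not given in the abstract, so the following is only a minimal NumPy sketch of those two generic mechanisms under assumed shapes; all function names, the shared gate vector, and the random projection matrices are illustrative placeholders, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats, wq, wk, wv):
    # One modality queries the other: q_feats (N, d) attends over kv_feats (M, d)
    q, k, v = q_feats @ wq, kv_feats @ wk, kv_feats @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v          # (N, d) fused tokens

def adaptive_fuse(f_a, f_b, gate_w):
    # Pool each modality, score it with a shared gate, softmax -> modal weights
    pooled = np.stack([f_a.mean(axis=0), f_b.mean(axis=0)])   # (2, d)
    weights = softmax(pooled @ gate_w)                        # (2,), sums to 1
    return weights[0] * f_a + weights[1] * f_b, weights

rng = np.random.default_rng(0)
d = 16
f_rgb = rng.normal(size=(8, d))   # tokens from one branch (e.g. optical imagery)
f_aux = rng.normal(size=(8, d))   # tokens from the other branch (e.g. DSM/polarization)
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
gate_w = rng.normal(size=(d,))

attended = cross_attention(f_rgb, f_aux, wq, wk, wv)   # bridge the modality gap
fused, weights = adaptive_fuse(f_rgb, attended, gate_w)
```

In a Siamese setup, the same projection weights would be shared across both branches, which is how the abstract's claim of fusing modalities "without adding to the number of parameters" in the extractor is typically realized.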
About the journal:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.