{"title":"遥感图像多模态半监督语义分割的差分互补学习和标签重分配","authors":"Wenqi Han;Wen Jiang;Jie Geng;Wang Miao","doi":"10.1109/TIP.2025.3526064","DOIUrl":null,"url":null,"abstract":"The feature fusion of optical and Synthetic Aperture Radar (SAR) images is widely used for semantic segmentation of multimodal remote sensing images. It leverages information from two different sensors to enhance the analytical capabilities of land cover. However, the imaging characteristics of optical and SAR data are vastly different, and noise interference makes the fusion of multimodal data information challenging. Furthermore, in practical remote sensing applications, there are typically only a limited number of labeled samples available, with most pixels needing to be labeled. Semi-supervised learning has the potential to improve model performance in scenarios with limited labeled data. However, in remote sensing applications, the quality of pseudo-labels is frequently compromised, particularly in challenging regions such as blurred edges and areas with class confusion. This degradation in label quality can have a detrimental effect on the model’s overall performance. In this paper, we introduce the Difference-complementary Learning and Label Reassignment (DLLR) network for multimodal semi-supervised semantic segmentation of remote sensing images. Our proposed DLLR framework leverages asymmetric masking to create information discrepancies between the optical and SAR modalities, and employs a difference-guided complementary learning strategy to enable mutual learning. Subsequently, we introduce a multi-level label reassignment strategy, treating the label assignment problem as an optimal transport optimization task to allocate pixels to classes with higher precision for unlabeled pixels, thereby enhancing the quality of pseudo-label annotations. Finally, we introduce a multimodal consistency cross pseudo-supervision strategy to improve pseudo-label utilization. We evaluate our method on two multimodal remote sensing datasets, namely, the WHU-OPT-SAR and EErDS-OPT-SAR datasets. Experimental results demonstrate that our proposed DLLR model outperforms other relevant deep networks in terms of accuracy in multimodal semantic segmentation.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"566-580"},"PeriodicalIF":0.0000,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Difference-Complementary Learning and Label Reassignment for Multimodal Semi-Supervised Semantic Segmentation of Remote Sensing Images\",\"authors\":\"Wenqi Han;Wen Jiang;Jie Geng;Wang Miao\",\"doi\":\"10.1109/TIP.2025.3526064\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The feature fusion of optical and Synthetic Aperture Radar (SAR) images is widely used for semantic segmentation of multimodal remote sensing images. It leverages information from two different sensors to enhance the analytical capabilities of land cover. However, the imaging characteristics of optical and SAR data are vastly different, and noise interference makes the fusion of multimodal data information challenging. Furthermore, in practical remote sensing applications, there are typically only a limited number of labeled samples available, with most pixels needing to be labeled. 
Semi-supervised learning has the potential to improve model performance in scenarios with limited labeled data. However, in remote sensing applications, the quality of pseudo-labels is frequently compromised, particularly in challenging regions such as blurred edges and areas with class confusion. This degradation in label quality can have a detrimental effect on the model’s overall performance. In this paper, we introduce the Difference-complementary Learning and Label Reassignment (DLLR) network for multimodal semi-supervised semantic segmentation of remote sensing images. Our proposed DLLR framework leverages asymmetric masking to create information discrepancies between the optical and SAR modalities, and employs a difference-guided complementary learning strategy to enable mutual learning. Subsequently, we introduce a multi-level label reassignment strategy, treating the label assignment problem as an optimal transport optimization task to allocate pixels to classes with higher precision for unlabeled pixels, thereby enhancing the quality of pseudo-label annotations. Finally, we introduce a multimodal consistency cross pseudo-supervision strategy to improve pseudo-label utilization. We evaluate our method on two multimodal remote sensing datasets, namely, the WHU-OPT-SAR and EErDS-OPT-SAR datasets. Experimental results demonstrate that our proposed DLLR model outperforms other relevant deep networks in terms of accuracy in multimodal semantic segmentation.\",\"PeriodicalId\":94032,\"journal\":{\"name\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"volume\":\"34 \",\"pages\":\"566-580\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-01-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10838294/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10838294/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract: The feature fusion of optical and Synthetic Aperture Radar (SAR) images is widely used for semantic segmentation of multimodal remote sensing images, leveraging information from two different sensors to enhance land cover analysis. However, the imaging characteristics of optical and SAR data differ vastly, and noise interference makes fusing multimodal information challenging. Furthermore, practical remote sensing applications typically offer only a limited number of labeled samples, while most pixels remain unlabeled. Semi-supervised learning has the potential to improve model performance in scenarios with limited labeled data. However, in remote sensing applications, the quality of pseudo-labels is frequently compromised, particularly in challenging regions such as blurred edges and areas with class confusion, and this degradation in label quality can harm the model's overall performance. In this paper, we introduce the Difference-Complementary Learning and Label Reassignment (DLLR) network for multimodal semi-supervised semantic segmentation of remote sensing images. The proposed DLLR framework leverages asymmetric masking to create information discrepancies between the optical and SAR modalities, and employs a difference-guided complementary learning strategy to enable mutual learning between them. Subsequently, we introduce a multi-level label reassignment strategy that treats label assignment as an optimal transport problem, assigning unlabeled pixels to classes with higher precision and thereby enhancing the quality of pseudo-label annotations. Finally, we introduce a multimodal consistency cross pseudo-supervision strategy to improve pseudo-label utilization. We evaluate our method on two multimodal remote sensing datasets, WHU-OPT-SAR and EErDS-OPT-SAR. Experimental results demonstrate that the proposed DLLR model outperforms other relevant deep networks in terms of multimodal semantic segmentation accuracy.
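The abstract gives no implementation details, so the following PyTorch-style sketch only illustrates what asymmetric masking between modalities could look like: random patches of the optical and SAR inputs are zeroed out at different ratios, so each branch is deprived of information the other still carries. All function and parameter names here are hypothetical, not taken from the paper.

```python
import torch

def asymmetric_mask(x_opt, x_sar, ratio_opt=0.25, ratio_sar=0.5, patch=16):
    """Mask random patches of each modality at a different ratio, so each
    branch is missing information that the other modality retains.

    x_opt, x_sar: (B, C, H, W) tensors; H and W must be divisible by `patch`.
    Returns the two masked tensors.
    """
    B, _, H, W = x_opt.shape
    gh, gw = H // patch, W // patch

    def patch_keep_mask(ratio):
        # Coarse (B, 1, gh, gw) grid: 1 = keep the patch, 0 = zero it out.
        keep = (torch.rand(B, 1, gh, gw, device=x_opt.device) >= ratio).float()
        # Expand each grid cell to a `patch` x `patch` pixel block.
        return keep.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)

    return x_opt * patch_keep_mask(ratio_opt), x_sar * patch_keep_mask(ratio_sar)
```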
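Likewise, the optimal-transport view of pseudo-label reassignment can be sketched with the standard Sinkhorn-Knopp iteration (as popularized for pseudo-labeling in self-supervised methods such as SwAV). Whether DLLR uses this equal-class-marginal variant, or how its "multi-level" scheme composes such steps, is not stated in the abstract; this is a minimal sketch under that assumption.

```python
import torch

@torch.no_grad()
def sinkhorn_reassign(logits, n_iters=3, epsilon=0.05):
    """Turn per-pixel logits into balanced soft pseudo-labels by solving an
    entropy-regularized optimal transport problem with Sinkhorn-Knopp.

    logits: (N, K) scores for N unlabeled pixels over K classes.
    Returns an (N, K) assignment matrix whose rows sum to 1 and whose
    columns each receive roughly N / K pixels of total mass.
    """
    Q = torch.exp(logits / epsilon).t()   # (K, N) transport plan, up to scaling
    Q /= Q.sum()                          # normalize to a joint distribution
    K, N = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(dim=1, keepdim=True)   # rows: equal mass per class
        Q /= K
        Q /= Q.sum(dim=0, keepdim=True)   # columns: each pixel fully assigned
        Q /= N
    return (Q * N).t()                    # back to (N, K), rows sum to 1
```

Hard pseudo-labels would then come from `sinkhorn_reassign(logits).argmax(dim=1)`; the uniform class marginal is the main simplification, and a class-prior marginal could be substituted where land cover frequencies are known.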
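Finally, cross pseudo-supervision between two branches is a known pattern (in the spirit of CPS): each branch is trained on the other's hard pseudo-labels. A minimal two-branch consistency loss might look like the sketch below; it is an illustration of the general technique, not the authors' code, and omits whatever consistency filtering the paper's "multimodal consistency" step applies.

```python
import torch.nn.functional as F

def cross_pseudo_supervision_loss(logits_opt, logits_sar):
    """Supervise each branch with the other branch's hard pseudo-labels,
    pushing the optical and SAR predictions toward consistency.

    logits_*: (B, K, H, W) per-pixel class scores from each branch.
    """
    pseudo_opt = logits_opt.argmax(dim=1).detach()  # (B, H, W) labels from optical
    pseudo_sar = logits_sar.argmax(dim=1).detach()  # (B, H, W) labels from SAR
    return F.cross_entropy(logits_opt, pseudo_sar) + F.cross_entropy(logits_sar, pseudo_opt)
```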