Hao Chang;Xiongjun Fu;Kunyi Guo;Jian Dong;Jialin Guan;Chuyi Liu
SOLSTM: Multisource Information Fusion Semantic Segmentation Network Based on SAR-OPT Matching Attention and Long Short-Term Memory Network
IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5, published 2025-01-28
DOI: 10.1109/LGRS.2025.3535524
https://ieeexplore.ieee.org/document/10856228/
Citation count: 0
Abstract
Advances in deep learning and improvements in remote sensing image resolution have drawn widespread attention to remote sensing semantic segmentation. Synthetic aperture radar (SAR) and optical images are the primary sources of remote sensing data and offer complementary information: SAR images can capture surface information even under cloud cover and at night, whereas optical images provide higher resolution in clear weather. Deep learning-based feature fusion methods can integrate multisource information to obtain more comprehensive surface data. However, multisource information exhibits significant spatiotemporal differences, which makes it challenging to select and extract the most discriminative features for segmentation. To address this, we propose a lightweight and efficient fusion semantic segmentation network, SOLSTM, which takes mixed SAR and optical images as input and performs cyclic cross-fusion to establish a new network paradigm. To tackle multisource data heterogeneity, we introduce SAR-OPT matching attention, which aggregates multisource image features by adaptively adjusting fusion weights, thereby achieving comprehensive perception of feature channels and contextual information. Additionally, to mitigate the high computational complexity of processing multidimensional data, we introduce the mLSTM block, which employs linear operations to mine global contextual information in fused images, reducing computational complexity and improving segmentation performance. Experiments on the WHU-OPT-SAR dataset show that SOLSTM performs well, achieving an mIoU of up to 52.9 and outperforming single-source image segmentation, which supports the effectiveness of the OPT-SAR fusion.
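For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed for a segmentation map. It is not the authors' evaluation code: the function name `mean_iou`, the choice to skip classes absent from both maps, and the toy label maps are assumptions made for illustration. The sketch returns a fraction in [0, 1]; the 52.9 figure above is presumably on a 0-100 scale.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Average per-class IoU over the classes that appear in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where both maps agree on class c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: classes absent from both maps are ignored
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical 2x3 label maps, flattened row by row (toy data, not from the paper).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so their mean is 0.5. Evaluation protocols differ on how they treat ignored labels and absent classes, so scores are only comparable when those conventions match.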