RoSENet: Rotation and Similarity Enhancement Network for Multimodal Remote Sensing Image Land Cover Classification

Authors: Bokun Ma; Caihong Mu; Yi Liu; Xinyu He; Mosa Haidarh
Journal: IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-18 (JCR Q1, Engineering, Electrical & Electronic)
DOI: 10.1109/TGRS.2025.3561850
Publication date: 2025-04-17
URL: https://ieeexplore.ieee.org/document/10967546/
Citations: 0
Abstract
Multimodal classification methods have been widely applied in remote sensing (RS) land cover (LC) classification tasks. However, the existing multimodal classification methods often face challenges such as insufficient information extraction, sensitivity to noise, and inadequate utilization of complex spectral-spatial features during data fusion. To address these issues, this article proposes a rotation and similarity enhancement network (RoSENet) for multimodal RS LC classification by using hyperspectral image (HSI) data together with light detection and ranging (LiDAR) data. RoSENet consists of three key modules. First, the rotation fusion enhancement (RFE) module significantly increases the diversity of input HSI and LiDAR data and improves the model’s generalization ability through multiangle rotation and spectral direction concatenation. Second, the spectral adaptive self-similarity convolution (SASSC) module ensures comprehensive preservation and fusion of spectral and spatial features through adaptive similarity measurement and convolution operations, enhancing the recognition of different classes in complex scenes. Third, the Euclidean similarity spatial-channel attention (ESSCA) module effectively strengthens global feature representation by capturing the resemblance between the central spectral vector and those situated in the surrounding area, improving the model’s robustness in noisy environments. Extensive experiments are carried out on three public datasets, and the experimental results reveal that RoSENet demonstrates significant advantages in terms of overall accuracy (OA), average accuracy (AA), and the Kappa coefficient. Compared to traditional single-modal models, including convolutional neural networks (CNNs), and state-of-the-art single-modal and multimodal transformer models, RoSENet better captures detailed features in classification tasks and effectively reduces the impact of noise on classification accuracy.
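The abstract describes two concrete mechanisms: the RFE module's multiangle rotation with spectral-direction concatenation, and the ESSCA module's weighting of neighboring spectral vectors by their Euclidean closeness to the central vector. The toy sketch below illustrates both ideas on a small hyperspectral patch; the function names, the exponential similarity mapping, and all shapes are assumptions for illustration only, not the paper's actual implementation.

```python
import numpy as np

def rotation_fusion(patch):
    """Hypothetical sketch of the RFE idea: rotate the H x W x C patch by
    0/90/180/270 degrees and concatenate the copies along the spectral axis."""
    rots = [np.rot90(patch, k=k, axes=(0, 1)) for k in range(4)]
    return np.concatenate(rots, axis=2)  # H x W x 4C

def euclidean_similarity_weights(patch):
    """Hypothetical sketch of the ESSCA idea: weight each spatial position by
    the Euclidean closeness of its spectral vector to the central vector."""
    h, w, _ = patch.shape
    center = patch[h // 2, w // 2]                  # central spectral vector
    dists = np.linalg.norm(patch - center, axis=2)  # H x W Euclidean distances
    sims = np.exp(-dists)                           # closer -> larger weight (assumed mapping)
    return sims / sims.sum()                        # normalized attention map

patch = np.random.rand(7, 7, 30)  # toy 7x7 HSI patch with 30 spectral bands
fused = rotation_fusion(patch)
weights = euclidean_similarity_weights(patch)
print(fused.shape)  # (7, 7, 120): four rotated copies stacked spectrally
```

Note that the central position receives the largest weight (its distance to itself is zero), which matches the abstract's intuition that the attention map is anchored on the central spectral vector.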
About the journal:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.