Pub Date: 2024-11-04 | DOI: 10.1109/JSTARS.2024.3490584
Zhenhao Yang;Fukun Bi;Xinghai Hou;Dehao Zhou;Yanping Wang
Semantic segmentation is crucial for interpreting remote sensing images, and segmentation performance has improved significantly in recent years with the development of deep learning. However, complex background samples and small objects make semantic segmentation of remote sensing images considerably more challenging. To address these challenges, we propose a dual-domain refinement network (DDRNet) for accurate segmentation. Specifically, we first propose a spatial and frequency feature reconstruction module, which separately exploits the characteristics of the frequency and spatial domains to refine the global salient features and the fine-grained spatial features of objects. This process enhances foreground saliency and adaptively suppresses background noise. Subsequently, we propose a feature alignment module that selectively couples the features refined in both domains via cross-attention, achieving semantic alignment between the frequency and spatial domains. In addition, a carefully designed detail-aware attention module is introduced to compensate for the loss of small objects during feature propagation. This module leverages cross-correlation matrices between high-level features and the original image to quantify the similarities among objects of the same category, thereby transmitting rich semantic information from high-level features to small objects. Results on multiple datasets validate that our method outperforms existing methods and achieves a favorable trade-off between computational overhead and accuracy.
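The two core ideas in the abstract (a frequency-domain refinement branch and a cross-attention coupling of frequency- and spatial-domain features) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the low-pass mask, token shapes, and function names (`frequency_refine`, `cross_attention_align`) are illustrative assumptions, intended only to show the general mechanism of Fourier-domain filtering and query/key cross-attention.

```python
import numpy as np

def frequency_refine(feat, keep_ratio=0.25):
    # Illustrative frequency-domain branch: low-pass filter in the
    # Fourier domain, retaining global salient structure and damping
    # high-frequency background clutter. feat: (H, W) feature map.
    H, W = feat.shape
    F = np.fft.fftshift(np.fft.fft2(feat))
    mask = np.zeros((H, W))
    h, w = int(H * keep_ratio), int(W * keep_ratio)
    mask[H // 2 - h:H // 2 + h, W // 2 - w:W // 2 + w] = 1.0
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_align(spatial_tokens, freq_tokens):
    # Illustrative cross-attention: spatial-domain tokens act as
    # queries, frequency-domain tokens supply keys/values, so each
    # spatial location selectively pulls in matching frequency context.
    d = spatial_tokens.shape[-1]
    scores = spatial_tokens @ freq_tokens.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ freq_tokens

rng = np.random.default_rng(0)
fmap = rng.standard_normal((16, 16))
refined = frequency_refine(fmap)        # same (16, 16) shape, smoother map

spatial = rng.standard_normal((64, 8))  # 64 tokens, 8-dim each
freq = rng.standard_normal((64, 8))
aligned = cross_attention_align(spatial, freq)
print(refined.shape, aligned.shape)     # (16, 16) (64, 8)
```

In a real network these operations would run on multichannel learned feature maps with projection weights for queries, keys, and values; the sketch keeps raw features to make the data flow between the two domains visible.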
Title: DDRNet: Dual-Domain Refinement Network for Remote Sensing Image Semantic Segmentation
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 17, pp. 20177-20189 (open access).
Pub Date: 2024-11-04 | DOI: 10.1109/JSTARS.2024.3491216
Xinran Li;Tao Chen;Gang Liu;Jie Dou;Ruiqing Niu;Antonio Plaza
Landslide recognition (LR) is a fundamental task for disaster prevention and control. Convolutional neural networks (CNNs) and transformer architectures have been widely used for extracting landslide information. However, CNNs cannot accurately characterize long-distance dependencies and global information, while transformers may be less effective than CNNs at capturing local features and spatial information. To address these limitations, we construct a new LR network based on grid-based attention and multilevel feature fusion (GAMTNet). We complement CNNs by adding a transformer-based structure in a layer-by-layer fashion and improving the methods for sequence generation and attention-weight calculation. As a result, GAMTNet effectively learns global and local information about landslides across various spatial scales. We evaluated our model using landslide data collected from the southwest region of Jiuzhaigou County, Aba Tibetan and Qiang Autonomous Prefecture, Sichuan Province, China. The results demonstrate that the proposed GAMTNet model achieves an F