CTCFNet: CNN-Transformer Complementary and Fusion Network for High-Resolution Remote Sensing Image Semantic Segmentation

Authors: Chen Lu; Xian Zhang; Kaile Du; Han Xu; Guangcan Liu
DOI: 10.1109/TGRS.2024.3458446
Journal: IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1-17
Published: 2024-09-11 (Journal Article)
Impact Factor: 8.6 · JCR: Q1 (Engineering, Electrical & Electronic)
Article link: https://ieeexplore.ieee.org/document/10677460/
Citations: 0
Abstract
Semantic segmentation of high-resolution remote sensing images poses challenges such as scale variability, diverse objects, and obstruction by surface elements. These factors often cause existing methods to suffer from missed and false detections, as well as coarse segmentation boundaries. To tackle these challenges, this article proposes a CNN-transformer complementary and fusion network, termed CTCFNet. It aims to enhance segmentation accuracy and robustness by extracting and integrating the complementary global and local information in high-resolution remote sensing images. CTCFNet operates in two primary stages: feature extraction and feature fusion. In the feature extraction stage, a feature extractor employs convolutional neural network (CNN) and pyramid vision transformer (PVT) blocks to extract both local and global features. A boundary loss is also proposed to improve segmentation performance on object textures and boundaries. In the feature fusion stage, a feature aggregation module (FAM) is first designed to effectively fuse local and global features at the same scale, helping the feature extractor obtain more comprehensive representations. On this basis, a bi-directional decoder (BiDecoder) reconstructs multiscale features in both top-down and bottom-up directions, resulting in more precise segmentation outputs. Experiments on several high-resolution remote sensing image datasets demonstrate that the proposed method outperforms state-of-the-art methods in terms of segmentation accuracy and generalization. The code is available at https://github.com/ChenLu0000/CTCFNet.
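The same-scale fusion that the abstract attributes to the FAM can be illustrated with a minimal, hypothetical gating sketch: a scalar gate blends a local (CNN) feature vector with a global (transformer) feature vector. The function name, the sigmoid gate, and the convex blending rule are all assumptions for illustration; the paper's actual module is a learned network block, not this toy formula.

```python
import math

def gated_fusion(local_feat, global_feat, gate_logit):
    """Toy sketch of same-scale local/global feature aggregation.

    A sigmoid gate g in (0, 1) blends the CNN-derived local features
    with the transformer-derived global features element-wise:
        out[i] = g * local_feat[i] + (1 - g) * global_feat[i]
    This is an illustrative stand-in, not the FAM from the paper.
    """
    g = 1.0 / (1.0 + math.exp(-gate_logit))  # sigmoid gate
    return [g * l + (1.0 - g) * z for l, z in zip(local_feat, global_feat)]

# With a zero logit the gate is 0.5, so the two streams are averaged:
fused = gated_fusion([1.0, 0.0], [0.0, 1.0], 0.0)  # → [0.5, 0.5]
```

A large positive logit drives the gate toward 1 and the output toward the local stream; a large negative logit favors the global stream, which is the intuition behind letting the network learn how much of each representation to keep at every scale.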
Journal introduction:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.