CTCFNet: CNN-Transformer Complementary and Fusion Network for High-Resolution Remote Sensing Image Semantic Segmentation

IEEE Transactions on Geoscience and Remote Sensing · IF 8.6 · Q1 (Engineering, Electrical & Electronic; CAS Tier 1, Earth Science) · Published: 2024-09-11 · DOI: 10.1109/TGRS.2024.3458446
Chen Lu;Xian Zhang;Kaile Du;Han Xu;Guangcan Liu
IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1-17, 2024.
Citations: 0

Abstract

Semantic segmentation of high-resolution remote sensing images poses challenges such as scale variability, diverse objects, and occlusion by surface elements. These factors often cause existing methods to suffer from missed and false detections, as well as coarse segmentation boundaries. To tackle these challenges, this article proposes a CNN-transformer complementary and fusion network, termed CTCFNet. It aims to enhance segmentation accuracy and robustness by extracting and integrating complementary global and local information from high-resolution remote sensing images. CTCFNet operates in two primary stages: feature extraction and feature fusion. In the feature extraction stage, a feature extractor employs convolutional neural network (CNN) and pyramid vision transformer (PVT) blocks to extract both local and global features. A boundary loss is also proposed to improve segmentation performance on object textures and boundaries. In the feature fusion stage, a feature aggregation module (FAM) is first designed to effectively fuse local and global features at the same scale, enabling the feature extractor to obtain more comprehensive representations. On this basis, a bi-directional decoder (BiDecoder) reconstructs multiscale features in both top-down and bottom-up directions, yielding more precise segmentation outputs. Experiments on several high-resolution remote sensing image datasets demonstrate that the proposed method outperforms state-of-the-art methods in segmentation accuracy and generalization. The code is available at https://github.com/ChenLu0000/CTCFNet.
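The abstract mentions a boundary loss that supervises object textures and boundaries. The paper's exact formulation is not reproduced here; the minimal NumPy sketch below (all names are illustrative, not from the paper) only shows the common pattern behind such losses: derive a boundary map from the ground-truth mask, then restrict the per-pixel error to those boundary pixels.

```python
import numpy as np

def boundary_map(mask: np.ndarray) -> np.ndarray:
    """Mark pixels whose 4-neighbourhood contains a different class label."""
    b = np.zeros_like(mask, dtype=bool)
    b[:-1, :] |= mask[:-1, :] != mask[1:, :]   # differs from neighbour below
    b[1:, :]  |= mask[1:, :]  != mask[:-1, :]  # differs from neighbour above
    b[:, :-1] |= mask[:, :-1] != mask[:, 1:]   # differs from right neighbour
    b[:, 1:]  |= mask[:, 1:]  != mask[:, :-1]  # differs from left neighbour
    return b

def boundary_bce(prob: np.ndarray, mask: np.ndarray, eps: float = 1e-7) -> float:
    """Binary cross-entropy evaluated only on boundary pixels (binary case)."""
    b = boundary_map(mask)
    p = np.clip(prob[b], eps, 1.0 - eps)
    y = mask[b].astype(float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Toy example: a 4x4 mask with a 2x2 foreground square; every foreground
# pixel and each background pixel touching it lies on the boundary.
mask = np.zeros((4, 4), dtype=int)
mask[1:3, 1:3] = 1
print(boundary_map(mask).sum())
```

In practice such a term is typically added to a standard segmentation loss (e.g. cross-entropy over all pixels) with a weighting coefficient, so that boundary pixels receive extra gradient signal without dominating training.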
Source journal

IEEE Transactions on Geoscience and Remote Sensing (Engineering / Geochemistry & Geophysics)
CiteScore: 11.50 · Self-citation rate: 28.00% · Articles per year: 1912 · Review time: 4.0 months

About the journal: IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space, and the processing, interpretation, and dissemination of this information.