FTransDeepLab: Multimodal Fusion Transformer-Based DeepLabv3+ for Remote Sensing Semantic Segmentation

IEEE Transactions on Geoscience and Remote Sensing · Impact Factor 8.6 · JCR Q1 (Engineering, Electrical & Electronic) · CAS Region 1 (Earth Science) · Vol. 63, pp. 1–18 · Pub Date: 2025-03-21 · DOI: 10.1109/TGRS.2025.3553478
Haixia Feng;Qingwu Hu;Pengcheng Zhao;Shunli Wang;Mingyao Ai;Daoyuan Zheng;Tiancheng Liu
Citations: 0

Abstract

High-resolution remote sensing images contain rich color and texture information, but due to the inherent limitations of 2-D data, achieving high-quality semantic segmentation remains a challenge. Multimodal data fusion technology has emerged as an effective approach to overcome this issue. To accurately capture the semantic information in remote sensing images, this study designs a multimodal fusion Transformer-based DeepLabv3+ model for remote sensing semantic segmentation, named FTransDeepLab. Specifically, the network learns features from two modalities and is inspired by the DeepLab architecture. We extended the encoder by stacking the multiscale SegFormer, encoding the input images into highly representative spatial features. Additionally, we introduced the multimodal feature rectification (MFR) module and the multimodal feature fusion (MFF) module. The MFR, composed of a channel attention module and a spatial attention module, enhances the model's ability to capture essential features and improves performance by focusing on both global and local contexts. The MFF module uses a cross-attention mechanism to optimize the feature fusion process, which enhances representation learning by facilitating the interaction between diverse information and integrating features from different modalities. Finally, in the decoding path, the extracted high-level features are concatenated with low-level features to optimize the feature representation and upsampled to restore the size of the input image. Extensive results on two datasets, the International Society for Photogrammetry and Remote Sensing (ISPRS) Vaihingen and Potsdam, confirm that the proposed FTransDeepLab achieves superior performance compared with state-of-the-art segmentation methods.
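The abstract names two fusion mechanisms: channel/spatial attention for rectification (MFR) and cross-attention for fusion (MFF). The following NumPy sketch illustrates the general shape of these two operations on flattened token features from two modalities. All function names, shapes, and the identity query/key/value projections are illustrative assumptions; the paper's actual module designs are not specified in the abstract.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_rectify(x_a, x_b):
    """Channel-attention-style rectification (a rough stand-in for the
    MFR idea): reweight modality A's channels using modality B's global
    per-channel statistics. x_a, x_b: (tokens, channels)."""
    w = softmax(x_b.mean(axis=0))        # (channels,) weights from B
    return x_a * w                       # broadcast over tokens

def cross_attention(x_a, x_b):
    """Cross-attention fusion (a rough stand-in for the MFF idea):
    queries come from modality A, keys/values from modality B, so each
    A-token aggregates the most relevant B-tokens."""
    d = x_a.shape[-1]
    q, k, v = x_a, x_b, x_b              # identity projections for brevity
    scores = q @ k.T / np.sqrt(d)        # (N_a, N_b) similarities
    return softmax(scores, axis=-1) @ v  # (N_a, d) fused features

rng = np.random.default_rng(0)
rgb_tokens = rng.standard_normal((16, 8))  # e.g. optical-branch tokens
aux_tokens = rng.standard_normal((16, 8))  # e.g. second-modality tokens

rectified = channel_rectify(rgb_tokens, aux_tokens)
fused = cross_attention(rectified, aux_tokens)
print(fused.shape)  # (16, 8)
```

Each fused token is a convex combination of the other modality's tokens (attention rows sum to 1), which is what lets the two branches exchange complementary information before the DeepLab-style decoder concatenates high- and low-level features.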
Source Journal

IEEE Transactions on Geoscience and Remote Sensing (Engineering & Technology: Geochemistry & Geophysics)

CiteScore: 11.50
Self-citation rate: 28.00%
Articles per year: 1912
Review time: 4.0 months
Journal description: IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.