UM2Former: U-Shaped Multimixed Transformer Network for Large-Scale Hyperspectral Image Semantic Segmentation

IEEE Transactions on Geoscience and Remote Sensing | IF 8.6 | Zone 1 (Earth Sciences) | JCR Q1 (Engineering, Electrical & Electronic) | Pub Date: 2025-02-19 | DOI: 10.1109/TGRS.2025.3543821
Aijun Xu;Zhaohui Xue;Ziyu Li;Shun Cheng;Hongjun Su;Junshi Xia
Volume 63, pp. 1-21. IEEE Xplore: https://ieeexplore.ieee.org/document/10892222/
Citations: 0

Abstract

Transformer-based deep learning (DL) methods have increasingly been adopted for remote sensing (RS) image semantic segmentation owing to their strong global modeling capability. Nevertheless, they have not yet been sufficiently explored for large-scale hyperspectral image (HSI) semantic segmentation. Current algorithms do not fully consider the impact of positional encoding (PE) interpolation when constructing Transformer-based decoders. Moreover, existing segmentation heads usually concatenate multiscale features directly, which ignores the inherent semantic differences between features. To address these issues, a U-shaped multimixed Transformer network (UM2Former) is proposed for large-scale HSI semantic segmentation. First, a weight encoder consisting of two modules, overlap-down and channel-weight, is built to extract hierarchical discriminative spectral-spatial features and reduce spectral redundancy. Second, the proposed multimixed Transformer block (MMTB) introduces a PE-free module, the spatial-feature-retention attention (SFRA) mechanism, in which "multimixed" refers to modeling the global dependency of each pixel on the retained average spatial characteristics of different locations in the input feature maps. Finally, a linear fuse segmentation head (LFSH) is designed to align semantic information among multiscale feature maps and achieve accurate segmentation. Experiments were conducted on single cities and on the entire large-scale WHU-OHS HSI dataset. The segmentation results indicate that the proposed method achieves higher accuracy than existing semantic segmentation methods, with improvements of 17.80% and 4.16% in mean intersection over union (mIoU) and overall accuracy (OA), respectively. The source code will be available at https://github.com/ZhaohuiXue/UM2Former.
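The abstract reports results in mIoU and OA but does not spell out how they are computed. As a rough illustration, the sketch below uses the standard confusion-matrix definitions of both metrics; it is not taken from the authors' code. The function names (`confusion_matrix`, `overall_accuracy`, `mean_iou`) and the toy labels are hypothetical, and the choice to skip classes that never occur is an assumption that evaluation protocols may handle differently.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count how often each true class (row) is predicted as each class (column)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def overall_accuracy(matrix):
    """OA: correctly labeled pixels divided by all pixels."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def mean_iou(matrix):
    """mIoU: per-class IoU = TP / (TP + FP + FN), averaged over classes that occur."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                      # true class c, predicted otherwise
        fp = sum(matrix[r][c] for r in range(n)) - tp  # predicted c, true class differs
        denom = tp + fp + fn
        if denom > 0:  # assumption: skip classes absent from labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy example with 3 classes and 6 pixels (made-up labels, not WHU-OHS data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
oa = overall_accuracy(cm)
miou = mean_iou(cm)
```

Because mIoU averages per-class scores while OA pools all pixels, a method can gain much more in mIoU than in OA when it improves on rare classes, which is one plausible reading of the gap between the two reported improvements.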
Source journal
IEEE Transactions on Geoscience and Remote Sensing (Engineering and Technology: Geochemistry and Geophysics)
CiteScore: 11.50
Self-citation rate: 28.00%
Articles published: 1912
Review time: 4.0 months
Journal description: IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.