SWFormer: Stochastic Windows Convolutional Transformer for Hybrid Modality Hyperspectral Classification

Jiaojiao Li;Zhiyuan Zhang;Yuzhe Liu;Rui Song;Yunsong Li;Qian Du
DOI: 10.1109/TIP.2024.3465038
Journal: IEEE Transactions on Image Processing, vol. 33, pp. 5482-5495
Published: 2024-09-26
URL: https://ieeexplore.ieee.org/document/10696913/
Citations: 0

Abstract

Joint classification of hyperspectral images (HSI) with hybrid-modality data can significantly enhance interpretation potential, particularly when elevation information from a LiDAR sensor is integrated. Recently, the transformer architecture was introduced to the joint HSI and LiDAR classification task and has proven highly efficient. However, existing naive transformer architectures suffer from two main drawbacks: 1) they cannot adequately extract local spatial information and multi-scale information from the HSI simultaneously, and 2) the matrix computations in the transformer consume vast amounts of computing power. In this paper, we propose a novel Stochastic Window Transformer (SWFormer) framework to resolve these issues. First, effective spatial and spectral feature projection networks are built independently, based on the hybrid-modal heterogeneous data composition, using parallel feature extraction, which helps excavate more representative perceptual features along different dimensions. Furthermore, to construct local-global nonlinear feature maps more flexibly, we implement multi-scale strip convolution coupled with a transformer strategy. Moreover, in an innovative random window transformer structure, features are randomly masked to achieve sparse window pruning, which alleviates information-density redundancy and reduces the parameters required for intensive attention. Finally, we design a plug-and-play feature aggregation module that adaptively compensates for the domain offset between modal features, minimizing the semantic gap between them and enhancing the representational ability of the fused feature. Experiments on three benchmark datasets demonstrate the effectiveness of SWFormer in determining classification results.
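The multi-scale strip convolution mentioned in the abstract pairs 1 x k (horizontal) and k x 1 (vertical) kernels at several lengths to mix local and longer-range spatial context cheaply. The following is a minimal NumPy sketch of that idea only; it uses unweighted mean filters and the scale set (3, 5, 7) as illustrative assumptions, not the paper's learned kernels or exact configuration.

```python
import numpy as np

def strip_conv(x, k):
    """Apply a 1 x k (horizontal) plus k x 1 (vertical) mean filter to a
    2-D feature map, edge-padding so the output keeps the input shape.
    A stand-in for a learned strip convolution with uniform weights."""
    pad = k // 2
    # horizontal 1 x k pass: average k shifted copies along the width
    xp = np.pad(x, ((0, 0), (pad, pad)), mode="edge")
    h = np.stack([xp[:, i:i + x.shape[1]] for i in range(k)]).mean(axis=0)
    # vertical k x 1 pass: same trick along the height
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    v = np.stack([xp[i:i + x.shape[0], :] for i in range(k)]).mean(axis=0)
    return h + v

def multi_scale_strip(x, scales=(3, 5, 7)):
    """Average strip responses over several kernel lengths, mimicking the
    multi-scale local/global mixing described in the abstract."""
    return sum(strip_conv(x, k) for k in scales) / len(scales)

feat = np.random.default_rng(0).random((8, 8)).astype(np.float32)
out = multi_scale_strip(feat)
assert out.shape == feat.shape
```

Because each pass is separable (1 x k then k x 1), the cost grows linearly in k rather than quadratically as with full k x k kernels, which is the usual motivation for strip convolutions.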
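The stochastic window pruning described in the abstract can be sketched as follows: split the token sequence into fixed windows, randomly keep only a fraction of them, and run self-attention inside the kept windows while pruned windows pass through unchanged. This is a toy NumPy sketch of the sparse-window idea only; the window size, keep ratio, and pass-through rule are illustrative assumptions, not the authors' exact design.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def stochastic_window_attention(x, window, keep_ratio=0.5, rng=None):
    """Randomly keep `keep_ratio` of the fixed-size windows in a
    (tokens, dim) sequence and run self-attention inside each kept
    window; pruned windows are left untouched. Returns the new
    sequence and the boolean keep mask over windows."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = x.shape
    n_win = n // window
    keep = rng.random(n_win) < keep_ratio   # random sparse window mask
    out = x.copy()
    for w in np.flatnonzero(keep):
        blk = x[w * window:(w + 1) * window]      # (window, d)
        attn = softmax(blk @ blk.T / np.sqrt(d))  # window-local attention
        out[w * window:(w + 1) * window] = attn @ blk
    return out, keep

tokens = np.random.default_rng(0).random((16, 8)).astype(np.float32)
out, kept = stochastic_window_attention(tokens, window=4, keep_ratio=0.5,
                                        rng=np.random.default_rng(1))
assert out.shape == tokens.shape
```

With a keep ratio of r, attention is computed in only about r of the windows per step, which is where the reduction in attention computation claimed by the abstract comes from.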