SWFormer：用于混合模态高光谱分类的随机窗口卷积变换器

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society Pub Date : 2024-09-26 DOI:10.1109/TIP.2024.3465038

Jiaojiao Li;Zhiyuan Zhang;Yuzhe Liu;Rui Song;Yunsong Li;Qian Du

{"title":"SWFormer：用于混合模态高光谱分类的随机窗口卷积变换器","authors":"Jiaojiao Li;Zhiyuan Zhang;Yuzhe Liu;Rui Song;Yunsong Li;Qian Du","doi":"10.1109/TIP.2024.3465038","DOIUrl":null,"url":null,"abstract":"Joint classification of hyperspectral images with hybrid modality can significantly enhance interpretation potentials, particularly when elevation information from the LiDAR sensor is integrated for outstanding performance. Recently, the transformer architecture was introduced to the HSI and LiDAR classification task, which has been verified as highly efficient. However, the existing naive transformer architectures suffer from two main drawbacks: 1) Inadequacy extraction for local spatial information and multi-scale information from HSI simultaneously. 2) The matrix calculation in the transformer consumes vast amounts of computing power. In this paper, we propose a novel Stochastic Window Transformer (SWFormer) framework to resolve these issues. First, the effective spatial and spectral feature projection networks are built independently based on hybrid-modal heterogeneous data composition using parallel feature extraction, which is conducive to excavating the perceptual features more representative along different dimensions. Furthermore, to construct local-global nonlinear feature maps more flexibly, we implement multi-scale strip convolution coupled with a transformer strategy. Moreover, in an innovative random window transformer structure, features are randomly masked to achieve sparse window pruning, alleviating the problem of information density redundancy, and reducing the parameters required for intensive attention. Finally, we designed a plug-and-play feature aggregation module that adapts domain offset between modal features adaptively to minimize semantic gaps between them and enhance the representational ability of the fusion feature. Three fiducial datasets demonstrate the effectiveness of the SWFormer in determining classification results.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"5482-5495"},"PeriodicalIF":0.0000,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SWFormer: Stochastic Windows Convolutional Transformer for Hybrid Modality Hyperspectral Classification\",\"authors\":\"Jiaojiao Li;Zhiyuan Zhang;Yuzhe Liu;Rui Song;Yunsong Li;Qian Du\",\"doi\":\"10.1109/TIP.2024.3465038\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Joint classification of hyperspectral images with hybrid modality can significantly enhance interpretation potentials, particularly when elevation information from the LiDAR sensor is integrated for outstanding performance. Recently, the transformer architecture was introduced to the HSI and LiDAR classification task, which has been verified as highly efficient. However, the existing naive transformer architectures suffer from two main drawbacks: 1) Inadequacy extraction for local spatial information and multi-scale information from HSI simultaneously. 2) The matrix calculation in the transformer consumes vast amounts of computing power. In this paper, we propose a novel Stochastic Window Transformer (SWFormer) framework to resolve these issues. First, the effective spatial and spectral feature projection networks are built independently based on hybrid-modal heterogeneous data composition using parallel feature extraction, which is conducive to excavating the perceptual features more representative along different dimensions. Furthermore, to construct local-global nonlinear feature maps more flexibly, we implement multi-scale strip convolution coupled with a transformer strategy. Moreover, in an innovative random window transformer structure, features are randomly masked to achieve sparse window pruning, alleviating the problem of information density redundancy, and reducing the parameters required for intensive attention. Finally, we designed a plug-and-play feature aggregation module that adapts domain offset between modal features adaptively to minimize semantic gaps between them and enhance the representational ability of the fusion feature. Three fiducial datasets demonstrate the effectiveness of the SWFormer in determining classification results.\",\"PeriodicalId\":94032,\"journal\":{\"name\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"volume\":\"33 \",\"pages\":\"5482-5495\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10696913/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10696913/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

采用混合模式对高光谱图像进行联合分类可显著提高解译潜力，尤其是在集成了激光雷达传感器的高程信息后，效果更为突出。最近，变压器架构被引入到高光谱和激光雷达分类任务中，其高效性已得到验证。然而，现有的天真变换器架构存在两个主要缺点：1) 无法同时从 HSI 中提取局部空间信息和多尺度信息。2) 变换器中的矩阵计算消耗大量计算能力。本文提出了一种新颖的随机窗口变换器（SWFormer）框架来解决这些问题。首先，利用并行特征提取技术，在混合模态异构数据组成的基础上，独立构建有效的空间和频谱特征投影网络，有利于从不同维度挖掘出更具代表性的感知特征。此外，为了更灵活地构建局部-全局非线性特征图，我们采用了多尺度带状卷积和变换器策略。此外，在创新的随机窗口变换器结构中，特征被随机屏蔽，实现了稀疏窗口剪枝，缓解了信息密度冗余问题，减少了密集关注所需的参数。最后，我们设计了一个即插即用的特征聚合模块，它能自适应地调整模态特征之间的域偏移，最大限度地减少它们之间的语义差距，增强融合特征的表征能力。三个固定数据集证明了 SWFormer 在确定分类结果方面的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

SWFormer: Stochastic Windows Convolutional Transformer for Hybrid Modality Hyperspectral Classification

Joint classification of hyperspectral images with hybrid modality can significantly enhance interpretation potentials, particularly when elevation information from the LiDAR sensor is integrated for outstanding performance. Recently, the transformer architecture was introduced to the HSI and LiDAR classification task, which has been verified as highly efficient. However, the existing naive transformer architectures suffer from two main drawbacks: 1) Inadequacy extraction for local spatial information and multi-scale information from HSI simultaneously. 2) The matrix calculation in the transformer consumes vast amounts of computing power. In this paper, we propose a novel Stochastic Window Transformer (SWFormer) framework to resolve these issues. First, the effective spatial and spectral feature projection networks are built independently based on hybrid-modal heterogeneous data composition using parallel feature extraction, which is conducive to excavating the perceptual features more representative along different dimensions. Furthermore, to construct local-global nonlinear feature maps more flexibly, we implement multi-scale strip convolution coupled with a transformer strategy. Moreover, in an innovative random window transformer structure, features are randomly masked to achieve sparse window pruning, alleviating the problem of information density redundancy, and reducing the parameters required for intensive attention. Finally, we designed a plug-and-play feature aggregation module that adapts domain offset between modal features adaptively to minimize semantic gaps between them and enhance the representational ability of the fusion feature. Three fiducial datasets demonstrate the effectiveness of the SWFormer in determining classification results.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society

自引率

0.00%

发文量

期刊最新文献

Learning Cross-Attention Point Transformer With Global Porous Sampling Salient Object Detection From Arbitrary Modalities GSSF: Generalized Structural Sparse Function for Deep Cross-Modal Metric Learning AnlightenDiff: Anchoring Diffusion Probabilistic Model on Low Light Image Enhancement Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection