Adaptive Pixel-Level and Superpixel-Level Feature Fusion Transformer for Hyperspectral Image Classification
Authors: Wei Huang; Dazhan Zhou; Le Sun; Qiqiang Chen; Junru Yin
Journal: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JCR Q1, Engineering, Electrical & Electronic; Impact Factor 4.7)
DOI: 10.1109/JSTARS.2024.3455561
Publication date: 2024-09-06
URL: https://ieeexplore.ieee.org/document/10669095/
Citations: 0
Abstract
Significant progress has been achieved in hyperspectral image (HSI) classification through the application of transformer blocks. Although transformers possess strong long-range dependency modeling capabilities, they primarily extract nonlocal information from patches and often fail to fully capture global information, leading to incomplete spectral–spatial feature extraction. Graph convolutional networks (GCNs), by contrast, can effectively extract features from the global structure. This article proposes an adaptive pixel-level and superpixel-level feature fusion transformer (APSFFT). The network comprises two branches: a convolutional neural network and transformer network (CNTN), and a GCN and transformer network (GNTN), which extract pixel-level and superpixel-level feature information from the HSI, respectively. The CNTN combines the strength of CNNs in extracting spectral–spatial information with the transformer's ability to establish long-range dependencies through self-attention (SA). The GNTN fully extracts superpixel-level features while likewise establishing long-range dependencies. To adaptively fuse the features from the two branches, an adaptive cross-token attention fusion (ACTAF) encoder is employed: it fuses the classification tokens of both branches through SA, enhancing the model's ability to capture interactions between pixel-level and superpixel-level features. Comparative experiments against seven advanced HSI classification algorithms show that APSFFT outperforms these state-of-the-art methods.
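The abstract describes the ACTAF encoder only at a high level: the classification tokens of the two branches are fused through self-attention. The paper's actual architecture is not reproduced here; the following NumPy sketch only illustrates the general idea of cross-token fusion. The function name, the projection matrices `Wq`/`Wk`/`Wv`, the token dimension `d`, and the final averaging step are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_token_attention_fusion(cls_pixel, cls_superpixel, Wq, Wk, Wv):
    # Stack the two branch classification tokens and let scaled
    # dot-product self-attention exchange information between them.
    tokens = np.stack([cls_pixel, cls_superpixel])   # (2, d)
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv  # (2, d) each
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (2, 2)
    attn = softmax(scores, axis=-1)                  # rows sum to 1
    fused = attn @ v                                 # (2, d) attended tokens
    # Collapse the two attended tokens into one fused classification token
    # (an assumed pooling choice; the paper's fusion rule may differ).
    return fused.mean(axis=0), attn

# Toy usage with random tokens and projection weights.
rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3)]
cls_pixel = rng.standard_normal(d)       # hypothetical CNTN branch token
cls_superpixel = rng.standard_normal(d)  # hypothetical GNTN branch token
fused_token, attn = cross_token_attention_fusion(cls_pixel, cls_superpixel,
                                                 Wq, Wk, Wv)
```

In this sketch each token's attention row weights how much pixel-level versus superpixel-level information flows into its attended representation, which is the interaction the ACTAF encoder is said to capture.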
Journal description:
The IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing addresses the growing field of applications in Earth observations and remote sensing, and also provides a venue for the rapidly expanding special issues sponsored by the IEEE Geoscience and Remote Sensing Society. The journal draws upon the experience of the highly successful IEEE Transactions on Geoscience and Remote Sensing and provides a complementary medium for the wide range of topics in applied Earth observations. The "Applications" areas encompass the societal-benefit areas of the Global Earth Observation System of Systems (GEOSS) program. Through deliberations over two years, ministers from 50 countries agreed to identify nine areas where Earth observation could positively impact the quality of life and health of their respective countries. Some of these are areas not traditionally addressed in the IEEE context, including biodiversity, health, and climate. Yet it is the skill sets of IEEE members, in areas such as observations, communications, computers, signal processing, standards, and ocean engineering, that form the technical underpinnings of GEOSS. Thus, the journal attracts a broad range of interests, serving present members in new ways and expanding IEEE visibility into new areas.