Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral image classification

IF 5.5 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Neurocomputing | Pub Date: 2024-10-21 | DOI: 10.1016/j.neucom.2024.128751
Weilian Zhou, Sei-ichiro Kamata, Haipeng Wang, Man Sing Wong, Huiying (Cynthia) Hou
Neurocomputing, Volume 613, Article 128751 (Journal Article). Source: https://www.sciencedirect.com/science/article/pii/S0925231224015224
Citations: 0

Abstract

Hyperspectral image (HSI) classification plays a crucial role in remote sensing (RS) applications, enabling the precise identification of materials and land cover based on spectral information. This supports tasks such as agricultural management and urban planning. While sequential neural models like Recurrent Neural Networks (RNNs) and Transformers have been adapted for this task, they present limitations: RNNs struggle with feature aggregation and are sensitive to noise from interfering pixels, whereas Transformers require extensive computational resources and tend to underperform when HSI datasets contain limited or unbalanced training samples. To address these challenges, Mamba architectures have emerged, offering a balance between RNNs and Transformers by leveraging lightweight, parallel scanning capabilities. Although models like Vision Mamba (ViM) and Visual Mamba (VMamba) have demonstrated improvements in visual tasks, their application to HSI classification remains underexplored, particularly in handling land-cover semantic tokens and multi-scale feature aggregation for patch-wise classifiers. In response, this study introduces the Mamba-in-Mamba (MiM) architecture for HSI classification, marking a pioneering effort in this domain. The MiM model features: (1) a novel centralized Mamba-Cross-Scan (MCS) mechanism for efficient image-to-sequence data transformation; (2) a Tokenized Mamba (T-Mamba) encoder that incorporates a Gaussian Decay Mask (GDM), Semantic Token Learner (STL), and Semantic Token Fuser (STF) for enhanced feature generation; and (3) a Weighted MCS Fusion (WMF) module with a Multi-Scale Loss Design for improved training efficiency. Experimental results on four public HSI datasets (Indian Pines, Pavia University, Houston2013, and WHU-Hi-Honghu) demonstrate that our method achieves overall accuracy improvements of up to 3.3%, 2.7%, 1.5%, and 2.3%, respectively, over state-of-the-art approaches (e.g., SSFTT and MAEST) under both fixed and disjoint training-testing settings.
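
The abstract names an image-to-sequence scan and a Gaussian Decay Mask but does not define either, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes a square patch centered on the pixel to be classified, a simple snake (boustrophedon) scan as a stand-in for the image-to-sequence step, and a Gaussian weight that decays with spatial distance from the center pixel. The function names `snake_scan` and `gaussian_decay_weights`, the patch size, and `sigma` are all hypothetical.

```python
import math


def snake_scan(height, width):
    """Return (row, col) coordinates of a patch in snake order.

    Even rows are read left to right and odd rows right to left, so
    consecutive sequence elements are always spatial neighbours.
    This is an assumed stand-in for an image-to-sequence scan.
    """
    order = []
    for r in range(height):
        cols = range(width) if r % 2 == 0 else range(width - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order


def gaussian_decay_weights(order, center, sigma=1.0):
    """Weight each scanned position by a Gaussian of its distance to the center.

    Assumption: positions far from the center pixel contribute less.
    """
    cr, cc = center
    return [
        math.exp(-((r - cr) ** 2 + (c - cc) ** 2) / (2.0 * sigma ** 2))
        for r, c in order
    ]


# Hypothetical 5 x 5 patch around the pixel being classified.
patch_size = 5
center = (patch_size // 2, patch_size // 2)
order = snake_scan(patch_size, patch_size)
weights = gaussian_decay_weights(order, center, sigma=1.5)
```

In this sketch the center pixel receives weight 1 and the corners receive the smallest weights, which is one plausible way a decay mask could suppress interfering pixels near the patch border; the paper itself should be consulted for the actual scan paths and mask definition.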

Source journal: Neurocomputing (Engineering and Technology, Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Articles published: 1382
Review time: 70 days
Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.