Contour-Enhanced Visual State-Space Model for Remote Sensing Image Classification

IEEE Transactions on Geoscience and Remote Sensing · Impact Factor 8.6 · CAS Tier 1 (Earth Science) · JCR Q1 (Engineering, Electrical & Electronic) · Published: 2024-12-20 · Vol. 63, pp. 1-14 · DOI: 10.1109/TGRS.2024.3520635
Liyue Yan;Xing Zhang;Kafeng Wang;Dejin Zhang
Citations: 0

Abstract

The accurate classification of remote sensing (RS) images enables the rapid identification of various geographical features, which is important for planning, utilizing, and protecting natural resources. Recently, the visual Mamba model, an extension of the vision transformer (ViT), has attracted widespread attention due to its global receptive field and linear complexity. However, the self-attention mechanism of vision transformers can lead to feature collapse in the deep layers, resulting in the disappearance of low-level visual features. In RS images, low-level features, especially luminance gradient features, can help discern object boundaries and contour information. This is beneficial for accurate image classification but has not been fully leveraged. To make full use of contour information and explore the impact of handcrafted low-level features on the deep layers of the model, this study proposes a contour-enhanced Mamba model based on Vision Mamba (VMamba), named G-VMamba. The core novelty of G-VMamba lies in its contour enhancement module (ConEM). First, two separate paths are used to extract adaptive luminance gradients and multidimensional convolutional features at each network layer. Subsequently, the features are combined to impose the constraints of the low-level features onto the deeper layers of the network. RS image classification experiments were conducted to evaluate the model's performance, and the results demonstrate the superior performance of G-VMamba in classification tasks. An analysis of class activation maps (CAMs) across different categories shows that G-VMamba focuses more than models such as VMamba on regions of the image where color (or luminance) changes significantly, highlighting the efficacy of contour enhancement. The code will be available at: https://github.com/yanliyue/Contour-enhanced-Visual-State-Space-Model .
Source Journal

IEEE Transactions on Geoscience and Remote Sensing
Category: Engineering & Technology – Geochemistry & Geophysics
CiteScore: 11.50
Self-citation rate: 28.00%
Articles published: 1912
Review time: 4.0 months
Journal introduction: IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space, and on the processing, interpretation, and dissemination of this information.
Latest articles from this journal

An Improved Mahalanobis Distance Method for Smoke Detection Based on Fine-Grained Background Identification
An Automatic Layer Extraction Algorithm for Ice Sounding Radar Data Based on Curvelet Transform (CT) and Minimum Spanning Tree (MST)
Estimations of Wind Direction and CO2 Emissions From Power Plants With DaQi-1 Satellite
Adaptive Modeling for Air Quality: A Continual Learning Framework for PM2.5 Estimation in Vietnam
LKA-GFNet: Language Knowledge-Augmented Graph Fusion for Tri-Source Heterogeneous Remote Sensing Data Classification