A Novel Remote Sensing Image Change Detection Approach Based on Multilevel State Space Model
Authors: Zhongyu Zhang; Xuanmei Fan; Xin Wang; Yingxiang Qin; Junshi Xia
Journal: IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1-14 (impact factor 8.6; JCR Q1, Engineering, Electrical & Electronic)
Published: 2024-11-18
DOI: 10.1109/TGRS.2024.3501303
URL: https://ieeexplore.ieee.org/document/10756674/
Citations: 0
Abstract
Remote sensing image change detection (CD) is crucial for disaster assessment, land use change, and urban management. Most CD methods are built on CNNs and Transformers. However, these methods struggle to model global dependencies while keeping computational complexity low. The recent emergence of Mamba architectures, which are based on state space models (SSMs), can remedy this problem. In this article, we propose a visual Mamba-based multiscale feature extraction network, named MF-VMamba (MF: multiscale feature), that efficiently and interactively fuses global and local information. First, a VMamba-based encoder extracts multiscale semantic features from bitemporal images. Then, a feature enhancement module (FEM) captures the difference information between the two images. In addition, a multilevel attention decoder (MAD) based on large kernel convolution (LKC) gathers information along the spatial and spectral dimensions, enabling interaction between global and local features. After sequential processing by these three modules, the ability to discriminate changed objects is significantly improved. Notably, the computational complexity of our VMamba-based model grows linearly, which can significantly reduce the computational cost. In the experiments, our method performs well on the CDD, DSIFN-CD, LEVIR-CD, and SYSU-CD datasets, with F1 scores of 95.69%/88.05%/90.64%/86.95% and overall accuracy (OA) of 98.97%/96.01%/99.07%/90.75%, respectively. The code can be accessed at https://github.com/121zzy/MF-Mamba.git.
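For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how F1 and OA are commonly computed for a binary change map. It is an illustration only: the abstract does not describe the authors' evaluation protocol, so the function name `change_detection_scores`, the flat 0/1 pixel encoding, and the choice to compute F1 on the "changed" class are all assumptions made here.

```python
def change_detection_scores(pred, truth):
    """Return (F1, OA) for flat binary sequences (1 = changed, 0 = unchanged).

    Hypothetical helper for illustration; not taken from the MF-VMamba code.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")

    # Confusion-matrix counts, with "changed" as the positive class.
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0

    # OA is the fraction of all pixels labeled correctly.
    total = tp + fp + fn + tn
    oa = (tp + tn) / total if total else 0.0
    return f1, oa


# Toy example: an 8-pixel change map, flattened to a list.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
f1, oa = change_detection_scores(pred, truth)
print(f"F1 = {f1:.4f}, OA = {oa:.4f}")
```

In this toy case OA is higher than F1 because the unchanged pixels, which are the majority, count toward OA but not toward the change-class F1. The same effect is one plausible reason the OA values reported above exceed the corresponding F1 scores.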
About the journal:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.