Hongjin Ren;Min Xia;Liguo Weng;Haifeng Lin;Junqing Huang;Kai Hu
Title: Interactive and Supervised Dual-Mode Attention Network for Remote Sensing Image Change Detection
DOI: 10.1109/TGRS.2025.3540864
Journal: IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-18 (Q1, Engineering, Electrical & Electronic; IF 8.6)
Publication date: 2025-02-11 (Journal Article)
Code: https://github.com/RenHongjin6/ISDANet
Citations: 0
Abstract
With the rapid advancement of remote sensing technology, change detection using bitemporal remote sensing images has significant applications in land use planning and environmental monitoring. The emergence of convolutional neural networks (CNNs) has accelerated the development of deep learning-based change detection. However, existing deep learning algorithms exhibit limitations in understanding bitemporal feature relationships and accurately identifying change region boundaries. Moreover, they inadequately explore feature interactions between bitemporal images before extracting differential features. To address these issues, this article proposes a novel interactive and supervised dual-mode attention network (ISDANet). In the feature encoding stage, we employ the lightweight MobileNetV2 as the backbone to extract bitemporal features. Additionally, we design the neighbor feature aggregation module (NFAM) to aggregate semantic features from adjacent scales within the dual-branch backbone, enhancing the representation of temporal features. We further introduce the interactive attention enhancement module (IAEM), which effectively integrates self-attention and cross-attention mechanisms. This establishes deep interactions between bitemporal features, suppresses irrelevant noise, and ensures precise focus on true change regions. In the feature decoding stage, the supervised attention module (SAM) reweights differential features and leverages supervisory signals to guide the learning of attention mechanisms, significantly improving boundary detection accuracy. SAM dynamically aggregates multilevel features, balancing high-level semantics and low-level details to capture subtle changes in complex scenes. 
The proposed model achieves F1 scores that are 0.28%, 1.6%, and 0.76% higher than the best comparative method, the spatiotemporal enhancement and interlevel fusion network (SEIFNet), on three CD datasets [LEVIR-CD, Guangzhou dataset (GZ-CD), and Sun Yat-sen University dataset (SYSU-CD)], respectively, while maintaining a lightweight design with only 6.93M parameters and 3.46G floating-point operations (FLOPs). The code is available at https://github.com/RenHongjin6/ISDANet.
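The abstract does not include implementation details for the interactive attention enhancement module (IAEM); as a rough illustration of the cross-attention idea it builds on (queries from one temporal image attending over keys/values from the other, so each location is enriched with context from the co-registered image), here is a minimal NumPy sketch. The function name, token layout `(N, d)`, and single-head formulation are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(feat_t1, feat_t2):
    """Attend from time-1 features (queries) to time-2 features (keys/values).

    feat_t1, feat_t2: (N, d) arrays of N spatial tokens with d channels,
    e.g. flattened backbone feature maps of the two temporal images.
    Returns an (N, d) array: time-1 tokens mixed with time-2 context.
    """
    d = feat_t1.shape[-1]
    scores = feat_t1 @ feat_t2.T / np.sqrt(d)  # (N, N) scaled similarity
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ feat_t2                   # weighted mix of time-2 tokens

# Toy usage: 16 tokens with 8 channels per temporal image.
rng = np.random.default_rng(0)
t1 = rng.standard_normal((16, 8))
t2 = rng.standard_normal((16, 8))
fused = cross_attention(t1, t2)  # shape (16, 8)
```

In the full model this would run in both directions (t1→t2 and t2→t1) and be combined with per-image self-attention before differential features are extracted.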
About the journal
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.