Yuanling Li, Shengyuan Zou, Tianzhong Zhao, Xiaohui Su
{"title":"MDFA-Net: Multi-Scale Differential Feature Self-Attention Network for Building Change Detection in Remote Sensing Images","authors":"Yuanling Li, Shengyuan Zou, Tianzhong Zhao, Xiaohui Su","doi":"10.3390/rs16183466","DOIUrl":null,"url":null,"abstract":"Building change detection (BCD) from remote sensing images is an essential field for urban studies. In this well-developed field, Convolutional Neural Networks (CNNs) and Transformer have been leveraged to empower BCD models in handling multi-scale information. However, it is still challenging to accurately detect subtle changes using current models, which has been the main bottleneck to improving detection accuracy. In this paper, a multi-scale differential feature self-attention network (MDFA-Net) is proposed to effectively integrate CNN and Transformer by balancing the global receptive field from the self-attention mechanism and the local receptive field from convolutions. In MDFA-Net, two innovative modules were designed. Particularly, a hierarchical multi-scale dilated convolution (HMDConv) module was proposed to extract local features with hybrid dilation convolutions, which can ameliorate the effect of CNN’s local bias. In addition, a differential feature self-attention (DFA) module was developed to implement the self-attention mechanism at multi-scale difference feature maps to overcome the problem that local details may be lost in the global receptive field in Transformer. The proposed MDFA-Net achieves state-of-the-art accuracy performance in comparison with related works, e.g., USSFC-Net, in three open datasets: WHU-CD, CDD-CD, and LEVIR-CD. Based on the experimental results, MDFA-Net significantly exceeds other models in F1 score, IoU, and overall accuracy; the F1 score is 93.81%, 95.52%, and 91.21% in WHU-CD, CDD-CD, and LEVIR-CD datasets, respectively. Furthermore, MDFA-Net achieved first or second place in precision and recall in the test in all three datasets, which indicates its better balance in precision and recall than other models. We also found that subtle changes, i.e., small-sized building changes and irregular boundary changes, are better detected thanks to the introduction of HMDConv and DFA. To this end, with its better ability to leverage multi-scale differential information than traditional methods, MDFA-Net provides a novel and effective avenue to integrate CNN and Transformer in BCD. Further studies could focus on improving the model’s insensitivity to hyper-parameters and the model’s generalizability in practical applications.","PeriodicalId":48993,"journal":{"name":"Remote Sensing","volume":"40 1","pages":""},"PeriodicalIF":4.2000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Remote Sensing","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.3390/rs16183466","RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENVIRONMENTAL SCIENCES","Score":null,"Total":0}
引用次数: 0
Abstract
Building change detection (BCD) from remote sensing images is an essential field for urban studies. In this well-developed field, Convolutional Neural Networks (CNNs) and Transformer have been leveraged to empower BCD models in handling multi-scale information. However, it is still challenging to accurately detect subtle changes using current models, which has been the main bottleneck to improving detection accuracy. In this paper, a multi-scale differential feature self-attention network (MDFA-Net) is proposed to effectively integrate CNN and Transformer by balancing the global receptive field from the self-attention mechanism and the local receptive field from convolutions. In MDFA-Net, two innovative modules were designed. Particularly, a hierarchical multi-scale dilated convolution (HMDConv) module was proposed to extract local features with hybrid dilation convolutions, which can ameliorate the effect of CNN’s local bias. In addition, a differential feature self-attention (DFA) module was developed to implement the self-attention mechanism at multi-scale difference feature maps to overcome the problem that local details may be lost in the global receptive field in Transformer. The proposed MDFA-Net achieves state-of-the-art accuracy performance in comparison with related works, e.g., USSFC-Net, in three open datasets: WHU-CD, CDD-CD, and LEVIR-CD. Based on the experimental results, MDFA-Net significantly exceeds other models in F1 score, IoU, and overall accuracy; the F1 score is 93.81%, 95.52%, and 91.21% in WHU-CD, CDD-CD, and LEVIR-CD datasets, respectively. Furthermore, MDFA-Net achieved first or second place in precision and recall in the test in all three datasets, which indicates its better balance in precision and recall than other models. We also found that subtle changes, i.e., small-sized building changes and irregular boundary changes, are better detected thanks to the introduction of HMDConv and DFA. To this end, with its better ability to leverage multi-scale differential information than traditional methods, MDFA-Net provides a novel and effective avenue to integrate CNN and Transformer in BCD. Further studies could focus on improving the model’s insensitivity to hyper-parameters and the model’s generalizability in practical applications.
期刊介绍:
Remote Sensing (ISSN 2072-4292) publishes regular research papers, reviews, letters and communications covering all aspects of the remote sensing process, from instrument design and signal processing to the retrieval of geophysical parameters and their application in geosciences. Our aim is to encourage scientists to publish experimental, theoretical and computational results in as much detail as possible so that results can be easily reproduced. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced.