M3ICNet: A cross-modal resolution preserving building damage detection method with optical and SAR remote sensing imagery and two heterogeneous image disaster datasets
Haiming Zhang, Guorui Ma, Di Wang, Yongxian Zhang
ISPRS Journal of Photogrammetry and Remote Sensing, Volume 221, Pages 224-250. Published 2025-02-13. Journal Article. Impact factor: 10.6 (JCR Q1, Geography, Physical).
DOI: 10.1016/j.isprsjprs.2025.02.004
URL: https://www.sciencedirect.com/science/article/pii/S0924271625000553
Citations: 0
Abstract
Building damage detection based on optical and SAR remote sensing imagery can mitigate the adverse effects of weather, climate, and nighttime imaging. However, under emergency conditions, inherent limitations such as satellite availability, sensor swath width, and data sensitivity make it challenging to unify the resolution of optical and SAR imagery covering the same area. Additionally, optical imagery with varying resolutions is generally more abundant than SAR imagery. Most existing research employs resampling to resize bi-temporal images before subsequent analysis. However, this practice often disrupts the original data structure and can distort the spectral reflectance characteristics or scattering intensity of damaged building targets in the images. Furthermore, the one-to-one use of optical-SAR imagery fails to leverage the richness of optical imagery resources for detection tasks. Currently, there is a scarcity of optical-SAR image datasets specifically tailored for building damage detection purposes. To capitalize on the quantitative and resolution advantages of optical images and effectively extract SAR image features while preserving the original data structure, we engineered M3ICNet—a multimodal, multiresolution, multilevel information interaction and convergence network. M3ICNet accepts inputs in cross-modal and cross-resolution formats, accommodating three types of optical-SAR-optical images with resolutions doubling incrementally. This design effectively incorporates optical imagery at two scales while maintaining the structural integrity of SAR imagery. The network operates horizontally and vertically, achieving multiscale resolution preservation and feature fusion alongside deep feature mining. Its parallelized feature interaction module refines the coherent representation of optical and SAR data features comprehensively. It accomplishes this by learning the dependencies across different scales through feature contraction and diffusion. 
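As a rough illustration of the two points above (inputs whose resolution doubles from one level to the next, and resampling that alters scattering intensity), here is a minimal Python sketch. It is not the authors' code. The helper names `pyramid_shapes` and `downsample_2x`, the 64-pixel base patch, the fixed ground footprint, and the 2x2 block averaging used as a stand-in for resampling are all assumptions made for illustration; the abstract does not state patch sizes or the resampling kernel used in prior work.

```python
def pyramid_shapes(base_rows, base_cols, levels=3):
    """Grid shapes for `levels` inputs covering the same footprint,
    with the side length doubling at each level (assumed layout)."""
    return [(base_rows * 2 ** k, base_cols * 2 ** k) for k in range(levels)]


def downsample_2x(image):
    """Average each non-overlapping 2x2 block of a 2D list.

    This is a simple stand-in for resampling to a coarser grid.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image sides must be even")
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Hypothetical 4x4 SAR intensity patch containing one strong scatterer.
sar_patch = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 8.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

shapes = pyramid_shapes(64, 64)
resampled = downsample_2x(sar_patch)
peak_before = max(max(row) for row in sar_patch)
peak_after = max(max(row) for row in resampled)
print(shapes)
print(peak_before, peak_after)
```

In this toy case the peak intensity falls from 8.0 to 2.0 after a single resampling step, which is the kind of distortion a resolution-preserving design tries to avoid by feeding each image at its native grid size.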
Relying on the network’s innovative structure and core components, M3ICNet extracts consistent damage information between optical-SAR heterogeneous imagery and detects damaged buildings effectively. We gathered optical-SAR-optical remote sensing imagery from natural disasters (such as the Turkey earthquake) and man-made disasters (such as the Russian-Ukrainian conflict) to create two multimodal building damage detection datasets (WBD and EBD). Extensive comparative experiments were conducted using these two datasets, along with six publicly available optical-SAR datasets, employing ten supervised and unsupervised methods. The results indicate that M3ICNet achieves the highest average detection accuracy (F1-score) of nearly 80% on the damaged building dataset, outperforming other comparative methods on public datasets. Furthermore, it strikes a balance between accuracy and efficiency.
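The reported metric, the F1-score, is the harmonic mean of precision and recall. The sketch below computes it for a pair of binary damage masks. It assumes pixel-level evaluation with 1 marking a damaged-building pixel; the abstract does not say whether the authors score pixels or building objects, and the function name `f1_score` and the toy masks are illustrative only.

```python
def f1_score(predicted, reference):
    """F1 = 2*TP / (2*TP + FP + FN) over two equally sized binary masks."""
    tp = fp = fn = 0
    for pred_row, ref_row in zip(predicted, reference):
        for pred, ref in zip(pred_row, ref_row):
            if pred and ref:
                tp += 1          # damaged pixel correctly detected
            elif pred and not ref:
                fp += 1          # false alarm
            elif ref and not pred:
                fn += 1          # missed damage
    denominator = 2 * tp + fp + fn
    # With no positives in either mask the score is undefined; return 0.0.
    return 2 * tp / denominator if denominator else 0.0


# Toy masks (hypothetical data): 2 true positives, 1 false positive, 1 miss.
predicted_mask = [[1, 1, 0],
                  [0, 1, 0]]
reference_mask = [[1, 0, 0],
                  [0, 1, 1]]

score = f1_score(predicted_mask, reference_mask)
print(round(score, 4))
```

Because true negatives do not enter the formula, the score is not inflated by the large undamaged background typical of disaster scenes.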
About the journal:
The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) serves as the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It acts as a platform for scientists and professionals worldwide who are involved in various disciplines that utilize photogrammetry, remote sensing, spatial information systems, computer vision, and related fields. The journal aims to facilitate communication and dissemination of advancements in these disciplines, while also acting as a comprehensive source of reference and archive.
P&RS endeavors to publish high-quality, peer-reviewed research papers that are preferably original and have not been published before. These papers can cover scientific/research, technological development, or application/practical aspects. Additionally, the journal welcomes papers that are based on presentations from ISPRS meetings, as long as they are considered significant contributions to the aforementioned fields.
In particular, P&RS encourages the submission of papers that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, discuss topics that have received limited attention in P&RS or related journals, or explore new directions in scientific or professional realms. It is preferred that theoretical papers include practical applications, while papers focusing on systems and applications should include a theoretical background.