MaCon: A Generic Self-Supervised Framework for Unsupervised Multimodal Change Detection

Jian Wang;Li Yan;Jianbing Yang;Hong Xie;Qiangqiang Yuan;Pengcheng Wei;Zhao Gao;Ce Zhang;Peter M. Atkinson
DOI: 10.1109/TIP.2025.3542276
Journal: IEEE Transactions on Image Processing, vol. 34, pp. 1485-1500
Published: 2025-02-24
IEEE Xplore: https://ieeexplore.ieee.org/document/10899764/
Citation count: 0

Abstract

Change detection (CD) is important for Earth observation, emergency response, and time-series understanding. Recently, data availability across modalities has increased rapidly, and multimodal change detection (MCD) is gaining prominence. Given the scarcity of datasets and labels for MCD, unsupervised approaches are more practical. However, previous methods typically either merely reduce the gap between multimodal data through transformation or feed the original multimodal data directly into a discriminant network for difference extraction. The former struggles to extract precise difference features. The latter retains the pronounced intrinsic distinctions between the original multimodal data, so directly extracting and comparing features usually introduces significant noise, compromising the quality of the resulting difference image. In this article, we propose the MaCon framework to synergistically distill common and discrepancy representations. MaCon unifies the mask reconstruction (MR) and contrastive learning (CL) self-supervised paradigms, where MR serves the purpose of transformation while CL focuses on discrimination. Moreover, we present an optimal sampling strategy in the CL architecture, enabling the CL subnetwork to extract more distinguishable discrepancy representations. Furthermore, we develop an effective silent attention mechanism that not only enhances contrast in the output representations but also stabilizes training. Experimental results on both multimodal and monomodal datasets demonstrate that MaCon effectively distills the intrinsic common representations between varied modalities and achieves state-of-the-art performance on both multimodal and monomodal CD. These findings imply that MaCon has the potential to serve as a unified framework for CD and related fields. Source code will be made publicly available once the article is accepted.
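The abstract describes a joint objective that unifies a mask-reconstruction (MR) term with a contrastive-learning (CL) term. The exact network, optimal sampling strategy, and silent attention mechanism are detailed in the full paper; as a rough illustration only, the sketch below combines a masked-patch MSE loss with an InfoNCE contrastive loss on toy NumPy data. All function names, the masking ratio, and the loss weighting `lambda_cl` are assumptions for this sketch, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_reconstruction_loss(x, x_hat, mask):
    """MSE computed only on masked positions (MR-style objective)."""
    diff = (x - x_hat) ** 2
    return float((diff * mask).sum() / mask.sum())

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE: row i of z1 and row i of z2 are a positive pair; all other
    rows act as negatives (CL-style objective)."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Toy data: a "patch grid", its reconstruction, and paired cross-modal embeddings.
x = rng.normal(size=(16, 32))
mask = (rng.random((16, 32)) < 0.75).astype(float)    # assume 75% of patches masked
x_hat = x + 0.1 * rng.normal(size=x.shape)            # stand-in reconstruction
z1 = rng.normal(size=(8, 64))                         # embeddings from modality A
z2 = z1 + 0.05 * rng.normal(size=z1.shape)            # matched embeddings from modality B

lambda_cl = 1.0  # assumed weighting between the MR and CL terms
total = masked_reconstruction_loss(x, x_hat, mask) + lambda_cl * info_nce_loss(z1, z2)
print(total)
```

In this toy setup the MR term penalizes reconstruction error only where patches were masked, while the CL term pulls paired cross-modal embeddings together and pushes unpaired ones apart, mirroring the transformation-versus-discrimination split the abstract assigns to the two paradigms.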