MaCon: A Generic Self-Supervised Framework for Unsupervised Multimodal Change Detection

Jian Wang;Li Yan;Jianbing Yang;Hong Xie;Qiangqiang Yuan;Pengcheng Wei;Zhao Gao;Ce Zhang;Peter M. Atkinson
DOI: 10.1109/TIP.2025.3542276
Journal: IEEE Transactions on Image Processing, vol. 34, pp. 1485-1500
Published: 2025-02-24
IEEE Xplore: https://ieeexplore.ieee.org/document/10899764/
Citation count: 0

Abstract

Change detection (CD) is important for Earth observation, emergency response, and time-series understanding. Recently, data availability across modalities has increased rapidly, and multimodal change detection (MCD) is gaining prominence. Given the scarcity of datasets and labels for MCD, unsupervised approaches are more practical. However, previous methods typically either merely reduce the gap between multimodal data through transformation or feed the original multimodal data directly into a discriminant network for difference extraction. The former struggles to extract precise difference features. The latter retains the pronounced intrinsic distinctions between the original multimodal data, so directly extracting and comparing features usually introduces significant noise, compromising the quality of the resulting difference image. In this article, we propose the MaCon framework to synergistically distill common and discrepancy representations. MaCon unifies the mask reconstruction (MR) and contrastive learning (CL) self-supervised paradigms, where MR serves the purpose of transformation while CL focuses on discrimination. Moreover, we present an optimal sampling strategy in the CL architecture, enabling the CL subnetwork to extract more distinguishable discrepancy representations. Furthermore, we develop an effective silent attention mechanism that not only enhances contrast in the output representations but also stabilizes training. Experimental results on both multimodal and monomodal datasets demonstrate that MaCon effectively distills the intrinsic common representations between varied modalities and achieves state-of-the-art performance on both multimodal and monomodal CD. These findings imply that MaCon has the potential to serve as a unified framework for CD and related fields. Source code will be made publicly available once the article is accepted.
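The abstract describes a joint objective that unifies a mask-reconstruction (MR) term with a contrastive-learning (CL) term. The exact network, optimal sampling strategy, and silent attention mechanism are detailed in the full paper; as a rough illustration only, the sketch below combines a masked-patch MSE loss with an InfoNCE contrastive loss on toy NumPy data. All function names, the masking ratio, and the loss weighting `lambda_cl` are assumptions for this sketch, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_reconstruction_loss(x, x_hat, mask):
    """MSE computed only on masked positions (MR-style objective)."""
    diff = (x - x_hat) ** 2
    return float((diff * mask).sum() / mask.sum())

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE: row i of z1 and row i of z2 are a positive pair; all other
    rows act as negatives (CL-style objective)."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Toy data: a "patch grid", its reconstruction, and paired cross-modal embeddings.
x = rng.normal(size=(16, 32))
mask = (rng.random((16, 32)) < 0.75).astype(float)    # assume 75% of patches masked
x_hat = x + 0.1 * rng.normal(size=x.shape)            # stand-in reconstruction
z1 = rng.normal(size=(8, 64))                         # embeddings from modality A
z2 = z1 + 0.05 * rng.normal(size=z1.shape)            # matched embeddings from modality B

lambda_cl = 1.0  # assumed weighting between the MR and CL terms
total = masked_reconstruction_loss(x, x_hat, mask) + lambda_cl * info_nce_loss(z1, z2)
print(total)
```

In this toy setup the MR term penalizes reconstruction error only where patches were masked, while the CL term pulls paired cross-modal embeddings together and pushes unpaired ones apart, mirroring the transformation-versus-discrimination split the abstract assigns to the two paradigms.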