Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model.

Zhuo Zheng, Stefano Ermon, Dongjun Kim, Liangpei Zhang, Yanfei Zhong
{"title":"Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model.","authors":"Zhuo Zheng, Stefano Ermon, Dongjun Kim, Liangpei Zhang, Yanfei Zhong","doi":"10.1109/TPAMI.2024.3475824","DOIUrl":null,"url":null,"abstract":"<p><p>Our understanding of the temporal dynamics of the Earth's surface has been significantly advanced by deep vision models, which often require a massive amount of labeled multi-temporal images for training. However, collecting, preprocessing, and annotating multi-temporal remote sensing images at scale is non-trivial since it is expensive and knowledge-intensive. In this paper, we present scalable multi-temporal change data generators based on generative models, which are cheap and automatic, alleviating these data problems. Our main idea is to simulate a stochastic change process over time. We describe the stochastic change process as a probabilistic graphical model, namely the generative probabilistic change model (GPCM), which factorizes the complex simulation problem into two more tractable sub-problems, i.e., condition-level change event simulation and image-level semantic change synthesis. To solve these two problems, we present Changen2, a GPCM implemented with a resolution-scalable diffusion transformer which can generate time series of remote sensing images and corresponding semantic and change labels from labeled and even unlabeled single-temporal images. Changen2 is a \"generative change foundation model\" that can be trained at scale via self-supervision, and is capable of producing change supervisory signals from unlabeled single-temporal images. Unlike existing \"foundation models\", our generative change foundation model synthesizes change data to train task-specific foundation models for change detection. The resulting model possesses inherent zero-shot change detection capabilities and excellent transferability. 
Comprehensive experiments suggest Changen2 has superior spatiotemporal scalability in data generation, e.g., Changen2 model trained on 256 <sup>2</sup> pixel single-temporal images can yield time series of any length and resolutions of 1,024 <sup>2</sup> pixels. Changen2 pre-trained models exhibit superior zero-shot performance (narrowing the performance gap to 3% on LEVIR-CD and approximately 10% on both S2Looking and SECOND, compared to fully supervised counterpart) and transferability across multiple types of change tasks, including ordinary and off-nadir building change, land-use/land-cover change, and disaster assessment. The model and datasets are available at https://github.com/Z-Zheng/pytorch-change-models.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TPAMI.2024.3475824","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Our understanding of the temporal dynamics of the Earth's surface has been significantly advanced by deep vision models, which often require a massive amount of labeled multi-temporal images for training. However, collecting, preprocessing, and annotating multi-temporal remote sensing images at scale is non-trivial since it is expensive and knowledge-intensive. In this paper, we present scalable multi-temporal change data generators based on generative models, which are cheap and automatic, alleviating these data problems. Our main idea is to simulate a stochastic change process over time. We describe the stochastic change process as a probabilistic graphical model, namely the generative probabilistic change model (GPCM), which factorizes the complex simulation problem into two more tractable sub-problems, i.e., condition-level change event simulation and image-level semantic change synthesis. To solve these two problems, we present Changen2, a GPCM implemented with a resolution-scalable diffusion transformer, which can generate time series of remote sensing images and corresponding semantic and change labels from labeled and even unlabeled single-temporal images. Changen2 is a "generative change foundation model" that can be trained at scale via self-supervision, and is capable of producing change supervisory signals from unlabeled single-temporal images. Unlike existing "foundation models", our generative change foundation model synthesizes change data to train task-specific foundation models for change detection. The resulting model possesses inherent zero-shot change detection capabilities and excellent transferability. Comprehensive experiments suggest Changen2 has superior spatiotemporal scalability in data generation; e.g., a Changen2 model trained on 256² pixel single-temporal images can yield time series of any length at resolutions of 1,024² pixels.
Changen2 pre-trained models exhibit superior zero-shot performance (narrowing the performance gap to 3% on LEVIR-CD and approximately 10% on both S2Looking and SECOND, compared to fully supervised counterparts) and transferability across multiple types of change tasks, including ordinary and off-nadir building change, land-use/land-cover change, and disaster assessment. The model and datasets are available at https://github.com/Z-Zheng/pytorch-change-models.
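The GPCM factorization described above (condition-level change event simulation followed by image-level synthesis) can be illustrated with a toy sketch. All function names below are hypothetical, the masks are tiny binary grids (1 = building, 0 = background), and a trivial intensity lookup stands in for the resolution-scalable diffusion transformer that Changen2 actually uses for the image-level step:

```python
import random

def simulate_change_events(mask, p_new=0.1, seed=0):
    """Condition-level step: stochastically flip background pixels to
    'building' to simulate a change event on the semantic mask."""
    rng = random.Random(seed)
    return [[1 if v == 1 or rng.random() < p_new else 0 for v in row]
            for row in mask]

def synthesize_image(mask):
    """Image-level step: stand-in renderer mapping class labels to
    intensities (Changen2 uses a diffusion transformer here)."""
    return [[200 if v == 1 else 50 for v in row] for row in mask]

def generate_time_series(mask0, steps=3):
    """Roll the stochastic change process forward; semantic and change
    labels fall out of the simulated masks for free."""
    masks, images = [mask0], [synthesize_image(mask0)]
    for t in range(steps):
        masks.append(simulate_change_events(masks[-1], seed=t))
        images.append(synthesize_image(masks[-1]))
    # Binary change map: pixels whose class differs between first and last epoch.
    change = [[int(a != b) for a, b in zip(r1, r2)]
              for r1, r2 in zip(masks[0], masks[-1])]
    return images, masks, change
```

Because every synthesized image is rendered from a known mask, the pipeline yields perfectly aligned semantic and change supervision without any manual annotation, which is the core appeal of generative change data.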
