
ISPRS Journal of Photogrammetry and Remote Sensing: Latest Publications

3D building reconstruction from monocular remote sensing imagery via diffusion models and geometric priors
IF 12.2 | CAS Tier 1, Earth Science | Q1 GEOGRAPHY, PHYSICAL | Pub Date: 2025-12-05 | DOI: 10.1016/j.isprsjprs.2025.11.018
Zhenghao Hu, Weijia Li, Jinhua Yu, Minfa Liu, Junyan Ye, Peimin Chen, Huabing Huang
3D building reconstruction from monocular remote sensing imagery has emerged as a critical research topic due to its cost-effective data acquisition and scalability for large-area applications. However, the reconstruction accuracy of existing methods remains limited due to suboptimal performance in both contour extraction and height estimation. The limited feature extraction capabilities of the models and the difficulty in differentiating building roofs from facades collectively restrict the accuracy of reconstructed contours in existing methods. Meanwhile, building height estimation remains particularly challenging in monocular images due to the limited information from a single perspective. To address these challenges, we propose DG-BRF, a Diffusion-Geometric based Building Reconstruction Framework that integrates a diffusion-model-based roof and facade segmentation network with a geometric prior-driven offset calculation method for precise 3D reconstruction from monocular remote sensing images. To improve the accuracy of building contour extraction, we design an effective diffusion-model-based roof and facade segmentation network and improve roof-facade differentiation by incorporating a novel depth-aware encoder. Moreover, unlike conventional methods that rely on challenging height regression, we propose a geometric prior-driven offset calculation method that strategically converts the height regression problem into a simple roof-facade matching task. Experimental results on two newly proposed 3D reconstruction datasets demonstrate the effectiveness of our framework. DG-BRF achieves superior performance, outperforming the current state of the art by 3% and 13% in height estimation accuracy and by 3% and 6% in footprint segmentation F1-score on the two datasets, respectively, demonstrating its capability to overcome the limitations of existing methods and offering a novel solution for 3D building reconstruction from monocular remote sensing images. The dataset and source code of this work will be available at https://github.com/zhenghaohu/DG-BRF.
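To make the geometric prior concrete, the sketch below shows one way a roof-facade offset could be turned into a building height under a simple parallel-projection model. The masks, grid search, and sensor parameters (GSD, off-nadir angle) are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def height_from_roof_offset(roof_mask: np.ndarray,
                            building_mask: np.ndarray,
                            gsd_m: float,
                            off_nadir_deg: float,
                            max_offset_px: int = 60) -> float:
    """Slide the roof mask over candidate offsets, keep the shift that best
    overlaps the full building mask (roof + facade), and convert the offset
    magnitude to a height. Assumes off_nadir_deg > 0 and ignores the
    wrap-around of np.roll for buildings away from the image border."""
    best_iou, best_shift = -1.0, (0, 0)
    for dy in range(-max_offset_px, max_offset_px + 1, 2):
        for dx in range(-max_offset_px, max_offset_px + 1, 2):
            shifted = np.roll(np.roll(roof_mask, dy, axis=0), dx, axis=1)
            inter = np.logical_and(shifted, building_mask).sum()
            union = np.logical_or(shifted, building_mask).sum()
            if union and inter / union > best_iou:
                best_iou, best_shift = inter / union, (dy, dx)
    d_px = float(np.hypot(*best_shift))
    # Parallel projection: a point at height h is displaced by h * tan(theta)
    # on the ground, i.e. h * tan(theta) / GSD pixels in the image.
    return d_px * gsd_m / np.tan(np.radians(off_nadir_deg))
```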
Citations: 0
Cross-satellite hierarchical multimodal denoising for hyperspectral imagery
IF 12.2 | CAS Tier 1, Earth Science | Q1 GEOGRAPHY, PHYSICAL | Pub Date: 2025-12-04 | DOI: 10.1016/j.isprsjprs.2025.11.028
Minghua Wang, Bo Shang, Yilun Li, Xin Zhao, Lianru Gao, Xu Sun, Longfei Ren
The rapid proliferation of remote sensing satellite constellations has ushered in an era of unprecedented access to hyperspectral (HS) and multispectral (MS) Earth observation data. HS denoising remains a critical research focus for enhancing the interpretation and application of satellite data. However, on the one hand, the usable characteristics of noisy HS satellite data are often limited under real-world, unknown conditions. On the other hand, existing works ignore the complementary advantages offered by MS satellite imagery, which can be employed to enhance HS denoising performance. Leveraging the synergistic potential of HS and MS data, this study pioneers a novel paradigm for the joint exploitation and optimization of multi-source remote sensing satellites. A novel Hierarchical Dual Tucker Decomposition (DTucker) framework is proposed to capitalize on the low-rank (LR) tensor property of HS and MS cross-satellite data. The inheritance of properties from the original tensor to the core tensor is explored through a manually designed, model-driven constraint or a data-driven multilayer perceptron (MLP) framework. This enables robust integration of MS-derived spatial richness with HS details and significantly enhances denoising capability. We construct HS–MS data pairs from real satellite observations, including Earth Observing-1, Sentinel-2, Gaofen-1, Gaofen-5, and Gaofen-6. The four datasets span diverse scenes such as runways, urban areas, rivers, and farmlands. The proposed method demonstrates strong generalization across various satellite combinations and application scenarios. Notably, the incorporation of MS data markedly enhances both class discrimination and structural detail in the denoised HS outputs, promoting the performance of subsequent classification and analysis tasks. The datasets and code, implemented in MATLAB and Python, will be available at https://github.com/MinghuaWang123/DTucker, as a contribution to the remote sensing community.
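As context for the low-rank idea, here is a minimal, generic truncated Tucker denoiser using the tensorly library; it illustrates only the plain LR-tensor projection, not the hierarchical dual framework or the MLP-driven core constraint, and the ranks are placeholder choices.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

def tucker_denoise(hs_cube: np.ndarray, ranks=(40, 40, 10)) -> np.ndarray:
    """Project a noisy HS cube (rows x cols x bands) onto a low multilinear
    rank via Tucker decomposition and reconstruct; the discarded residual
    is treated as noise. Ranks would typically be tuned per scene."""
    core, factors = tucker(tl.tensor(hs_cube), rank=list(ranks))
    return tl.to_numpy(tl.tucker_to_tensor((core, factors)))
```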
Citations: 0
Extrapolate azimuth angles: Text and edge guided ISAR image generation based on foundation model
IF 12.2 | CAS Tier 1, Earth Science | Q1 GEOGRAPHY, PHYSICAL | Pub Date: 2025-12-04 | DOI: 10.1016/j.isprsjprs.2025.12.002
Jiawei Zhang, Xiaolin Zhou, Weidong Jiang, Xiaolong Su, Zhen Liu, Li Liu
Inverse Synthetic Aperture Radar (ISAR) has been widely applied in remote sensing and space target monitoring. Automatic Target Recognition (ATR) based on ISAR imagery plays a critical role in target interpretation and pose estimation. With the growing adoption of intelligent methods in the ATR domain, the quantity and quality of ISAR data have become decisive factors influencing algorithm performance. However, due to the complexity of ISAR imaging algorithms and the high cost of data acquisition, high-quality ISAR image datasets remain extremely scarce. As a result, learning the underlying characteristics of existing ISAR data to generate large-scale usable samples has become a pressing research focus. Although some preliminary studies have explored ISAR image data augmentation, most of them rely on image sequence interpolation or conditional generation, both of which exhibit critical limitations: the former requires densely sampled image sequences with small angular intervals, while the latter can only model the mapping between limited azimuth conditions and ISAR images. Neither approach is capable of generating images of new targets under unseen azimuth conditions, resulting in poor generalization and leaving substantial room for further exploration. To address these limitations, we formally define a novel research problem, termed ISAR azimuth angle extrapolation. This task fundamentally involves high-dimensional, structured, cross-view image synthesis, requiring the restoration of visual details while ensuring physical consistency and structural stability. To address this problem, we propose ISAR-ExtraNet, a foundation-model-based framework for ISAR azimuth angle extrapolation. ISAR-ExtraNet leverages the strong representation, modeling, and generalization capabilities of pretrained foundation models to generate ISAR images of new targets under novel azimuth conditions. Specifically, the model employs a two-stage coarse-to-fine fine-tuning strategy, incorporating optical image contours and scattering center distribution constraints to guide the generation process. This design enhances both semantic alignment and structural fidelity in the generated ISAR images. Comprehensive experiments demonstrate that ISAR-ExtraNet significantly outperforms baseline methods and fine-tuned foundation models, achieving 28.76 dB in PSNR and 0.80 in SSIM. We hope that the training paradigm introduced in ISAR-ExtraNet will inspire further exploration of the ISAR azimuth extrapolation problem and foster progress in this emerging research area.
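As a rough illustration of the guidance signals described above, the snippet below derives an edge map from an optical contour image with OpenCV and composes an azimuth-conditioned text prompt. The prompt wording and function interface are our own assumptions, not the paper's pipeline.

```python
import cv2
import numpy as np

def build_guidance(optical_bgr: np.ndarray, azimuth_deg: float):
    """Return the two conditions a text-and-edge guided generator would
    consume: a Canny edge (contour) map and an azimuth-encoding prompt."""
    gray = cv2.cvtColor(optical_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)          # high-frequency structure
    prompt = f"ISAR image of the target at azimuth {azimuth_deg:.1f} degrees"
    return edges, prompt
```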
Citations: 0
CityVLM: Towards sustainable urban development via multi-view coordinated vision–language model
IF 12.2 | CAS Tier 1, Earth Science | Q1 GEOGRAPHY, PHYSICAL | Pub Date: 2025-12-03 | DOI: 10.1016/j.isprsjprs.2025.11.030
Junjue Wang, Weihao Xuan, Heli Qi, Zihang Chen, Hongruixuan Chen, Zhuo Zheng, Junshi Xia, Yanfei Zhong, Naoto Yokoya
Vision–language models (VLMs) have shown remarkable promise in Earth Vision, particularly in providing human-interpretable analysis of remote sensing imagery. While existing VLMs excel at general visual perception tasks, they often fall short in addressing the complex needs of geoscience, which requires comprehensive urban analysis across geographical, social, and economic dimensions. To bridge this gap, we expand VLM capabilities to tackle sustainable urban development challenges by integrating two complementary sources: remote sensing (RS) and street-view (SV) imagery. Specifically, we first design a multi-view vision–language dataset (CitySet), comprising 20,589 RS images, 1.1 million SV images, and 0.8 million question–answer pairs. CitySet facilitates geospatial object reasoning, social object analysis, urban economic assessment, and sustainable development report generation. Besides, we develop CityVLM to integrate macro- and micro-level semantics using geospatial and temporal modeling, while its language modeling component generates detailed urban reports. We extensively benchmarked 10 advanced VLMs on our dataset, revealing that state-of-the-art models struggle with urban analysis tasks, primarily due to domain gaps and limited multi-view data alignment capabilities. By addressing these issues, CityVLM achieves superior performance consistently across all tasks and advances automated urban analysis through practical applications like heat island effect monitoring, offering valuable tools for city planners and policymakers in their sustainability efforts.
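To indicate what a multi-view record in such a dataset might look like, here is a small, hypothetical Python structure pairing one macro RS view with micro SV views and QA supervision; the field names and example values are our guesses, not the released CitySet schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class CityRegionSample:
    """One hypothetical CitySet-style record for a single urban region."""
    region_id: str                               # e.g. a grid-cell identifier
    rs_image_path: str                           # macro remote sensing view
    sv_image_paths: List[str] = field(default_factory=list)        # street views
    qa_pairs: List[Tuple[str, str]] = field(default_factory=list)  # (question, answer)

sample = CityRegionSample(
    region_id="tokyo_0001",
    rs_image_path="rs/tokyo_0001.tif",
    sv_image_paths=["sv/tokyo_0001_a.jpg", "sv/tokyo_0001_b.jpg"],
    qa_pairs=[("How many intersections are visible?", "Four")],
)
```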
Citations: 0
UrbanMMCL: Urban region representations via multi-modal and multi-graph self-supervised contrastive learning
IF 12.2 | CAS Tier 1, Earth Science | Q1 GEOGRAPHY, PHYSICAL | Pub Date: 2025-12-03 | DOI: 10.1016/j.isprsjprs.2025.11.012
Jinzhou Cao, Jiashi Chen, Xiangxu Wang, Weiming Huang, Dongsheng Chen, Tianhong Zhao, Wei Tu, Qingquan Li
Urban region representation learning has emerged as a fundamental approach for diverse urban analytics tasks, where each neighborhood is encoded as a dense embedding vector for effective downstream applications. However, existing approaches suffer from insufficient multi-modal alignment and inadequate spatial relationship modeling, limiting their representation quality and generalizability. To address these challenges, we propose UrbanMMCL, a novel self-supervised framework that integrates multi-modal multi-view contrastive pre-training with unified fine-tuning for comprehensive urban representation learning. UrbanMMCL employs a dual-stage architecture. First, cross-modal contrastive learning aligns diverse data modalities including remote sensing imagery, street view imagery, location encodings, and Vision–Language Model (VLM)-generated textual descriptions. Second, multi-view adaptive graph contrastive learning captures complex spatial relationships across human mobility, functional similarity, and geographic distance perspectives. The framework then fine-tunes all parameters with the learned representations for effective adaptation to downstream tasks. Comprehensive experiments demonstrate that UrbanMMCL consistently outperforms state-of-the-art methods across pollutant emission prediction, population density estimation, and land use classification with minimal fine-tuning requirements, thereby advancing foundation model development for diverse Geo-AI applications.
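The cross-modal alignment stage rests on contrastive pre-training; the following is a minimal symmetric InfoNCE loss in PyTorch of the standard form used for such alignment, not necessarily the paper's exact objective or temperature.

```python
import torch
import torch.nn.functional as F

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired embeddings from two
    modalities (e.g. RS image vs. VLM-generated text). Matching rows of
    z_a and z_b are positives; all other pairs in the batch are negatives."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / tau                    # (B, B) cosine / temperature
    labels = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))
```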
Citations: 0
Mapping oil spills under varying sun glint conditions using a diffusion model with spatial-spectral-frequency constraints
IF 12.2 | CAS Tier 1, Earth Science | Q1 GEOGRAPHY, PHYSICAL | Pub Date: 2025-12-02 | DOI: 10.1016/j.isprsjprs.2025.11.032
Kai Du, Yi Ma, Zhongwei Li, Rongjie Liu, Zongchen Jiang, Junfang Yang
Marine oil spill detection via optical remote sensing is critically challenged by complex optical signatures induced by sun glint, which causes contrast reversals and masks the spectral features of different oil emulsion types. This study introduces a novel deep learning framework, the Spatial-Spectral Attention Conditioned Diffusion Probabilistic Model with Dual-branch Frequency Parser (OSS-Diff). The core architecture integrates two specialized modules. The first, a physics-informed Dual-branch Frequency Parser (DF-P) module, is designed to disentangle contrast variations by separately processing high-frequency edge features (characteristic of bright spills in strong glint) and low-frequency background information (typical of dark spills in weak glint). Concurrently, a systematically designed Spatial-Spectral Attention (SS-A) module targets spectral ambiguity by adaptively amplifying the discriminative features crucial for identifying oil emulsification states. Extensive experiments confirm the model’s superior performance. Under weak sun glint, it achieved F1-scores of 0.907 for non-emulsified and 0.862 for emulsified slicks. For positive-contrast spills under strong sun glint, it achieved an F1-score of 0.918. Ablation studies validate the synergistic effect of the proposed components, with the full model achieving a 4.9% F1-score gain over the baseline. Significantly, this work also reveals a potential link between the contrast inversion angle and the oil’s refractive index, suggesting a novel avenue for remotely characterizing the physical properties of oil spills. This research provides a robust framework for automated oil spill monitoring, offering a reliable model for environmental protection and emergency response.
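To ground the frequency-parsing idea, here is a toy two-branch split of a single-channel image into low- and high-frequency parts with a radial FFT mask; it is our simplification of the DF-P module, and the cutoff value is arbitrary.

```python
import numpy as np

def split_frequency_bands(img: np.ndarray, cutoff: float = 0.1):
    """Low-pass branch keeps the smooth background (dark slicks in weak
    glint); the high-pass complement keeps edges (bright slicks in strong
    glint). `cutoff` is a fraction of the half-spectrum radius."""
    spec = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2) / (0.5 * min(h, w))
    low = np.fft.ifft2(np.fft.ifftshift(spec * (radius <= cutoff))).real
    return low, img - low   # (low-frequency branch, high-frequency branch)
```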
Citations: 0
2Player: A general framework for self-supervised change detection via cooperative learning
IF 12.2 | CAS Tier 1, Earth Science | Q1 GEOGRAPHY, PHYSICAL | Pub Date: 2025-12-02 | DOI: 10.1016/j.isprsjprs.2025.11.024
Manon Béchaz, Emanuele Dalsasso, Ciprian Tomoiagă, Marcin Detyniecki, Devis Tuia
While recent progress in deep learning has improved change detection (CD) in remote sensing, achieving high performance without labeled data remains an open challenge. Current models remain strongly reliant on large-scale annotated datasets, limiting their scalability to new regions and applications. Unsupervised CD methods offer a promising alternative but often suffer from limited representational ability and vulnerability to irrelevant changes in appearance, such as seasonal variations. In this work, we propose the 2Player framework, a novel self-supervised method that transforms any existing CD architecture into an unsupervised model by leveraging cooperation between a change detector and a reconstruction-based model. The two models, or players, guide each other during training: reconstruction errors provide supervision to the change detector, while change predictions guide the reconstruction process. To further improve robustness, we introduce a Geographical Correspondence Module that provides high-frequency structural information, effectively reducing false positives stemming from irrelevant changes in appearance. Furthermore, we propose a simple filtering strategy to mitigate the impact of label noise in CD datasets, contributing to improved evaluation. We test 2Player on four very high-resolution datasets: HRSCD (for which we improve and release a new, cleaner set of labels), LEVIR-CD, WHU-CD, and a new dataset, ValaisCD. Our approach achieves state-of-the-art performance among unsupervised methods on these datasets and, with its architecture-agnostic design, provides a promising direction for bridging the gap between supervised and unsupervised change detection.
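One plausible reading of the cooperative loop is sketched below as a single PyTorch training step: the reconstructor's residual pseudo-labels the detector, while the detector's change map down-weights changed pixels in the reconstruction loss. Model interfaces, the thresholding rule, and the loss weighting are all our assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def two_player_step(detector, reconstructor, x_t0, x_t1):
    """One cooperative step on a co-registered image pair of shape
    (B, C, H, W). `reconstructor` maps x_t0 to a prediction of x_t1's
    appearance; `detector` outputs per-pixel change logits (B, 1, H, W)."""
    x_hat = reconstructor(x_t0)
    residual = (x_hat - x_t1).abs().mean(dim=1, keepdim=True)
    change = torch.sigmoid(detector(torch.cat([x_t0, x_t1], dim=1)))
    # Player 1 supervises player 2: large residuals become change pseudo-labels.
    target = (residual > residual.mean()).float()
    loss_det = F.binary_cross_entropy(change, target.detach())
    # Player 2 guides player 1: reconstruct only where no change is predicted.
    loss_rec = ((1.0 - change.detach()) * (x_hat - x_t1) ** 2).mean()
    return loss_det + loss_rec
```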
Citations: 0
Explainable spatiotemporal deep learning for subseasonal super-resolution forecasting of Arctic sea ice concentration during the melting season
IF 12.2 | CAS Tier 1, Earth Science | Q1 GEOGRAPHY, PHYSICAL | Pub Date: 2025-12-01 | DOI: 10.1016/j.isprsjprs.2025.11.027
Jianxin He, Yuxin Zhao, Shuo Yang, Woping Wu, Jian Wang, Xiong Deng
Accurate, high-resolution forecasting of Arctic sea-ice concentration (SIC) during the melting season is crucial for climate monitoring and polar navigation, yet remains hindered by the system’s complex, multi-scale, and cross-sphere dynamics. We present MSS-STFormer, an explainable multi-scale spatiotemporal Transformer designed for subseasonal SIC super-resolution forecasting. The model integrates 14 environmental factors spanning the ice, ocean, and atmosphere, and incorporates four specialized modules to enhance spatiotemporal representation and physical consistency. Trained with OSTIA satellite observations and ERA5 reanalysis data, MSS-STFormer achieves high forecasting skill over a 60-day horizon, yielding an RMSE of 0.049, a correlation of 0.9951, an SSIM of 0.9603, and a BACC of 0.9656. Post-hoc explainability methods (Gradient SHAP and LIME) reveal that the model captures a temporally evolving prediction mechanism: early forecasts are dominated by persistence of initial conditions, mid-term phases are governed by atmospheric dynamics such as wind and pressure, and later stages transition to a coupled influence of radiative and dynamic processes. This progression aligns closely with established thermodynamic and dynamic theories of sea-ice evolution, underscoring the model’s ability to identify physically meaningful drivers. The framework demonstrates strong potential for advancing explainable GeoAI in Earth observation, combining predictive accuracy with physical explainability for operational Arctic SIC monitoring and climate applications.
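For the attribution step, a sketch using the Captum library's GradientShap is shown below, under the assumption of a scalar-output forecast head; the baseline distribution is our placeholder choice, not the paper's configuration.

```python
import torch
from captum.attr import GradientShap

def attribute_forecast(model, x: torch.Tensor, n_baselines: int = 8):
    """Attribute a scalar forecast to each stacked input factor in `x`
    (a batch of inputs). Models with non-scalar outputs would also need
    a `target` index passed to `attribute`."""
    baselines = 0.1 * torch.randn(n_baselines, *x.shape[1:])
    explainer = GradientShap(model)
    return explainer.attribute(x, baselines=baselines)  # same shape as x
```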
Citations: 0
National mapping of wetland vegetation leaf area index in China using hybrid model with Sentinel-2 and Landsat-8 data
IF 12.2 | CAS Tier 1, Earth Science | Q1 GEOGRAPHY, PHYSICAL | Pub Date: 2025-12-01 | DOI: 10.1016/j.isprsjprs.2025.11.031
Jianing Zhen, Dehua Mao, Yeqiao Wang, Junjie Wang, Chenwei Nie, Shiqi Huo, Hengxing Xiang, Yongxing Ren, Ling Luo, Zongming Wang
Leaf area index (LAI) of wetland vegetation provides vital information on its growth condition, structure, and functioning. Accurately mapping LAI at a broad scale is essential for the conservation and rehabilitation of wetland ecosystems. However, owing to the spatial complexity and periodic inundation characteristics of wetland vegetation, retrieving LAI of wetlands remains a challenging task with significant uncertainty. Here, with 865 in-situ measurements across different wetland biomes in China during 2013–2023, we proposed a hybrid strategy that incorporated an active learning (AL) technique, the physically-based PROSAIL-5B model, and the Random Forest machine learning algorithm to map wetland LAI across China from Sentinel-2 and Landsat-8 imagery. The validation results showed that the hybrid approach outperformed physically-based and empirically-based methods and achieved higher accuracy (R2 increased by 0.15–0.40, RMSE decreased by 0.02–0.27, and RRMSE decreased by 3.37–12.78%). Additionally, three newly developed indices (TBVI5, TBVI3, and TBVI1) exhibited superior potential for LAI inversion across different types of wetland vegetation. Our mapping results exhibited fine spatial detail and consistency, and the Sentinel-2-based maps matched in-situ observations more closely than those from Landsat-8 and other MODIS-based products. In this study, we developed the first national-scale mapping of wetland vegetation LAI in China. The findings offer insights into the accurate retrieval of LAI in wetland vegetation, providing valuable support for the scientific restoration of wetlands and the assessment of their responses to climate change.
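The learning stage of such a hybrid pipeline can be summarized in a few lines of scikit-learn; the snippet below assumes a radiative-transfer model (e.g., PROSAIL-5B) has already produced simulated (reflectance, LAI) pairs, and it omits the active-learning screening step against field data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_hybrid_lai_model(sim_reflectance: np.ndarray, sim_lai: np.ndarray):
    """Learn the inverse mapping from simulated band reflectances to LAI,
    to be applied later to Sentinel-2/Landsat-8 surface reflectance."""
    rf = RandomForestRegressor(n_estimators=500, n_jobs=-1, random_state=0)
    rf.fit(sim_reflectance, sim_lai)   # X: (n_samples, n_bands), y: (n_samples,)
    return rf

# Usage with placeholder arrays (real inputs would come from the RT model):
rf = fit_hybrid_lai_model(np.random.rand(2000, 10), np.random.rand(2000) * 7)
lai = rf.predict(np.random.rand(5, 10))   # rows = per-pixel band reflectances
```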
Citations: 0
Self-supervised despeckling based solely on SAR intensity images: A general strategy
IF 12.2 | CAS Tier 1, Earth Science | Q1 GEOGRAPHY, PHYSICAL | Pub Date: 2025-12-01 | DOI: 10.1016/j.isprsjprs.2025.11.025
Liang Chen, Yifei Yin, Hao Shi, Jingfei He, Wei Li
Speckle noise arises from the SAR imaging mechanism and degrades the quality of SAR images, making interpretation difficult. Hence, despeckling is an indispensable step in SAR pre-processing. Fortunately, supervised learning (SL) has proven to be a progressive approach to SAR image despeckling. SL methods necessitate the availability of both original SAR images and their speckle-free counterparts during training, whilst speckle-free SAR images do not exist in the real world. Even though there are several substitutes for speckle-free images, the domain gap leads to poor performance and adaptability. Self-supervision provides an approach to training without clean references. However, most self-supervised methods introduce additional requirements on speckle modeling or specific data, posing challenges for real-world applications. To address these challenges, we propose a general Self-supervised Despeckling Strategy for SAR images (SDS-SAR) that relies solely on speckled intensity data for training. Firstly, the theoretical feasibility of SAR image despeckling without speckle-free images is established, and a self-supervised despeckling criterion suitable for diverse SAR images is proposed. Subsequently, a Random-Aware sub-SAMpler with Projection correLation Estimation (RA-SAMPLE) is put forth, so that mutually independent training pairs can be derived from actual SAR intensity images. Furthermore, a multi-feature loss function is introduced, consisting of a despeckling term, a regularization term, and a perception term, so that speckle suppression and texture preservation are well balanced. Experiments reveal that the proposed method performs comparably to supervised approaches on synthetic data and outperforms them on actual data. Both visual and quantitative evaluations confirm its superiority over state-of-the-art despeckling techniques. Moreover, the results demonstrate that SDS-SAR provides a novel solution for noise suppression in other multiplicative coherent systems. The trained model and dataset will be available at https://github.com/YYF121/SDS-SAR.
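RA-SAMPLE itself is only described at a high level here; as a rough stand-in, the following Neighbor2Neighbor-style random sub-sampler shows how two training images with the same underlying scene but largely independent speckle can be drawn from a single intensity image.

```python
import numpy as np

def random_pair_subsample(intensity: np.ndarray, seed=None):
    """From each 2x2 cell of the intensity image, draw two different pixels
    at random: one for each half-resolution sub-image. The pair shares the
    clean scene but has (approximately) independent speckle realizations."""
    rng = np.random.default_rng(seed)
    h2, w2 = intensity.shape[0] // 2, intensity.shape[1] // 2
    cells = (intensity[:h2 * 2, :w2 * 2]
             .reshape(h2, 2, w2, 2)
             .transpose(0, 2, 1, 3)
             .reshape(h2, w2, 4))                      # 4 pixels per cell
    idx = rng.permuted(np.tile(np.arange(4), (h2, w2, 1)), axis=2)
    a = np.take_along_axis(cells, idx[..., :1], axis=2)[..., 0]
    b = np.take_along_axis(cells, idx[..., 1:2], axis=2)[..., 0]
    return a, b
```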
Citations: 0