Pub Date: 2026-01-15 | DOI: 10.1016/j.isprsjprs.2026.01.016
Zhi-Wei He, Bo-Hui Tang, Zhao-Liang Li
Mountainous land surface temperature (MLST) is a key parameter for studying energy exchange between the land surface and the atmosphere in mountainous areas. However, traditional land surface temperature (LST) retrieval methods often neglect the influence of three-dimensional (3D) structures and adjacent pixels caused by rugged terrain. To address this, a mountainous split-window and temperature-emissivity separation (MSW-TES) hybrid algorithm was proposed to retrieve MLST. The hybrid algorithm combines an improved split-window (SW) algorithm with the temperature-emissivity separation (TES) algorithm, accounting for topographic and adjacency effects (T-A effect), to retrieve MLST from the five thermal infrared (TIR) bands of the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER). In this hybrid algorithm, an improved mountainous canopy multiple-scattering TIR radiative transfer model was proposed to construct the simulation dataset. Then, an improved SW algorithm was developed that builds a 3D lookup table (LUT) of regression coefficients using the small-scale self-heating parameter (SSP) and sky-view factor (SVF) to estimate brightness temperature (BT) at ground level. Furthermore, the TES algorithm was refined to account for the influence of rugged terrain within a pixel on mountainous land surface effective emissivity (MLSE) by reconstructing the relationship between the minimum emissivity and the maximum-minimum difference (MMD) for different SSPs. Results from simulated data show that the improved SW algorithm increases the accuracy of ground-level BT estimation by up to 0.5 K. The MSW-TES algorithm, when considering the T-A effect, generally retrieves lower LST values than when this effect is ignored. The hybrid algorithm yielded root mean square errors (RMSEs) of 0.99 K and 1.83 K for LST retrieval with and without the T-A effect, respectively, with most differences falling between 0.0 K and 3.0 K. The sensitivity analysis indicated that perturbation of the input parameters has little influence on MLST and MLSE, demonstrating the strong robustness of the MSW-TES algorithm. Additionally, the accuracy of MLST retrieval by the MSW-TES algorithm was validated using both discrete anisotropic radiative transfer (DART) model simulations and in-situ measurements. Validation against DART simulations showed biases ranging from −0.13 K to 1.03 K and RMSEs from 0.76 K to 1.29 K across the five ASTER TIR bands, while validation against the in-situ measurements yielded a bias of 0.97 K and an RMSE of 1.25 K, demonstrating consistent and reliable results. This study underscores the necessity of accounting for the T-A effect to improve MLST retrieval and provides a promising pathway for global clear-sky high-resolution MLST mapping with upcoming thermal missions. The source code and simulated data are available at https://github.com/hezwppp/MSW-TES.
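The improved SW step above estimates ground-level BT by reading regression coefficients from a 3D LUT keyed by SSP and SVF. A minimal sketch of that lookup-and-apply pattern follows; the coefficient values are placeholders, the third LUT axis (column water vapor) and the nearest-node lookup rule are assumptions, and the generic split-window form used here is not necessarily the paper's exact formulation:

```python
import numpy as np

# Hypothetical grids for the LUT axes: SSP, SVF, and an assumed water-vapor axis.
ssp_grid = np.linspace(0.0, 0.3, 4)
svf_grid = np.linspace(0.6, 1.0, 5)
wv_grid = np.linspace(0.5, 4.5, 5)  # column water vapor, g/cm^2 (assumption)

rng = np.random.default_rng(0)
# coeffs[i, j, k] holds (a0, a1, a2) at one LUT node -- placeholder values only.
coeffs = rng.normal(size=(4, 5, 5, 3))

def lookup_sw_coeffs(ssp, svf, wv):
    """Nearest-node lookup in the 3D coefficient LUT."""
    i = int(np.argmin(np.abs(ssp_grid - ssp)))
    j = int(np.argmin(np.abs(svf_grid - svf)))
    k = int(np.argmin(np.abs(wv_grid - wv)))
    return coeffs[i, j, k]

def ground_level_bt(t11, t12, ssp, svf, wv):
    """Generic split-window form: BT_ground = a0 + a1*T11 + a2*(T11 - T12)."""
    a0, a1, a2 = lookup_sw_coeffs(ssp, svf, wv)
    return a0 + a1 * t11 + a2 * (t11 - t12)

bt = ground_level_bt(300.0, 298.5, 0.1, 0.8, 2.0)
```

In practice a smooth interpolation between LUT nodes (rather than nearest-node) would likely be preferable; the structure of the lookup is the point here.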
Title: An SW-TES hybrid algorithm for retrieving mountainous land surface temperature from high-resolution thermal infrared remote sensing data
ISPRS Journal of Photogrammetry and Remote Sensing, Volume 232, Pages 865-889
Pub Date: 2026-01-15 | DOI: 10.1016/j.isprsjprs.2026.01.002
Jing Yang, Yanfeng Wen, Peng Chen, Zhenhua Zhang, Delu Pan
Satellite-based ocean remote sensing is fundamentally limited to observing the ocean surface (top-of-the-ocean), a constraint that severely hinders a comprehensive understanding of how the entire water-column ecosystem responds to climate variability such as the El Niño-Southern Oscillation (ENSO). Surface-only views cannot resolve critical shifts in the subsurface chlorophyll maximum (SCM), a key layer for marine biodiversity and biogeochemical cycles. To overcome this critical limitation, we develop and validate a novel stacked generalization ensemble machine learning framework. This framework robustly reconstructs a 25-year (1998–2022) high-resolution 3D chlorophyll-a (Chl-a) field by integrating 133,792 globally distributed Biogeochemical-Argo (BGC-Argo) profiles with multi-source satellite data. The reconstructed 3D Chl-a fields were rigorously validated against both satellite and in-situ observations, achieving strong agreement (R ≥ 0.97, mean absolute percentage error ≤ 27 %), demonstrating the robustness and reliability of the framework. Applying this framework to two contrasting South China Sea upwelling systems reveals that ENSO phases fundamentally restructure the entire water column. Crucially, we discover that El Niño and La Niña exert opposing effects on the SCM: El Niño events deepen and thin the SCM, decreasing Chl-a by 15–30 %, whereas La Niña events cause it to shoal and thicken, increasing Chl-a by 20–40 %. This vertical restructuring is mechanistically linked to ENSO-driven changes in wind stress curl, Rossby wave propagation, and nitrate availability. Furthermore, we identify a significant subsurface-first response, where the SCM reacts to ENSO forcing months before significant changes are detectable at the surface.
Our findings demonstrate that a three-dimensional perspective, enabled by our novel remote sensing reconstruction framework, is essential for accurately quantifying the biogeochemical consequences of climate variability, revealing that surface-only observations can significantly underestimate the vulnerability and response of marine ecosystems to ENSO events.
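The stacked generalization ensemble described above follows a standard two-level pattern: base learners produce out-of-fold predictions, and a meta-learner is fit on those predictions. A toy sketch of that pattern is shown below with synthetic data and deliberately simple base models (least-squares and k-nearest-neighbors); the paper's actual predictors, base learners, and meta-learner are not specified here and everything in this example is illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
# Toy stand-in: three surface predictors -> a target at depth (synthetic).
X = rng.normal(size=(200, 3))
y = X @ np.array([0.5, -0.2, 0.1]) + 0.05 * rng.normal(size=200)

def fit_linear(Xtr, ytr):
    A = np.c_[Xtr, np.ones(len(Xtr))]
    w, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return lambda Xte: np.c_[Xte, np.ones(len(Xte))] @ w

def fit_knn(Xtr, ytr, k=5):
    def predict(Xte):
        d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
        idx = np.argsort(d, axis=1)[:, :k]
        return ytr[idx].mean(axis=1)
    return predict

def stacked_fit_predict(X, y, base_fitters, n_folds=5):
    """Level 0: out-of-fold predictions from each base learner.
    Level 1: linear meta-learner trained on those predictions."""
    folds = np.array_split(np.arange(len(X)), n_folds)
    oof = np.zeros((len(X), len(base_fitters)))
    for j, fit in enumerate(base_fitters):
        for te in folds:
            tr = np.setdiff1d(np.arange(len(X)), te)
            oof[te, j] = fit(X[tr], y[tr])(X[te])
    meta = fit_linear(oof, y)
    full = [fit(X, y) for fit in base_fitters]  # refit bases on all data
    return lambda Xnew: meta(np.column_stack([m(Xnew) for m in full]))

model = stacked_fit_predict(X, y, [fit_linear, fit_knn])
pred = model(X)
```

The out-of-fold construction is what keeps the meta-learner from simply memorizing the base learners' training-set fit.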
Title: Beyond the surface: machine learning uncovers ENSO’s hidden and contrasting impacts on phytoplankton vertical structure
ISPRS Journal of Photogrammetry and Remote Sensing, Volume 232, Pages 890-909
Pub Date: 2026-01-14 | DOI: 10.1016/j.isprsjprs.2026.01.005
Yue Zhou, Jue Chen, Zilun Zhang, Penghui Huang, Ran Ding, Zhentao Zou, PengFei Gao, Yuchen Wei, Ke Li, Xue Yang, Xue Jiang, Hongxin Yang, Jonathan Li
Remote sensing (RS) large vision–language models (LVLMs) have shown strong promise across visual grounding (VG) tasks. However, existing RS VG datasets predominantly rely on explicit referring expressions – such as relative position, relative size, and color cues – thereby constraining performance on implicit VG tasks that require scenario-specific domain knowledge. This article introduces DVGBench, a high-quality implicit VG benchmark for drones, covering six major application scenarios: traffic, disaster, security, sport, social activity, and productive activity. Each object is paired with both an explicit and an implicit query. Based on the dataset, we design DroneVG-R1, an LVLM that integrates the novel Implicit-to-Explicit Chain-of-Thought (I2E-CoT) within a reinforcement learning paradigm. This enables the model to exploit scene-specific expertise, converting implicit references into explicit ones and thus reducing grounding difficulty. Finally, an evaluation of mainstream models on both explicit and implicit VG tasks reveals substantial limitations in their reasoning capabilities. These findings provide actionable insights for advancing the reasoning capacity of LVLMs for drone-based agents. The code and datasets will be released at https://github.com/zytx121/DVGBench.
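Visual grounding benchmarks of this kind are typically scored by intersection-over-union (IoU) between predicted and reference boxes, e.g. accuracy at IoU ≥ 0.5; the abstract does not name DVGBench's metric, so the following is a generic sketch of that scoring scheme, not the benchmark's official evaluator:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def grounding_accuracy(preds, gts, thresh=0.5):
    """Fraction of queries whose predicted box overlaps the reference at IoU >= thresh."""
    hits = sum(iou(p, g) >= thresh for p, g in zip(preds, gts))
    return hits / len(gts)

# One hit (identical boxes) and one miss (disjoint boxes).
acc = grounding_accuracy([(0, 0, 2, 2), (4, 4, 6, 6)],
                         [(0, 0, 2, 2), (10, 10, 12, 12)])
```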
Title: DVGBench: Implicit-to-explicit visual grounding benchmark in UAV imagery with large vision–language models
ISPRS Journal of Photogrammetry and Remote Sensing, Volume 232, Pages 831-847
Pub Date: 2026-01-14 | DOI: 10.1016/j.isprsjprs.2026.01.015
Peng Qin, Huabing Huang, Jie Wang, Yunxia Cui, Peimin Chen, Shuang Chen, Yu Xia, Shuai Yuan, Yumei Li, Xiangyu Liu
Large-scale, long-term, and high-frequency monitoring of forest cover is essential for sustainable forest management and carbon stock assessment. However, in persistently cloudy regions such as southern China, the scarcity of high-quality remote sensing data and reliable training samples has resulted in forest cover products with limited spatial and temporal resolution. In addition, many existing datasets fail to accurately characterize forest distribution and dynamics—particularly underestimating forest expansion and overlooking fine-scale and high-frequency changes. To address these limitations, we propose a novel forest–non-forest mapping framework based on reconstructed remote sensing data. First, we achieved large-scale data reconstruction using two deep learning-based multi-sensor fusion methods across extensive (2.04 million km²), long-term (2000–2020), persistently cloudy regions, effectively generating seamless imagery and NDVI time series to fill extensive spatial and temporal data gaps for forest classification. Next, by combining a spectrally similar sample transfer method with existing land cover products, we constructed robust training samples spanning broad spatial and temporal scales. Subsequently, using a random forest classifier, we generated annual 30 m forest cover maps for cloudy southern China, achieving an unprecedented balance between spatial and temporal resolution while improving mapping accuracy. The results demonstrate an overall accuracy of 0.904, surpassing that of the China Land Cover Dataset (CLCD, 0.889) and the China Annual Tree Cover Dataset (CATCD, 0.850). Particularly, our results revealed an overall upward trend in forest area—from 119.84 to 132.09 million hectares (Mha)—that was rarely captured in previous studies, closely aligning with National Forest Inventory (NFI) data (R² = 0.86).
Finally, by integrating time-series analysis with classification results, this study transformed forest mapping from a traditional static framework to a dynamic temporal perspective, reducing uncertainties associated with direct interannual comparisons and estimating forest gains of 23.87 Mha and losses of 12.56 Mha. Notably, reconstructed data improved forest mapping in terms of completeness, resolution, and accuracy. In Guangxi, the annual product detected 11.24 Mha more forest gain than the 10-year composite, indicating better completeness. It also offered finer spatial resolution (30 m vs. 500 m) and higher overall accuracy (0.879 vs. 0.853), compared to the widely used cloud-affected annual product. Overall, this study presents a robust framework for precise forest monitoring in cloudy regions.
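The classification step, a random forest over reconstructed NDVI time series, can be illustrated with synthetic monthly NDVI profiles. The class means, sample counts, and every parameter below are invented for the sketch; the paper's actual features and training samples are far richer:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
# Toy stand-in: 12 monthly NDVI values per pixel from a gap-filled time series.
# Forest pixels keep high NDVI year-round; non-forest pixels sit lower (assumed means).
n = 400
forest = np.clip(rng.normal(0.70, 0.08, size=(n // 2, 12)), -1, 1)
nonforest = np.clip(rng.normal(0.25, 0.10, size=(n // 2, 12)), -1, 1)
X = np.vstack([forest, nonforest])
y = np.r_[np.ones(n // 2), np.zeros(n // 2)]  # 1 = forest, 0 = non-forest

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
acc = clf.score(X, y)                           # training accuracy on the toy data
label = clf.predict(np.full((1, 12), 0.80))[0]  # a persistently green pixel
```

Real workflows would of course hold out an independent validation set (as the study does against NFI data) rather than scoring on the training pixels.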
Title: Unveiling spatiotemporal forest cover patterns breaking the cloud barrier: Annual 30 m mapping in cloud-prone southern China from 2000 to 2020
ISPRS Journal of Photogrammetry and Remote Sensing, Volume 232, Pages 848-864
Pub Date: 2026-01-13 | DOI: 10.1016/j.isprsjprs.2025.12.013
Olaf Wysocki, Benedikt Schwab, Manoj Kumar Biswanath, Michael Greza, Qilin Zhang, Jingwei Zhu, Thomas Froech, Medhini Heeramaglore, Ihab Hijazi, Khaoula Kanna, Mathias Pechinger, Zhaiyu Chen, Yao Sun, Alejandro Rueda Segura, Ziyang Xu, Omar AbdelGafar, Mansour Mehranfar, Chandan Yeshwanth, Yueh-Cheng Liu, Hadi Yazdi, Boris Jutzi
Urban Digital Twins (UDTs) have become essential for managing cities and integrating complex, heterogeneous data from diverse sources. Creating UDTs involves challenges at multiple process stages, including acquiring accurate 3D source data, reconstructing high-fidelity 3D models, keeping models up to date, and ensuring seamless interoperability with downstream tasks. Current datasets are usually limited to one part of the processing chain, hampering comprehensive UDT validation. To address these challenges, we introduce the first comprehensive multimodal Urban Digital Twin benchmark dataset: TUM2TWIN. This dataset includes georeferenced, semantically aligned 3D models and networks along with various terrestrial, mobile, aerial, and satellite observations, comprising 32 data subsets covering roughly 100,000 m² and currently totaling 767 GB of data. By ensuring georeferenced indoor–outdoor acquisition, high accuracy, and multimodal data integration, the benchmark supports robust analysis of sensors and the development of advanced reconstruction methods. Additionally, we explore downstream tasks demonstrating the potential of TUM2TWIN, including novel view synthesis with NeRF and Gaussian Splatting, solar potential analysis, point cloud semantic segmentation, and LoD3 building reconstruction. We are convinced this contribution lays a foundation for overcoming current limitations in UDT creation, fostering new research directions and practical solutions for smarter, data-driven urban environments. The project is available under: https://tum2t.win.
Title: TUM2TWIN: Introducing the large-scale multimodal urban digital twin benchmark dataset
ISPRS Journal of Photogrammetry and Remote Sensing, Volume 232, Pages 810-830
Pub Date: 2026-01-13 | DOI: 10.1016/j.isprsjprs.2026.01.001
Tong Xiao, Qunming Wang, Ping Lu, Tenghai Huang, Xiaohua Tong, Peter M. Atkinson
Accurate crowd detection (CD) is critical for public safety and historical pattern analysis, yet existing methods relying on ground and aerial imagery suffer from limited spatio-temporal coverage. The development of very-fine-resolution (VFR) satellite sensor imagery (e.g., ∼0.3 m spatial resolution) provides unprecedented opportunities for large-scale crowd activity analysis, but it has never been considered for this task. To address this gap, we proposed CrowdSat-Net, a novel point-based convolutional neural network featuring two innovative components: a Dual-Context Progressive Attention Network (DCPAN) that improves feature representation of individuals by aggregating scene context and local individual characteristics, and a High-Frequency Guided Deformable Upsampler (HFGDU) that recovers high-frequency information during upsampling through frequency-domain guided deformable convolutions. To validate the effectiveness of CrowdSat-Net, we developed CrowdSat, the first VFR satellite imagery dataset designed specifically for CD tasks, comprising over 120,000 manually labeled individuals from multi-source satellite platforms (Beijing-3N, Jilin-1 Gaofen-04A and Google Earth) across China. In the experiments, CrowdSat-Net was compared with eight state-of-the-art point-based CD methods (originally designed for ground or aerial imagery and satellite-based animal detection) using CrowdSat and achieved the highest F1-score of 66.12 % and Precision of 73.23 %, surpassing the second-best method by 0.80 % and 6.83 %, respectively. Moreover, extensive ablation experiments validated the importance of the DCPAN and HFGDU modules.
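Point-based detectors like the one above are usually scored by matching predicted points to labeled points within a distance threshold, then computing Precision/Recall/F1 over the matches. The greedy nearest-neighbor matching rule and the pixel threshold in this sketch are assumptions, not the paper's stated protocol:

```python
import numpy as np

def match_points(pred, gt, max_dist=5.0):
    """Greedily match each predicted point to its nearest unused ground-truth
    point within max_dist (pixels). Returns the number of true positives."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    if len(pred) == 0 or len(gt) == 0:
        return 0
    used = np.zeros(len(gt), bool)
    tp = 0
    for p in pred:
        d = np.linalg.norm(gt - p, axis=1)
        d[used] = np.inf
        j = int(np.argmin(d))
        if d[j] <= max_dist:
            used[j] = True
            tp += 1
    return tp

def precision_recall_f1(pred, gt, max_dist=5.0):
    tp = match_points(pred, gt, max_dist)
    prec = tp / len(pred) if len(pred) else 0.0
    rec = tp / len(gt) if len(gt) else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

# Two of three predictions land near a labeled individual.
prec, rec, f1 = precision_recall_f1([(0, 0), (10, 10), (50, 50)],
                                    [(1, 0), (10, 11)])
```

Published benchmarks often use optimal (Hungarian) rather than greedy matching; the greedy variant keeps the sketch short.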
Title: "Crowd detection using Very-Fine-Resolution satellite imagery". Authors: Tong Xiao, Qunming Wang, Ping Lu, Tenghai Huang, Xiaohua Tong, Peter M. Atkinson. ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 232, pp. 787–809. Pub Date: 2026-01-13, DOI: 10.1016/j.isprsjprs.2026.01.001
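Point-based CD methods such as those compared above are typically scored by matching predicted points to ground-truth annotations within a distance threshold and computing Precision/Recall/F1. As an illustrative sketch only (not the paper's evaluation code; the `radius` threshold and the greedy nearest-pairs-first matching are assumptions), such a metric can be computed as:

```python
import numpy as np

def point_f1(pred, gt, radius=5.0):
    """Greedily match predicted points to ground-truth points within
    `radius` (nearest admissible pairs first), then compute P/R/F1."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    if len(pred) == 0 or len(gt) == 0:
        return 0.0, 0.0, 0.0
    # Pairwise Euclidean distances between predictions and ground truth.
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    pairs = sorted(
        (d[i, j], i, j)
        for i in range(len(pred)) for j in range(len(gt))
        if d[i, j] <= radius
    )
    used_p, used_g, tp = set(), set(), 0
    for _, i, j in pairs:  # each point may be matched at most once
        if i not in used_p and j not in used_g:
            used_p.add(i); used_g.add(j); tp += 1
    precision, recall = tp / len(pred), tp / len(gt)
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```

With two of three predictions landing within the radius of distinct ground-truth points, precision is 2/3 and recall is 1.0.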
Pub Date: 2026-01-12, DOI: 10.1016/j.isprsjprs.2026.01.009
Shoujun Jia , Lotte de Vugt , Andreas Mayr , Katharina Anders , Chun Liu , Martin Rutzinger
Estimating complex 3D topographic surface changes, including rigid spatial movement and non-rigid morphological deformation, is an essential task for investigating Earth surface dynamics. However, current 3D point comparison approaches struggle to separate rigid from non-rigid topographic surface changes in multi-temporal 3D point clouds. Additionally, these methods are affected by topographic surface roughness and point cloud heterogeneities (i.e., discrete and irregular point distributions). To address these challenges, we consider the dynamic evolution of topographic surfaces as the geometric changes of Riemann manifold surfaces. By building Euclidean (straight) and non-Euclidean (curved) coordinate systems on Riemann manifold surfaces derived from point clouds, the rigid transformation and non-rigid deformation of the manifold surfaces are solved to conceptualize rigid and non-rigid change tensors, respectively. On this basis, we design rigid (i.e., translation and rotation) and non-rigid (i.e., stretch and distortion) change features to describe various topographic surface changes, and quantify the associated uncertainties to capture significant changes. The proposed method is tested on pairwise point clouds with simulated and real topographic surface changes in mountain regions. Simulation experiments demonstrate that the proposed method performs better than the baseline (i.e., M3C2) and a state-of-the-art method (i.e., LOG), with higher translation accuracy (more than 50% improvement), lower translation uncertainty (more than 61% reduction), and strong robustness to varying point densities. The results also show that the proposed method accurately quantifies three additional types of change features (the mean accuracies of rotation, stretch, and distortion are 1.5°, 0.5%, and 3.5°, respectively). Moreover, real-scene experiments demonstrate the effectiveness and superiority of the proposed method in estimating various topographic changes in real environments, its applicability to analyzing geomorphological processes, and its potential contribution to understanding spatiotemporal patterns of Earth surface dynamics.
Title: "Change tensor: Estimating complex topographic changes from point clouds using Riemann manifold surfaces". ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 232, pp. 766–786.
Pub Date: 2026-01-10, DOI: 10.1016/j.isprsjprs.2026.01.011
Jie Yang , Huaiqing Zhang , Jinyang Li , Haoyue Yang , Tian Gao , Tingdong Yang , Jiaxin Wang , Xiaoli Zhang , Ting Yun , Yuxin Duanmu , Sihan Chen , Yukai Shi
Tree architecture analysis is fundamental to forestry, but complex trees challenge the accuracy and efficiency of point-cloud-based reconstruction. Here, we present SmartQSM, a novel quantitative structure model designed for reconstructing individual trees and extracting their parameters from ground-based laser scanning data. The method achieves point cloud contraction, forming the thin structures required for skeletonization, by iteratively applying a sparse-convolution-based residual U-shaped network (ResUNet) to predict point movement towards the medial axis. This process is integrated with techniques from previous studies to form a complete reconstruction pipeline. Following the organization and QSM-based quantification of 47 individual-scale, 26 organ-scale, and 8 plot-scale parameters, the proposed method provides comprehensive support for extracting these metrics from the input point cloud and its outputs, including the skeleton and mesh. Performance was verified using two-period leaf-off LiDAR data of a natural mixed coniferous and broad-leaved forest plot (in Qingyuan county, Liaoning province, China) and two open forest datasets, with existing major QSMs used for comparison. The inference network adopts a three-stage hierarchical spatial compression architecture, starting with 8 input channels and predicting with a multi-layer perceptron. The reconstruction was insensitive to remaining leaves, and the model showed no apparent distortion. Processing is efficient, at about 12,000 points per second. In terms of major architectural parameters, the R² scores for trunk length, trunk volume, and bole height on the tested two-period data of different tree species in the plot reached 0.97, 0.957, and 0.949, respectively, which were 0.043, 0.114, and 0.029 higher than existing methods; the R² scores for branch length, branching angle, and tip deflection angle remained around 0.95. The overestimation of stem volume and aboveground biomass is also alleviated. The high reconstruction quality, efficiency, rich parameters, and unique visual interaction capabilities of the proposed method offer a novel and practical solution for forestry research and broader domains. The implementation code is available at: https://github.com/project-lightlin/SmartQSM.
Title: "SmartQSM: a novel quantitative structure model using sparse-convolution-based point cloud contraction for reconstruction and analysis of individual tree architecture". ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 232, pp. 712–739.
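The point-cloud contraction that SmartQSM performs with a learned ResUNet can be illustrated with a classical stand-in: iteratively moving each point toward the centroid of its k nearest neighbours thins the cloud toward a skeleton-like core. A minimal sketch (the `k`, `steps`, and `step_size` values are illustrative assumptions, not parameters from the paper):

```python
import numpy as np

def contract(points, k=8, steps=5, step_size=0.5):
    """Iteratively move each point toward the centroid of its k nearest
    neighbours -- a classical stand-in for the learned per-point movement
    toward the medial axis that SmartQSM predicts with a network."""
    pts = np.asarray(points, float).copy()
    for _ in range(steps):
        # Full pairwise distance matrix (fine for small demo clouds).
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        nn = np.argsort(d, axis=1)[:, 1:k + 1]   # skip self at index 0
        centroids = pts[nn].mean(axis=1)
        pts += step_size * (centroids - pts)     # contraction step
    return pts
```

Applied to points on a circle, each step pulls the points radially inward, shrinking the cloud toward its centre, which is the thinning behaviour skeletonization needs.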
Pub Date: 2026-01-10, DOI: 10.1016/j.isprsjprs.2025.12.016
Pengcheng Hu , Xu Li , Junhuan Peng , Xu Ma , Yuhan Su , Xiaoman Qi , Xinwei Jiang , Wenwen Wang
Interferometric Synthetic Aperture Radar (InSAR) is a technology that can effectively obtain ground information, conduct large-scale topographic mapping, and monitor surface deformation. However, InSAR data are corrupted by speckle noise caused by radar echo signal fading, ground background clutter, and decorrelation, which degrades the interferometric phase quality and thus reduces the accuracy of InSAR results. The existing Complex Convolutional Sparse Coding Gradient Regularization (ComCSC-GR) method incorporates gradient regularization by considering the sparse coefficient matrix's gradients in both the row (azimuth) and column (range) directions; it is an advanced and effective interferogram phase filtering method that can improve interferogram quality. However, this method does not take into account the variation characteristics of the diagonal gradients or the second-order difference information (caused by edge mutations). As a result, the filtered interferogram still exhibits problems such as staircase artifacts in high-noise and low-coherence areas, uneven interferograms (caused by a large number of residual points), and unclear phase edge structure. This article introduces multiple directional gradients and second-order Laplacian operator information, and constructs two models: the "Complex Convolutional Sparse Coding Model with L2-norm Regularization of Directional Gradients and Laplacian Operator (ComCSC-RCDL)" and the "Complex Convolutional Sparse Coding Model Coupled with L1-norm Total Variation Regularization (ComCSC-RCDL-TV)". These methods enhance the fidelity of phase texture and edge structure, and improve the quality of the filtered InSAR interferometric phase in low-coherence scenarios. Comparative experiments were conducted using simulated data, real data from Sentinel-1 and LuTan-1 (LT-1), and advanced methods including ComCSC-GR and InSAR-BM3D (the real-data experiments included comparisons before and after removing the interferogram orbit error). The results show that the proposed methods perform better than the comparative methods, verifying their effectiveness.
Title: "Complex convolutional sparse coding InSAR phase filtering incorporating directional gradients and second-order difference regularization". ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 232, pp. 740–765.
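The quantities the proposed regularizers penalize, namely the row/column gradients, the two diagonal gradients, and a second-order (Laplacian) difference of the interferometric phase, can be sketched on a complex interferogram as follows. Differencing via the complex phasor keeps every gradient wrapped to (−π, π]; this is an illustrative computation, not the ComCSC-RCDL solver itself:

```python
import numpy as np

def phase_gradients(ifg):
    """Wrapped-phase gradients of a complex interferogram along the
    azimuth (row), range (column), and both diagonal directions, plus a
    four-neighbour Laplacian (second-order difference)."""
    def grad(dr, dc):
        # angle(a * conj(b)) is the phase difference wrapped to (-pi, pi].
        return np.angle(ifg * np.conj(np.roll(ifg, (dr, dc), axis=(0, 1))))
    grads = {
        "azimuth":   grad(1, 0),
        "range":     grad(0, 1),
        "diag_main": grad(1, 1),
        "diag_anti": grad(1, -1),
    }
    # Second-order difference: sum of the four neighbours minus 4*phase.
    lap = -(grad(1, 0) + grad(-1, 0) + grad(0, 1) + grad(0, -1))
    return grads, lap
```

On a noise-free linear phase ramp the directional gradients are constant away from the wrap-around borders and the Laplacian vanishes, which is exactly what these regularization terms reward in smooth fringe areas.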
Pub Date: 2026-01-10, DOI: 10.1016/j.isprsjprs.2025.12.018
Changpeng Li , Bangyi Tao , Yunzhou Li , Yan Wang , Yixian Zhu , Renjie Chen , Haiqing Huang , Delu Pan , Hongtao Wang
Spatial–spectral fusion offers a viable solution for the quantitative inversion of water parameters using multispectral resolution images (MSRIs) and limited bands of high spatial resolution images (HSRIs). Most existing fusion methods assume that the ground coverage type at the same location, or the spatial patterns of images, do not change over time. However, these fundamental assumptions are not valid under highly dynamic ocean conditions caused by various currents and tides. In this study, we propose a new assumption: the types of optical water bodies remain consistent within a certain time frame and a specific spatial region, and the spectral characteristics of each optical water type remain stable. On this basis, a new spatial–spectral fusion method, referred to as optical water classification-based data fusion (OWCDF), was developed to realize accurate spatial–spectral fusion in oceanic environments. The OWCDF algorithm comprises three key steps: 1) re-calibration based on a maximum cosine correlation (MCC) match, 2) spectral regression based on optical water classification, and 3) residual compensation. A well-designed scheme was used to evaluate the performance of OWCDF against existing algorithms by fusing a CZI/HY-1C/D image (HSRI) with time-series GOCI-II images (MSRIs), with temporal differences increasing from 1 h to 4 h. The OWCDF algorithm exhibited a substantially better ability to resist the influence of highly dynamic changes in water bodies than the other algorithms. Further tests applying OWCDF to data from OLCI/S3 and CZI/HY-1C/D or CCD/HJ-2A/B confirmed its applicability to polar-orbiting satellites, achieving quantitative observation at high spatial resolution even 2–3 times a day. In the future, the accuracy of the optical water type classification must be improved, and limitations under poor observation conditions, such as broken clouds and sun glint, should be further considered.
Title: "Spatial–spectral fusion under highly dynamic ocean conditions based on optical water classification". ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 232, pp. 689–711.