
Latest Publications from ISPRS Journal of Photogrammetry and Remote Sensing

Advancing table beet root yield estimation via unmanned aerial systems (UAS) multi-modal sensing
IF 12.2 | CAS Tier 1 (Earth Science) | Q1 GEOGRAPHY, PHYSICAL | Pub Date: 2026-02-01 | Epub Date: 2026-01-06 | DOI: 10.1016/j.isprsjprs.2025.12.026 | Volume 232, pp. 542-560
Mohammad S. Saif , Robert Chancia , Sean P. Murphy , Sarah Pethybridge , Jan van Aardt
Unmanned aerial systems (UAS) offer significant potential to improve agricultural practices due to their multi-modal payload capacity, ease of deployment, and lower cost. However, there is a need to expand UAS capabilities by including root crops, offering robust, growth-stage-independent models, and providing a comprehensive assessment of various imaging systems, i.e., identifying application-specific sensing modalities. This study aims to tackle those challenges by presenting a unified Gaussian Process Regression (GPR) model for predicting end-of-season table beet (a subterranean root crop) yield using UAS-derived spectral and structural features, combined with meteorological data, while remaining robust to flight and harvest timing. Field trials were conducted at Cornell AgriTech in Geneva, NY during the 2021 and 2022 growing seasons. UAS flights captured five-band (475, 560, 668, 717, and 840 nm) multispectral imagery, hyperspectral imagery (400–1000 nm), and light detection and ranging (LiDAR) data at multiple times throughout the season. Our model achieved R²_test = 0.81 and MAPE_test = 15.7% using only multispectral imagery, while the hyperspectral + LiDAR model attained R²_test = 0.79 and MAPE_test = 17.4%, comparable to recent root yield modeling studies using UAS data. Shapley analysis was performed to gain further insight into model behavior. This analysis revealed that canopy volume information carried high relative importance, compared to other features, for table beet root yield estimation. Our study demonstrated that UAS-based imaging, combined with a unified machine learning model, can effectively predict root crop yield, providing a scalable and transferable approach for precision agriculture.
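For readers unfamiliar with the modeling recipe this abstract outlines, the sketch below pairs a Gaussian Process Regression yield model with Shapley-value feature attribution. It is a minimal illustration on toy data; the feature names (NDVI, red-edge band, canopy volume, cumulative growing degree days) are hypothetical stand-ins, not the authors' actual predictors.

```python
# Minimal sketch: GPR yield model + Shapley-value feature attribution.
# Features and data are hypothetical stand-ins for the UAS-derived
# spectral/structural/meteorological features the abstract describes.
import numpy as np
import shap
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = ["NDVI", "RedEdge717", "canopy_volume", "gdd_cumulative"]
X = rng.random((120, len(features)))                            # plot-level features
y = 30 + 25 * X[:, 2] + 10 * X[:, 0] + rng.normal(0, 2, 120)   # toy yield target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(X_tr, y_tr)
print("R2_test:", gpr.score(X_te, y_te))

# Model-agnostic Shapley attribution on a small evaluation sample.
explainer = shap.KernelExplainer(gpr.predict, shap.sample(X_tr, 30))
shap_values = explainer.shap_values(X_te[:10])
for name, mean_abs in zip(features, np.abs(shap_values).mean(axis=0)):
    print(f"{name}: mean |SHAP| = {mean_abs:.3f}")
```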
Citations: 0
Roof-aware indoor BIM reconstruction from LiDAR via graph-attention for residential buildings
IF 12.2 | CAS Tier 1 (Earth Science) | Q1 GEOGRAPHY, PHYSICAL | Pub Date: 2026-02-01 | Epub Date: 2025-12-29 | DOI: 10.1016/j.isprsjprs.2025.12.024 | Volume 232, pp. 408-420
Biao Xiong , Bohan Wang , Yiyi Liu , Liangliang Wang , Yanchao Yang , Liang Zhou , Qiegen Liu
Building Information Models (BIMs) provide structured, parametric representations that are fundamental for simulation, facility management, and digital twin applications. However, reconstructing BIMs from terrestrial LiDAR scans remains challenging due to clutter, occlusions, and the geometric complexity of roof structures. This paper presents a roof-aware scan-to-BIM pipeline tailored for residential buildings, which processes indoor LiDAR data through four geometric abstractions (raw points, superpoints, triangle meshes, and volumetric polyhedra), each represented by task-specific graphs. The pipeline integrates three modules: LGNet for semantic segmentation, QTNet for floor plan reconstruction, and PPO for roof–floor fusion. It demonstrates strong cross-dataset generalization, being trained on Structured3D and fine-tuned on the real-world WHUTS dataset. The method produces watertight, Revit-compatible BIMs with an average surface deviation of 9 mm RMS on WHUTS scenes featuring slanted roofs. Compared with state-of-the-art scan-to-BIM and floor plan reconstruction methods, the proposed approach achieves higher geometric accuracy on scenes with slanted roofs, reducing surface reconstruction error by over 12–18% and improving layout reconstruction F1-scores by up to 6–8%. The proposed framework provides a robust, accurate, and fully automated solution for roof-aware BIM reconstruction of residential buildings from terrestrial LiDAR data, offering comprehensive support for slanted roof modeling. The source code and datasets are publicly available at https://github.com/Wangbohan-x/roof-aware-scan2bim.git.
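The abstract does not detail the graph-attention mechanism itself, so below is a minimal single-head graph-attention layer in the generic GAT style typically applied to superpoint graphs. It illustrates the building block only and is not the paper's LGNet or QTNet.

```python
# Minimal single-head graph-attention layer over a superpoint graph.
# Generic GAT-style sketch, not the paper's LGNet or QTNet.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) node features; edge_index: (2, E) directed edges src -> dst
        h = self.proj(x)
        src, dst = edge_index
        e = F.leaky_relu(self.attn(torch.cat([h[src], h[dst]], dim=-1)).squeeze(-1))
        # Normalize attention over the incoming edges of each destination node.
        alpha = torch.zeros_like(e)
        for node in dst.unique():
            mask = dst == node
            alpha[mask] = F.softmax(e[mask], dim=0)
        out = torch.zeros_like(h).index_add_(0, dst, alpha.unsqueeze(-1) * h[src])
        return F.elu(out)

# Toy superpoint graph: 4 nodes, chain edges in both directions.
x = torch.randn(4, 8)
edges = torch.tensor([[0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2]])
print(GraphAttentionLayer(8, 16)(x, edges).shape)  # torch.Size([4, 16])
```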
Citations: 0
A Spatially Masked Adaptive Gated Network for multimodal post-flood water extent mapping using SAR and incomplete multispectral data
IF 12.2 | CAS Tier 1 (Earth Science) | Q1 GEOGRAPHY, PHYSICAL | Pub Date: 2026-02-01 | Epub Date: 2026-01-05 | DOI: 10.1016/j.isprsjprs.2025.12.023 | Volume 232, pp. 492-508
Hyunho Lee, Wenwen Li
Mapping water extent during a flood event is essential for effective disaster management throughout all phases: mitigation, preparedness, response, and recovery. In particular, during the response stage, when timely and accurate information is important, Synthetic Aperture Radar (SAR) data are primarily employed to produce water extent maps. This is because SAR sensors can observe through cloud cover and operate both day and night, whereas Multispectral Imaging (MSI) data, despite providing higher mapping accuracy, are only available under cloud-free and daytime conditions. Recently, leveraging the complementary characteristics of SAR and MSI data through a multimodal approach has emerged as a promising strategy for advancing water extent mapping using deep learning models. This approach is particularly beneficial when timely post-flood observations, acquired during or shortly after the flood peak, are limited, as it enables the use of all available imagery for more accurate post-flood water extent mapping. However, the adaptive integration of partially available MSI data into the SAR-based post-flood water extent mapping process remains underexplored. To bridge this research gap, we propose the Spatially Masked Adaptive Gated Network (SMAGNet), a multimodal deep learning model that utilizes SAR data as the primary input for post-flood water extent mapping and integrates complementary MSI data through feature fusion. In experiments on the C2S-MS Floods dataset, SMAGNet consistently outperformed other multimodal deep learning models in prediction performance across varying levels of MSI data availability. Specifically, SMAGNet achieved the highest IoU score of 86.47% using SAR and MSI data and maintained the highest performance with an IoU score of 79.53% even when MSI data were entirely missing. Furthermore, we found that even when MSI data were completely missing, the performance of SMAGNet remained statistically comparable to that of a U-Net model trained solely on SAR data. These findings indicate that SMAGNet enhances the model robustness to missing data as well as the applicability of multimodal deep learning in real-world flood management scenarios. The source code is available at https://github.com/ASUcicilab/SMAGNet.
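As a rough illustration of the gating idea described above (blending MSI features into a SAR stream only where MSI pixels are valid), here is a minimal PyTorch module. All shapes and the module name are hypothetical; this is not SMAGNet's actual architecture.

```python
# Sketch of availability-masked gated fusion: SAR features are always used,
# MSI features are blended in only where a validity mask allows it.
import torch
import torch.nn as nn

class MaskedGatedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f_sar, f_msi, msi_valid_mask):
        # f_sar, f_msi: (B, C, H, W); msi_valid_mask: (B, 1, H, W) in {0, 1}
        g = self.gate(torch.cat([f_sar, f_msi], dim=1))
        g = g * msi_valid_mask        # zero MSI contribution where data is missing
        return f_sar + g * f_msi      # residual injection of MSI evidence

fuse = MaskedGatedFusion(16)
f_sar, f_msi = torch.randn(1, 16, 32, 32), torch.randn(1, 16, 32, 32)
mask = (torch.rand(1, 1, 32, 32) > 0.5).float()   # e.g. cloud-free pixels
print(fuse(f_sar, f_msi, mask).shape)             # torch.Size([1, 16, 32, 32])
```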
Citations: 0
Beyond synthetic scenarios: Weakly-supervised super-resolution for spatiotemporally misaligned remote sensing images
IF 12.2 | CAS Tier 1 (Earth Science) | Q1 GEOGRAPHY, PHYSICAL | Pub Date: 2026-02-01 | Epub Date: 2026-01-06 | DOI: 10.1016/j.isprsjprs.2025.12.019 | Volume 232, pp. 524-541
Quanyi Guo , Rui Liu , Yangtian Fang , Yi Gao , Jun Chen , Xin Tian
Deep learning-based remote sensing image super-resolution is crucial for enhancing the spatial resolution of Earth observation data. Due to the absence of perfectly aligned pairs of high- and low-resolution remote sensing images, most existing supervised and self-supervised approaches rely on synthetic degradation models or internal structural consistency to generate training data. Consequently, these methods suffer from the domain gap between synthetic and real datasets, which limits their ability to model realistic degradation and degrades their performance in real scenes. To overcome this challenge, we propose STANet, a weakly-supervised super-resolution method for spatiotemporally misaligned remote sensing images. In particular, STANet directly utilizes images of the same region captured by multiple satellites at different resolutions as datasets, to boost real remote sensing image super-resolution performance. However, this approach also introduces new challenges related to spatiotemporal misalignment. To address this, we design a spatiotemporal align module that includes a Scale Align Module (SAM) and a Temporal Align Module (TAM). SAM uses affine transformations to align spatial features at both the pixel and global levels, while TAM applies window-based attention to adjust the weight of image content, mitigating the misleading effects of temporal misalignment on results. In addition, we design a style encoder based on contrastive learning and a structure encoder based on variational inference, which guide SAM and TAM in feature alignment and enhance adaptability. Finally, the feature-aligned output, after upsampling, is fused with the high-frequency-enhancing output of the texture transfer module through the weighted fusion module to generate the super-resolution image. Extensive experiments on synthetic datasets based on AID and RSSR25, real datasets captured by GaoFen satellites, and cross-satellite experiments on Landsat-8 datasets demonstrate STANet’s superiority over other state-of-the-art methods.
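The Scale Align Module's affine alignment can be pictured with differentiable warping. The sketch below shows the basic operation, assuming fixed affine parameters that a small regression head would predict in practice; it is an illustration, not STANet's implementation.

```python
# Sketch of affine feature alignment with differentiable warping, the kind
# of operation a scale-alignment module can build on. The 2x3 affine
# parameters are fixed here for illustration only.
import torch
import torch.nn.functional as F

def affine_align(feat: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    # feat: (B, C, H, W); theta: (B, 2, 3) affine matrices
    grid = F.affine_grid(theta, feat.shape, align_corners=False)
    return F.grid_sample(feat, grid, align_corners=False)

feat = torch.randn(1, 8, 64, 64)
scale, tx = 0.9, 0.05                     # hypothetical scale + horizontal shift
theta = torch.tensor([[[scale, 0.0, tx], [0.0, scale, 0.0]]])
print(affine_align(feat, theta).shape)    # torch.Size([1, 8, 64, 64])
```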
Citations: 0
Mapping oil spills under varying sun glint conditions using a diffusion model with spatial-spectral-frequency constraints
IF 12.2 | CAS Tier 1 (Earth Science) | Q1 GEOGRAPHY, PHYSICAL | Pub Date: 2026-02-01 | Epub Date: 2025-12-02 | DOI: 10.1016/j.isprsjprs.2025.11.032 | Volume 232, pp. 48-61
Kai Du , Yi Ma , Zhongwei Li , Rongjie Liu , Zongchen Jiang , Junfang Yang
Marine oil spill detection via optical remote sensing is critically challenged by complex optical signatures induced by sun glint, which causes contrast reversals and masks the spectral features of different oil emulsion types. This study introduces a novel deep learning framework, the Spatial-Spectral Attention Conditioned Diffusion Probabilistic Model with Dual-branch Frequency Parser (OSS-Diff). The core architecture integrates two specialized modules. The first, a physics-informed Dual-branch Frequency Parser (DF-P) module, is designed to disentangle contrast variations by separately processing high-frequency edge features (characteristic of bright spills in strong glint) and low-frequency background information (typical of dark spills in weak glint). Concurrently, a systematically designed Spatial-Spectral Attention (SS-A) module targets spectral ambiguity by adaptively amplifying the discriminative features crucial for identifying oil emulsification states. Extensive experiments confirm the model’s superior performance. Under weak sun glint, it achieved F1-scores of 0.907 for non-emulsified and 0.862 for emulsified slicks. For positive-contrast spills under strong sun glint, it achieved an F1-score of 0.918. Ablation studies validate the synergistic effect of the proposed components, with the full model achieving a 4.9% F1-score gain over the baseline. Significantly, this work also reveals a potential link between the contrast inversion angle and the oil’s refractive index, suggesting a novel avenue for remotely characterizing the physical properties of oil spills. This research provides a robust framework for automated oil spill monitoring, offering a reliable model for environmental protection and emergency response.
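To make the dual-branch idea concrete, the sketch below splits an image into low- and high-frequency components with an FFT and a radial mask, so high-frequency spill edges and low-frequency backgrounds land in separate branches. The fixed cutoff is an illustrative choice, not the paper's learned parser.

```python
# Sketch of a dual-branch frequency split: decompose an image into low- and
# high-frequency components via FFT and a radial mask.
import numpy as np

def frequency_split(img: np.ndarray, cutoff: float = 0.1):
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2) / max(h, w)   # normalized radius
    low_mask = (r <= cutoff).astype(float)
    low = np.fft.ifft2(np.fft.ifftshift(f * low_mask)).real
    high = img - low                                    # residual = high frequency
    return low, high

img = np.random.rand(128, 128)
low, high = frequency_split(img)
print(np.allclose(low + high, img))  # True: an exact two-band decomposition
```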
Citations: 0
Cross-satellite hierarchical multimodal denoising for hyperspectral imagery
IF 12.2 | CAS Tier 1 (Earth Science) | Q1 GEOGRAPHY, PHYSICAL | Pub Date: 2026-02-01 | Epub Date: 2025-12-04 | DOI: 10.1016/j.isprsjprs.2025.11.028 | Volume 232, pp. 94-108
Minghua Wang , Bo Shang , Yilun Li , Xin Zhao , Lianru Gao , Xu Sun , Longfei Ren
The rapid proliferation of remote sensing satellite constellations has ushered in an era of unprecedented access to hyperspectral (HS) and multispectral (MS) Earth observation data. HS denoising remains a critical research focus for enhancing the interpretation and application of satellite data. However, on the one hand, the noise characteristics of HS satellite data are often unknown and only partially observable under real-world conditions. On the other hand, existing works ignore the complementary advantages offered by MS satellite imagery that can be employed to enhance HS denoising performance. Leveraging the synergistic potential of HS and MS data, this study pioneers a novel paradigm for the joint exploitation and optimization of multi-source remote sensing satellites. A novel Hierarchical Dual Tucker Decomposition (DTucker) framework is proposed to capitalize on the low-rank (LR) tensor property of HS and MS cross-satellite data. The inheritance of properties from the original tensor to the core tensor is explored through a manually designed model-driven constraint or a data-driven multilayer perceptron (MLP) framework. This enables robust integration of MS-derived spatial richness with HS details and significantly enhances the denoising capability. We construct HS–MS data pairs from real satellite observations, including Earth Observing-1, Sentinel-2, Gaofen-1, Gaofen-5, and Gaofen-6. Four datasets span diverse scenes such as runways, urban areas, rivers, and farmlands. The proposed method demonstrates strong generalization across various satellite combinations and application scenarios. Notably, the incorporation of MS data markedly enhances both class discrimination and structural detail in the denoised HS outputs, promoting the performance of subsequent classification and analysis tasks. The datasets and code, implemented in MATLAB and Python, will be available at https://github.com/MinghuaWang123/DTucker as a contribution to the remote sensing community.
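A minimal example of the underlying low-rank idea, assuming the tensorly library: Tucker-decompose a noisy cube with truncated ranks and reconstruct. The ranks here are arbitrary illustrative choices; the paper's hierarchical DTucker with MS guidance goes well beyond this single-tensor sketch.

```python
# Sketch of low-rank Tucker denoising of a hyperspectral cube: decompose
# with truncated ranks, then reconstruct. Illustrative only.
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

rng = np.random.default_rng(0)
r = 3
A, B, C = rng.random((32, r)), rng.random((32, r)), rng.random((50, r))
clean = np.einsum("ir,jr,kr->ijk", A, B, C)        # low-rank H x W x bands cube
noisy = clean + rng.normal(0, 0.05, clean.shape)   # additive Gaussian noise

core, factors = tucker(tl.tensor(noisy), rank=[r, r, r])
denoised = tl.tucker_to_tensor((core, factors))

rmse = lambda a, b: np.sqrt(np.mean((a - b) ** 2))
print(f"noisy RMSE: {rmse(noisy, clean):.4f}, "
      f"denoised RMSE: {rmse(denoised, clean):.4f}")
```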
Citations: 0
Extrapolate azimuth angles: Text and edge guided ISAR image generation based on foundation model
IF 12.2 | CAS Tier 1 (Earth Science) | Q1 GEOGRAPHY, PHYSICAL | Pub Date: 2026-02-01 | Epub Date: 2025-12-04 | DOI: 10.1016/j.isprsjprs.2025.12.002 | Volume 232, pp. 109-123
Jiawei Zhang, Xiaolin Zhou, Weidong Jiang, Xiaolong Su, Zhen Liu, Li Liu
Inverse Synthetic Aperture Radar (ISAR) has been widely applied in remote sensing and space target monitoring. Automatic Target Recognition (ATR) based on ISAR imagery plays a critical role in target interpretation and pose estimation. With the growing adoption of intelligent methods in the ATR domain, the quantity and quality of ISAR data have become decisive factors influencing algorithm performance. However, due to the complexity of ISAR imaging algorithms and the high cost of data acquisition, high-quality ISAR image datasets remain extremely scarce. As a result, learning the underlying characteristics of existing ISAR data to generate large-scale usable samples has become a pressing research focus. Although some preliminary studies have explored ISAR image data augmentation, most of them rely on image sequence interpolation or conditional generation, both of which exhibit critical limitations: the former requires densely sampled image sequences with small angular intervals, while the latter can only model the mapping between limited azimuth conditions and ISAR images. Neither approach is capable of generating images of new targets under unseen azimuth conditions, resulting in poor generalization and leaving substantial room for further exploration. To address these limitations, we formally define a novel research problem, termed ISAR azimuth angle extrapolation. This task fundamentally involves high-dimensional, structured, cross-view image synthesis, requiring the restoration of visual details while ensuring physical consistency and structural stability. To address this problem, we propose ISAR-ExtraNet, a foundation-model-based framework for ISAR azimuth angle extrapolation. ISAR-ExtraNet leverages the strong representation, modeling, and generalization capabilities of pretrained foundation models to generate ISAR images of new targets under novel azimuth conditions. Specifically, the model employs a two-stage coarse-to-fine fine-tuning strategy, incorporating optical image contours and scattering center distribution constraints to guide the generation process. This design enhances both semantic alignment and structural fidelity in the generated ISAR images. Comprehensive experiments demonstrate that ISAR-ExtraNet significantly outperforms baseline methods and fine-tuned foundation models, achieving 28.76 dB in PSNR and 0.80 in SSIM. We hope that the training paradigm introduced in ISAR-ExtraNet will inspire further exploration of the ISAR azimuth extrapolation problem and foster progress in this emerging research area.
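As a hedged illustration of the edge-guidance input, the snippet below extracts a contour map from an optical image with OpenCV's Canny detector and stacks it into a conditioning tensor. The actual contour extractor and conditioning interface used in the paper may differ.

```python
# Sketch of preparing an edge-map conditioning channel from an optical image,
# the kind of guidance signal edge-conditioned generation consumes.
import cv2
import numpy as np

optical = (np.random.rand(256, 256) * 255).astype(np.uint8)  # placeholder image
edges = cv2.Canny(optical, 100, 200)                         # binary contour map
condition = np.stack([edges / 255.0] * 3, axis=-1)           # 3-channel condition
print(condition.shape, condition.min(), condition.max())
```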
Citations: 0
CityVLM: Towards sustainable urban development via multi-view coordinated vision–language model
IF 12.2 | CAS Tier 1 (Earth Science) | Q1 GEOGRAPHY, PHYSICAL | Pub Date: 2026-02-01 | Epub Date: 2025-12-03 | DOI: 10.1016/j.isprsjprs.2025.11.030 | Volume 232, pp. 62-74
Junjue Wang , Weihao Xuan , Heli Qi , Zihang Chen , Hongruixuan Chen , Zhuo Zheng , Junshi Xia , Yanfei Zhong , Naoto Yokoya
Vision–language models (VLMs) have shown remarkable promise in Earth Vision, particularly in providing human-interpretable analysis of remote sensing imagery. While existing VLMs excel at general visual perception tasks, they often fall short in addressing the complex needs of geoscience, which requires comprehensive urban analysis across geographical, social, and economic dimensions. To bridge this gap, we expand VLM capabilities to tackle sustainable urban development challenges by integrating two complementary sources: remote sensing (RS) and street-view (SV) imagery. Specifically, we first design a multi-view vision–language dataset (CitySet), comprising 20,589 RS images, 1.1 million SV images, and 0.8 million question–answer pairs. CitySet facilitates geospatial object reasoning, social object analysis, urban economic assessment, and sustainable development report generation. Besides, we develop CityVLM to integrate macro- and micro-level semantics using geospatial and temporal modeling, while its language modeling component generates detailed urban reports. We extensively benchmarked 10 advanced VLMs on our dataset, revealing that state-of-the-art models struggle with urban analysis tasks, primarily due to domain gaps and limited multi-view data alignment capabilities. By addressing these issues, CityVLM achieves superior performance consistently across all tasks and advances automated urban analysis through practical applications like heat island effect monitoring, offering valuable tools for city planners and policymakers in their sustainability efforts.
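One simple way to picture the macro/micro coordination described above is attention pooling of many street-view embeddings against a remote-sensing query, sketched below with hypothetical dimensions; CityVLM's actual fusion is more elaborate.

```python
# Sketch of cross-view coordination: pool street-view (micro) embeddings
# with a remote-sensing (macro) embedding as the attention query.
import torch
import torch.nn.functional as F

def cross_view_pool(rs_emb: torch.Tensor, sv_embs: torch.Tensor) -> torch.Tensor:
    # rs_emb: (D,) macro view; sv_embs: (K, D) micro views of the same area
    scores = sv_embs @ rs_emb / rs_emb.shape[0] ** 0.5   # scaled dot-product
    weights = F.softmax(scores, dim=0)
    return rs_emb + weights @ sv_embs    # RS embedding enriched by SV detail

print(cross_view_pool(torch.randn(64), torch.randn(10, 64)).shape)  # (64,)
```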
Citations: 0
AMS-Former: Adaptive multi-scale transformer for multi-modal image matching
IF 12.2 | CAS Tier 1 (Earth Science) | Q1 GEOGRAPHY, PHYSICAL | Pub Date: 2026-02-01 | Epub Date: 2026-01-19 | DOI: 10.1016/j.isprsjprs.2026.01.021 | Volume 232, pp. 957-973
Jiahao Rao , Rui Liu , Jianjun Guan , Xin Tian
Multi-modal image (MMI) matching plays a crucial role in the fusion of multi-source image information. However, due to the significant geometric and modality differences in MMI, existing methods often fail to achieve satisfactory matching performance. To address these challenges, we propose an end-to-end MMI matching approach, named adaptive multi-scale transformer (AMS-Former). First, AMS-Former constructs a multi-scale image matching framework that integrates contextual information across different scales, effectively identifying potential corresponding points and thereby improving matching accuracy. To handle the challenges caused by modality differences, we design a cross-modal feature extraction module with an adaptive modulation strategy. This module effectively couples features from different modalities, enhancing feature representation and improving model robustness under complex modality differences. To further enhance matching performance, we design a suitable loss function for the proposed AMS-Former to guide the optimization of network parameters. Finally, we use a cross-scale mutual supervision strategy to remove incorrect corresponding points and enhance the reliability of the matching results. Extensive experiments on five MMI datasets demonstrate that AMS-Former outperforms state-of-the-art methods, including RIFT, ASS, COFSM, POS-GIFT, Matchformer, SEMLA, TopicFM, and Lightglue. Our code is available at: https://github.com/Henryrjh/AMS_Former.
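The coarse stage of matchers in this family typically scores dense descriptor similarity and keeps mutually nearest pairs. The sketch below shows that filtering step on random placeholder descriptors; it is generic, not AMS-Former's cross-scale mutual supervision.

```python
# Sketch of coarse-level matching with a mutual nearest-neighbor check,
# the standard filtering step detector-free matchers build on.
import torch

def mutual_nn_matches(desc_a: torch.Tensor, desc_b: torch.Tensor):
    # desc_a: (N, D), desc_b: (M, D), L2-normalized descriptors
    sim = desc_a @ desc_b.t()            # (N, M) cosine similarity
    nn_ab = sim.argmax(dim=1)            # best b for each a
    nn_ba = sim.argmax(dim=0)            # best a for each b
    idx_a = torch.arange(desc_a.shape[0])
    keep = nn_ba[nn_ab] == idx_a         # mutual consistency
    return torch.stack([idx_a[keep], nn_ab[keep]], dim=1)

a = torch.nn.functional.normalize(torch.randn(100, 256), dim=1)
b = torch.nn.functional.normalize(torch.randn(120, 256), dim=1)
print(mutual_nn_matches(a, b).shape)  # (n_matches, 2) index pairs
```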
Citations: 0
TFRSUB: A terrain-feature retention and spatial uniformity balancing method for simplifying LiDAR ground point clouds
IF 12.2 | CAS Tier 1 (Earth Science) | Q1 GEOGRAPHY, PHYSICAL | Pub Date: 2026-02-01 | Epub Date: 2025-12-29 | DOI: 10.1016/j.isprsjprs.2025.12.015 | Volume 232, pp. 389-407
Chuanfa Chen, Ziming Yang, Hongming Pan, Yanyan Li, Jinda Hao
The processing of high-density LiDAR point clouds presents significant computational challenges for DEM generation due to data redundancy and topographic feature degradation during simplification. To overcome this problem, this study proposes a Terrain-Feature Retention and Spatial Uniformity Balancing (TFRSUB) method that integrates three key innovations: (i) a Distance-Geometric Synergy Index (DGSI) combining orthogonal deviation distance and point sampling interval to mitigate boundary contraction artifacts; (ii) a Composite Terrain Factor (CTF) synthesizing multiple terrain parameters to characterize diverse topographic features; and (iii) a cluster-driven Gaussian Process Regression (GPR) framework using CTF for iterative feature point selection, optimizing the trade-off between topographic fidelity and point distribution homogeneity. Evaluated on eight high-resolution LiDAR terrain point clouds across six retention ratios, TFRSUB demonstrates significant accuracy improvements over seven state-of-the-art methods, achieving reductions of 9.22%–64.70% in DEM root mean square error, 7.80%–61.34% in mean absolute error, 16.43%–76.88% in slope error, and 28.12%–81.35% in mean curvature error. These results establish TFRSUB as an alternative solution for LiDAR point cloud simplification that maintains topographic fidelity while addressing computational storage challenges.
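A toy version of feature-aware simplification, assuming scipy: score each point by a composite of local roughness and slope (a crude stand-in for the paper's CTF) and retain the top-scoring fraction. The weights and neighborhood size are arbitrary illustrative choices.

```python
# Sketch of feature-aware point selection for ground point cloud
# simplification: keep points where the local terrain is most complex.
import numpy as np
from scipy.spatial import cKDTree

def select_feature_points(points: np.ndarray, retain: float = 0.2, k: int = 16):
    # points: (N, 3) ground point cloud
    tree = cKDTree(points[:, :2])
    _, idx = tree.query(points[:, :2], k=k)
    neigh_z = points[idx, 2]                          # (N, k) neighbor heights
    roughness = neigh_z.std(axis=1)
    slope = np.abs(points[:, 2] - neigh_z.mean(axis=1))
    score = 0.5 * roughness + 0.5 * slope             # toy composite terrain factor
    n_keep = int(retain * len(points))
    return points[np.argsort(-score)[:n_keep]]

pts = np.random.rand(5000, 3)
print(select_feature_points(pts).shape)  # (1000, 3)
```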
Citations: 0