
Latest publications from ISPRS Journal of Photogrammetry and Remote Sensing

VectorLLM: Human-like extraction of structured building contours via multimodal LLMs
IF 12.2 · Earth Science, CAS Tier 1 · Q1 GEOGRAPHY, PHYSICAL · Pub Date: 2026-01-19 · DOI: 10.1016/j.isprsjprs.2026.01.025
Tao Zhang, Shiqing Wei, Shihao Chen, Wenling Yu, Muying Luo, Shunping Ji
Automatically extracting vectorized building contours from remote sensing imagery is crucial for urban planning, population estimation, and disaster assessment. Current state-of-the-art methods rely on complex multi-stage pipelines involving pixel segmentation, vectorization, and polygon refinement, which limits their scalability and real-world applicability. Inspired by the remarkable reasoning capabilities of Large Language Models (LLMs), we introduce VectorLLM, the first Multi-modal Large Language Model (MLLM) designed for regular building contour extraction from remote sensing images. Unlike existing approaches, VectorLLM performs corner-point by corner-point regression of building contours directly, mimicking human annotators’ labeling process. Our architecture consists of a vision foundation backbone, an MLP connector, and an LLM, enhanced with learnable position embeddings to improve spatial understanding capability. Through comprehensive exploration of training strategies including pretraining, supervised fine-tuning, and direct preference optimization across WHU, WHU-Mix, and CrowdAI datasets, VectorLLM outperforms the previous SOTA methods. Remarkably, VectorLLM exhibits strong zero-shot performance on unseen objects including aircraft, water bodies, and oil tanks, highlighting its potential for unified modeling of diverse remote sensing object contour extraction tasks. Overall, this work establishes a new paradigm for vector extraction in remote sensing, leveraging the topological reasoning capabilities of LLMs to achieve both high accuracy and exceptional generalization. All code and weights will be available at https://github.com/zhang-tao-whu/VectorLLM.
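To make the corner-by-corner decoding concrete, the sketch below shows the autoregressive loop such a model runs at inference time. The `predict_next_corner` interface and the dummy model are illustrative assumptions on our part, not the released VectorLLM API; see the linked repository for the authors' implementation.

```python
import numpy as np

class DummyCornerModel:
    """Stand-in for the MLLM decoder; emits random corners, then stops.
    Purely illustrative -- the real model conditions on visual features
    from the vision backbone plus the corners emitted so far."""
    def predict_next_corner(self, image_feats, corners_so_far):
        stop_prob = len(corners_so_far) / 8.0      # stop after ~5 corners
        xy = np.random.uniform(0, 512, size=2)     # next (x, y) in pixels
        return xy, stop_prob

def decode_contour(model, image_feats, max_corners=32, stop_thresh=0.5):
    """Regress a building contour corner by corner, mimicking how a human
    annotator clicks vertices one at a time until the ring is closed."""
    corners = []
    for _ in range(max_corners):
        xy, stop_prob = model.predict_next_corner(image_feats, corners)
        if stop_prob > stop_thresh:
            break                                  # polygon is complete
        corners.append(xy)
    return np.asarray(corners)                     # ordered vertex ring

contour = decode_contour(DummyCornerModel(), image_feats=None)
```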
Citations: 0
AMS-Former: Adaptive multi-scale transformer for multi-modal image matching
IF 12.2 · Earth Science, CAS Tier 1 · Q1 GEOGRAPHY, PHYSICAL · Pub Date: 2026-01-19 · DOI: 10.1016/j.isprsjprs.2026.01.021
Jiahao Rao, Rui Liu, Jianjun Guan, Xin Tian
Multi-modal image (MMI) matching plays a crucial role in the fusion of multi-source image information. However, due to the significant geometric and modality differences in MMI, existing methods often fail to achieve satisfactory matching performance. To address these challenges, we propose an end-to-end MMI matching approach, named adaptive multi-scale transformer (AMS-Former). First, AMS-Former constructs a multi-scale image matching framework that integrates contextual information across different scales, effectively identifying potential corresponding points and thereby improving matching accuracy. To handle the challenges caused by modality differences, we design a cross-modal feature extraction module with an adaptive modulation strategy. This module effectively couples features from different modalities, enhancing feature representation and improving model robustness under complex modality differences. To further enhance matching performance, we design a suitable loss function for the proposed AMS-Former to guide the optimization of network parameters. Finally, we use a cross-scale mutual supervision strategy to remove incorrect corresponding points and enhance the reliability of the matching results. Extensive experiments on five MMI datasets demonstrate that AMS-Former outperforms state-of-the-art methods, including RIFT, ASS, COFSM, POS-GIFT, Matchformer, SEMLA, TopicFM, and Lightglue. Our code is available at: https://github.com/Henryrjh/AMS_Former.
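The cross-scale mutual supervision step can be pictured as a consistency filter between matches found at two feature resolutions. The sketch below is a hedged stand-in under that reading; the stride and pixel tolerance are illustrative choices, not values from the paper.

```python
import numpy as np

def cross_scale_filter(fine_matches, coarse_matches, stride=4, tol=1.5):
    """Keep fine-scale correspondences that agree with at least one
    coarse-scale match once mapped onto the coarse grid.

    fine_matches / coarse_matches: (N, 4) arrays of (xa, ya, xb, yb)
    pixel coordinates at their own resolutions; `stride` is the
    resolution ratio between the two feature levels.
    """
    down = fine_matches / stride               # fine coords on coarse grid
    keep = np.zeros(len(down), dtype=bool)
    for i, m in enumerate(down):
        # supported if any coarse match is within `tol` px on both images
        diff = np.abs(coarse_matches - m)
        keep[i] = np.any(np.all(diff <= tol, axis=1))
    return fine_matches[keep]
```

Rejecting fine matches that no coarse match supports trades a little recall for precision, which is the usual motivation for mutual checks of this kind.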
Citations: 0
WEGLA-NormGAN: wavelet-enhanced Cycle-GAN with global-local attention for radiometric normalization of remote sensing images
IF 12.2 · Earth Science, CAS Tier 1 · Q1 GEOGRAPHY, PHYSICAL · Pub Date: 2026-01-19 · DOI: 10.1016/j.isprsjprs.2026.01.020
Wenxia Gan, Yu Feng, Jianhao Miao, Xinghua Li, Huanfeng Shen
The diversity of satellite remote sensing images has significantly enhanced the capability to observe surface information on Earth. However, multi-temporal optical remote sensing images acquired from different sensor platforms often exhibit substantial radiometric discrepancies, and it is difficult to obtain overlapping reference images, which poses critical challenges for seamless large-scale mosaicking, including global radiometric inconsistency, unsmooth local transitions, and visible seamlines. Existing traditional and deep learning methods can achieve reasonable performance on paired datasets, but often face challenges in balancing spatial structural integrity with enhanced radiometric consistency and generalizing to unseen images. To address these issues, a wavelet-enhanced radiometric normalization network called WEGLA-NormGAN is proposed to generate radiometrically normalized imagery with sound radiometric consistency and spatial fidelity. This framework integrates frequency-domain and spatial-domain information to achieve consistent multi-scale radiometric feature modeling while ensuring spatial structural fidelity. Firstly, wavelet transform is introduced to effectively decouple radiometric information and structural features from images, explicitly enhancing radiometric feature representation and edge-texture preservation. Secondly, a U-Net architecture with multi-scale modeling advantages is fused with an adaptive attention mechanism incorporating residual structures. This hybrid design employs a statistical alignment strategy to efficiently extract global shallow features and local statistical information, adaptively adjust the dynamic attention of unseen data, and alleviate local distortions, improving radiometric consistency and achieving high-fidelity spatial structure preservation. The proposed framework generates radiometrically normalized imagery that harmonizes radiometric consistency with spatial fidelity, while achieving outstanding radiometric normalization even in unseen scenarios. Extensive experiments were conducted on two public datasets and a self-constructed dataset. The results demonstrate that WEGLA-NormGAN outperforms seven state-of-the-art methods in cross-temporal scenarios and five in cross-spatiotemporal scenarios in terms of radiometric consistency, structural fidelity, and robustness. The code is available at https://github.com/WITRS/WeGLA-Norm.git.
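The wavelet decoupling idea can be shown in isolation with PyWavelets: a one-level 2-D DWT splits a band into a smooth approximation subband (radiometry-dominated) and three detail subbands (edges and texture), so a radiometric adjustment can touch the former while leaving the latter intact. This is a minimal sketch of that decomposition, not the paper's network.

```python
import numpy as np
import pywt

def wavelet_split(band, wavelet="haar"):
    """One-level 2-D DWT: the approximation subband carries the smooth
    radiometric content, the detail subbands carry edges and texture."""
    approx, details = pywt.dwt2(band, wavelet)
    return approx, details                     # details = (cH, cV, cD)

def wavelet_merge(approx, details, wavelet="haar"):
    """Recombine after the radiometric branch has modified `approx`,
    leaving the structural detail subbands untouched."""
    return pywt.idwt2((approx, details), wavelet)

band = np.random.rand(256, 256).astype(np.float32)
approx, details = wavelet_split(band)
normalized = wavelet_merge(approx * 1.05, details)   # toy radiometric gain
```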
Citations: 0
Attributing GHG emissions to individual facilities using multi-temporal hyperspectral images: Methodology and applications
IF 12.2 · Earth Science, CAS Tier 1 · Q1 GEOGRAPHY, PHYSICAL · Pub Date: 2026-01-17 · DOI: 10.1016/j.isprsjprs.2026.01.014
Yichi Zhang, Ge Han, Yiyang Huang, Huayi Wang, Hongyuan Zhang, Zhipeng Pei, Yuanxue Pu, Haotian Luo, Jinchun Yi, Tianqi Shi, Siwei Li, Wei Gong
Industrial parks are major sources of greenhouse gas (GHG) emissions and the ultimate entities responsible for implementing mitigation policies. Current satellite remote sensing technologies perform well in reporting localized strong point-source emissions, but face significant challenges in monitoring emissions from multiple densely clustered sources. To address this limitation, we propose an emission allocation framework, EA-MILES, which integrates multi-source hyperspectral data with plume modeling to quantify process-level emissions. Simulation experiments show that, with existing hyperspectral satellites, EA-MILES can estimate emissions for sources with intensities above 80 t CO2/h and 100 kg CH4/h, with biases not exceeding 13.60% and 17.08%, respectively. A steel and power production park is selected as a case study, where EA-MILES estimates process-level emissions with uncertainties ranging from 26.33% to 37.78%. Estimation results are consistent with inventory values derived from emission factor methods. The top-down Integrated Mass Enhancement method is used as a cross-check against EA-MILES results; the estimation bias did not exceed 16.84%. According to Climate TRACE, about 32% of CO2 and 44% of CH4 point sources worldwide fall within EA-MILES detection coverage, accounting for over 80% and 55% of anthropogenic CO2 and CH4 emissions, respectively. Therefore, this study provides a novel satellite-based approach for reporting facility-scale GHG emissions in industrial parks, offering transparent and accurate monitoring data to support mitigation and energy-transition decision-making.
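The top-down Integrated Mass Enhancement comparison mentioned above has a compact standard form, Q ≈ U_eff · IME / L. The sketch below follows the common formulation in the plume literature, taking L as the square root of the plume area; the paper's exact parameterisation is not reproduced here.

```python
import numpy as np

def ime_emission_rate(delta_omega, plume_mask, pixel_area, u_eff):
    """Integrated Mass Enhancement point-source estimate.

    delta_omega: per-pixel column-mass enhancement (kg m^-2)
    plume_mask:  boolean plume footprint from a detection step
    pixel_area:  pixel ground area (m^2)
    u_eff:       effective wind speed (m s^-1)
    Returns the source rate Q in kg s^-1.
    """
    ime = delta_omega[plume_mask].sum() * pixel_area       # kg in the plume
    plume_length = np.sqrt(plume_mask.sum() * pixel_area)  # m, L = sqrt(area)
    return u_eff * ime / plume_length
```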
Citations: 0
BEDI: a comprehensive benchmark for evaluating embodied agents on UAVs
IF 12.2 · Earth Science, CAS Tier 1 · Q1 GEOGRAPHY, PHYSICAL · Pub Date: 2026-01-16 · DOI: 10.1016/j.isprsjprs.2026.01.013
Mingning Guo, Mengwei Wu, Jiarun He, Shaoxian Li, Haifeng Li, Chao Tao
With the rapid advancement of low-altitude remote sensing and Vision-Language Models (VLMs), Embodied Agents based on Unmanned Aerial Vehicles (UAVs) have shown significant potential in autonomous tasks. However, current evaluation methods for UAV-Embodied Agents (UAV-EAs) remain constrained by the lack of standardized benchmarks, diverse testing scenarios and open system interfaces. To address these challenges, we propose BEDI (Benchmark for Embodied Drone Intelligence), a systematic and standardized benchmark designed for evaluating UAV-EAs. Specifically, we introduce a novel Dynamic Chain-of-Embodied-Task paradigm based on the perception-decision-action loop, which decomposes complex UAV tasks into standardized, measurable subtasks. Building on this paradigm, we design a unified evaluation framework encompassing six core sub-skills: semantic perception, spatial perception, motion control, tool utilization, task planning and action generation. Furthermore, we develop a hybrid testing platform that incorporates a wide range of both virtual and real-world scenarios, enabling a comprehensive evaluation of UAV-EAs across diverse contexts. The platform also offers open and standardized interfaces, allowing researchers to customize tasks and extend scenarios, thereby enhancing flexibility and scalability in the evaluation process. Finally, through empirical evaluations of several state-of-the-art (SOTA) VLMs, we reveal their limitations in embodied UAV tasks, underscoring the critical role of the BEDI benchmark in advancing embodied intelligence research and model optimization. By filling the gap in systematic and standardized evaluation within this field, BEDI facilitates objective model comparison and lays a robust foundation for future development in this field. Our benchmark is now publicly available at https://github.com/lostwolves/BEDI.
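The perception-decision-action loop underlying the Dynamic Chain-of-Embodied-Task paradigm can be summarised in a few lines. In the sketch below, the agent and environment callables, the subtask names, and the pass/fail scoring are illustrative placeholders, not the benchmark's actual interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class EpisodeReport:
    results: list = field(default_factory=list)   # (subtask, passed) pairs
    @property
    def pass_rate(self):
        return sum(ok for _, ok in self.results) / max(len(self.results), 1)

def run_episode(perceive, decide, act, check, subtasks):
    """Walk a chain of standardized, measurable subtasks through the loop."""
    report = EpisodeReport()
    for task in subtasks:                 # e.g. "semantic perception"
        obs = perceive()                  # perception
        plan = decide(obs, task)          # decision
        outcome = act(plan)               # action
        report.results.append((task, check(task, outcome)))
    return report

report = run_episode(
    perceive=lambda: {"frame": None},
    decide=lambda obs, task: f"plan-for-{task}",
    act=lambda plan: plan,
    check=lambda task, out: out.endswith(task),
    subtasks=["semantic perception", "motion control", "task planning"],
)
print(report.pass_rate)   # 1.0 with these toy callables
```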
Citations: 0
AnchorReF: A novel anchor-based visual re-localization framework aided by multi-sensor data fusion
IF 12.2 · Earth Science, CAS Tier 1 · Q1 GEOGRAPHY, PHYSICAL · Pub Date: 2026-01-16 · DOI: 10.1016/j.isprsjprs.2026.01.019
Hao Wu, Yu Ran, Xiaoxiang Zhang, Xinying Luo, Li Wang, Teng Zhao, Yongcheng Song, Zhijun Zhang, Huisong Zhang, Jin Liu, Jian Li
Visual relocalization estimates the precise pose of a query image within a pre-built visual map, serving as a fundamental component of robot navigation, autonomous driving, surveying and mapping, and related applications. In the past few decades, significant research effort has been devoted to achieving high relocalization accuracy. However, challenges remain when the query images exhibit significant changes compared to the reference scene. This paper primarily addresses pose verification and the correction of inaccurate pose estimates produced during relocalization. We propose a novel anchor-based visual relocalization framework that achieves robust pose estimation through multi-view co-visibility verification. Our approach further employs tightly coupled multi-sensor data fusion for pose refinement. Comprehensive evaluations on large-scale, real-world urban driving datasets (containing frequent dynamic objects, severe occlusions, and long-term environmental changes) demonstrate that our framework achieves state-of-the-art performance. Specifically, compared to traditional SfM-based and Transformer-based methods under these challenging conditions, our approach reduces the translation error by 46.2% and the rotation error by 8.55%.
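One way to picture the multi-view co-visibility verification is as a reprojection test: landmarks triangulated from anchor views are projected into the query image under the candidate pose, and the pose is accepted only if enough of them land near their observed pixels. The pinhole model and inlier threshold below are our assumptions, not the paper's exact criterion.

```python
import numpy as np

def covisibility_inlier_ratio(points_w, obs_uv, R, t, K, px_tol=3.0):
    """Fraction of anchor landmarks that reproject within `px_tol` pixels
    of their observed locations under candidate pose (R, t).

    points_w: (N, 3) world-frame landmarks from the anchor views
    obs_uv:   (N, 2) their pixel observations in the query image
    R, t, K:  rotation, translation, pinhole intrinsics (no distortion)
    """
    pc = R @ points_w.T + t[:, None]          # world -> camera frame, (3, N)
    uv = (K @ pc).T
    uv = uv[:, :2] / uv[:, 2:3]               # perspective division
    err = np.linalg.norm(uv - obs_uv, axis=1)
    return float(np.mean((err < px_tol) & (pc[2] > 0)))
```

A candidate pose whose inlier ratio falls below a threshold would then be flagged for correction rather than accepted.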
Citations: 0
RTPSeg: A multi-modality dataset for LiDAR point cloud semantic segmentation assisted with RGB-thermal images in autonomous driving
IF 12.2 · Earth Science, CAS Tier 1 · Q1 GEOGRAPHY, PHYSICAL · Pub Date: 2026-01-16 · DOI: 10.1016/j.isprsjprs.2026.01.008
Yifan Sun, Chenguang Dai, Wenke Li, Xinpu Liu, Yongqi Sun, Ye Zhang, Weijun Guan, Yongsheng Zhang, Yulan Guo, Hanyun Wang
LiDAR point cloud semantic segmentation is crucial for scene understanding in autonomous driving, yet the sparse and textureless characteristics of point clouds pose major challenges for this task. To address this, numerous studies have explored leveraging the dense color and fine-grained texture of RGB images for multi-modality 3D semantic segmentation. Nevertheless, these methods still encounter limitations in complex scenarios, as RGB images degrade under poor lighting conditions. In contrast, thermal infrared (TIR) images provide thermal radiation information of road objects and are robust to illumination change, offering advantages complementary to RGB images. Therefore, in this work we introduce RTPSeg, the first and only multi-modality dataset to simultaneously provide RGB and TIR images for point cloud semantic segmentation. RTPSeg includes over 3000 synchronized frames collected by an RGB camera, an infrared camera, and LiDAR, providing over 248M pointwise annotations for 18 semantic categories in autonomous driving, covering urban and village scenes during both daytime and nighttime. Based on RTPSeg, we also propose RTPSegNet, a baseline model for point cloud semantic segmentation jointly assisted by RGB and TIR images. Extensive experiments demonstrate that the RTPSeg dataset presents considerable challenges and opportunities for existing point cloud semantic segmentation approaches, and that RTPSegNet is effective in jointly leveraging the complementary information among point clouds, RGB images, and TIR images. More importantly, the experimental results confirm that 3D semantic segmentation can be effectively enhanced by introducing the additional TIR image modality, revealing the promising potential of this line of research. We anticipate that RTPSeg will catalyze in-depth research in this field. Both RTPSeg and RTPSegNet will be released at https://github.com/sssssyf/RTPSeg.
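The basic operation a dataset like this enables is attaching pixel values from a calibrated RGB or TIR camera to each LiDAR point. The sketch below shows that projection step under a standard pinhole model; the extrinsics/intrinsics naming is illustrative and not tied to RTPSeg's actual calibration files.

```python
import numpy as np

def sample_image_for_points(points_lidar, image, T_cam_lidar, K):
    """Attach per-point samples from a calibrated camera (RGB or TIR).

    points_lidar: (N, 3) points in the LiDAR frame
    image:        (H, W) thermal or (H, W, 3) color image
    T_cam_lidar:  (4, 4) LiDAR-to-camera extrinsics
    K:            (3, 3) pinhole intrinsics
    """
    pts_h = np.c_[points_lidar, np.ones(len(points_lidar))]  # homogeneous
    pc = (T_cam_lidar @ pts_h.T)[:3]                  # camera frame, (3, N)
    uv = (K @ pc).T
    uv = uv[:, :2] / uv[:, 2:3]                       # pixel coordinates
    h, w = image.shape[:2]
    # keep points in front of the camera and inside the frame
    valid = (pc[2] > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < w) \
                        & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    channels = image.shape[2] if image.ndim == 3 else 1
    feats = np.zeros((len(points_lidar), channels), dtype=image.dtype)
    ui, vi = uv[valid].astype(int).T
    feats[valid] = image[vi, ui].reshape(-1, channels)
    return feats, valid                               # valid = in-frame mask
```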
Citations: 0
RECREATE: Supervised contrastive learning and inpainting based hyperspectral image denoising
IF 12.2 · Earth Science, CAS Tier 1 · Q1 GEOGRAPHY, PHYSICAL · Pub Date: 2026-01-16 · DOI: 10.1016/j.isprsjprs.2026.01.022
Aditya Dixit, Anup Kumar Gupta, Puneet Gupta, Ankur Garg
Hyperspectral images (HSIs) contain information across many spectral bands, making them valuable in several real-world applications such as environmental monitoring, agriculture, and remote sensing. However, the acquisition process often introduces noise, necessitating effective HSI denoising methods to maintain their applicability. Deep learning (DL) is considered the de facto approach for HSI denoising, but it requires a significant number of training samples to optimize network parameters for effective denoising outcomes. Obtaining such extensive datasets is challenging for HSI, leading to epistemic uncertainty and thereby degrading denoising performance. This paper introduces a novel supervised contrastive learning (SCL) method, RECREATE, to enhance feature learning and mitigate the issue of epistemic uncertainty in HSI denoising. Furthermore, we explore image inpainting as an auxiliary task to enhance HSI denoising performance. By adding HSI inpainting to contrastive learning, our method enhances HSI denoising by enlarging the effective training data and enforcing improved feature learning. Experimental outcomes on various HSI datasets validate the efficacy of RECREATE, showcasing its potential for integration with existing HSI denoising techniques to enhance their performance, both qualitatively and quantitatively. This method holds promise for addressing the limitations posed by limited training data, thereby advancing the field toward better HSI denoising methods.
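The SCL ingredient has a standard form: the supervised contrastive loss of Khosla et al. (2020). The sketch below implements that generic loss in PyTorch; how RECREATE assigns labels to denoising patches is not reproduced here, so the label grouping is an assumption.

```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss over an embedded batch.

    features: (N, D) embeddings; labels: (N,) ints -- samples sharing a
    label are pulled together, all others are pushed apart.
    """
    f = F.normalize(features, dim=1)
    logits = f @ f.T / temperature
    n = f.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=f.device)
    logits = logits.masked_fill(self_mask, float("-inf"))  # drop self-pairs
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos.sum(1)
    anchors = pos_counts > 0                  # anchors with >= 1 positive
    mean_log_prob_pos = (log_prob.masked_fill(~pos, 0.0).sum(1)[anchors]
                         / pos_counts[anchors])
    return -mean_log_prob_pos.mean()

loss = supcon_loss(torch.randn(8, 64), torch.tensor([0, 0, 1, 1, 2, 2, 3, 3]))
```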
Citations: 0
Spatiotemporal score-based federated generative learning for multi-source remote sensing time series in environmental monitoring
IF 12.7 · Earth Science, CAS Tier 1 · Q1 GEOGRAPHY, PHYSICAL · Pub Date: 2026-01-15 · DOI: 10.1016/j.isprsjprs.2025.11.001
Shagufta Henna, Upaka Rathnayake
{"title":"Spatiotemporal score-based federated generative learning for multi-source remote sensing time series in environmental monitoring","authors":"Shagufta Henna, Upaka Rathnayake","doi":"10.1016/j.isprsjprs.2025.11.001","DOIUrl":"https://doi.org/10.1016/j.isprsjprs.2025.11.001","url":null,"abstract":"","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"48 1","pages":""},"PeriodicalIF":12.7,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An SW-TES hybrid algorithm for retrieving mountainous land surface temperature from high-resolution thermal infrared remote sensing data
IF 12.2 · Earth Science, CAS Tier 1 · Q1 GEOGRAPHY, PHYSICAL · Pub Date: 2026-01-15 · DOI: 10.1016/j.isprsjprs.2026.01.016
Zhi-Wei He, Bo-Hui Tang, Zhao-Liang Li
<div><div>Mountainous land surface temperature (MLST) is a key parameter for studying the energy exchange between land surface and atmosphere in mountainous areas. However, traditional land surface temperature (LST) retrieval methods often neglect the influence of three-dimensional (3D) structures and adjacent pixels due to rugged terrain. To address this, a mountainous split-window and temperature-emissivity separation (MSW-TES) hybrid algorithm was proposed to retrieve MLST. The hybrid algorithm that combines the improved split window (SW) algorithm and temperature-emissivity separation (TES) algorithm, which considering the topographic and adjacent effects (T-A effect) to retrieve MLST from five thermal infrared (TIR) bands of the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER). In this hybrid algorithm, an improved mountainous canopy multiple scattering TIR radiative transfer model was proposed to construct the simulation dataset. Then, an improved SW algorithm was developed to build a 3D lookup table (LUT) of regression coefficients using small-scale self-heating parameter (SSP) and sky-view factor (SVF) to estimate brightness temperature (BT) at ground level. Furthermore, The TES algorithm was refined to account for the influence of rugged terrain within pixel on mountainous land surface effective emissivity (MLSE) by reconstructing the relationship between minimum emissivity and maximum-minimum difference (MMD) for different SSPs. Results from simulated data show that the accuracy of the improved SW algorithm is increased by up to 0.5 K at most for estimating BT at ground level. The MSW-TES algorithm, considering the T-A effect, generally retrieves lower LST values compared to those without this consideration. The hybrid algorithm yielded root mean square error (RMSE) of 0.99 K and 1.83 K for LST retrieval with and without the T-A effect, respectively, with most differences falling between 0.0 K and 3.0 K. The sensitivity analysis indicated that the perturbation of input parameters has little influence on MLST and MLSE, which proves that the MSW-TES algorithm has strong robustness. Additionally, the accuracy of MLST retrieval by the MSW-TES algorithm was validated using both discrete anisotropic radiative transfer (DART) model simulations and <em>in-situ</em> measurements. The validation result of DART simulations showed biases ranging from −0.13 K to 1.03 K and RMSEs from 0.76 K to 1.29 K across the five ASTER TIR bands, while validation result of the in-situ measurements yielded a bias of 0.97 K and an RMSE of 1.25 K, demonstrating consistent and reliable results. This study underscores the necessity of accounting for the T-A effect to improve MLST retrieval and provides a promising pathway for global clear-sky high-resolution MLST mapping in upcoming thermal missions. The source code and simulated data are available at <span><span>https://github.com/hezwppp/MSW-TES</span><svg><path></path></svg></span>.</div></div
Citations: 0