
Latest Publications in ISPRS Journal of Photogrammetry and Remote Sensing

PANet: A multi-scale temporal decoupling network and its high-resolution benchmark dataset for detecting pseudo changes in cropland non-agriculturalization
IF 12.2 | CAS Tier 1 (Earth Science) | JCR Q1 (Geography, Physical) | Pub Date: 2026-01-22 | DOI: 10.1016/j.isprsjprs.2026.01.029
Songman Sui , Jixian Zhang , Haiyan Gu , Yue Chang
Cropland non-agriculturalization (CNA) refers to the conversion of cropland into non-agricultural land such as construction land or ponds, posing threats to food security and ecological balance. Remote sensing technology enables precise monitoring of this process, but bi-temporal methods are susceptible to errors caused by seasonal spectral fluctuations, weather interference, and imaging discrepancies, often leading to false detections. Existing methods, which lack support from temporal datasets, struggle to disentangle the spectral confusion of gradual non-agriculturalization and short-term disturbances, thereby limiting the accuracy of dynamic cropland resource monitoring. To address this issue, a novel phenology-aware temporal change detection network (PANet) is proposed to solve the misclassification challenges in CNA detection caused by “same object with different spectra” and “different objects with similar spectra” issues. A phenology-aware module (PATM) is designed, leveraging a dual-driven decoupling model to dynamically weight phenology-sensitive periods and adaptively represent non-uniform temporal intervals. Through a time-aligned feature enhancement strategy and dual-driven (intra-annual/inter-annual) temporal decay functions, PANet simultaneously focuses on short-term anomalies and robustly models long-term trends. Additionally, a sample balance adjustment module (DFBL) is developed to mitigate the impact of sample imbalance by incorporating prior knowledge of changes and dynamic adjustment factors, enhancing the model’s sensitivity to non-agriculturalization changes. Furthermore, the first high-resolution CNA dataset based on actual production data is constructed, containing 1295 pairs of 512 × 512 masked images. Compared to existing datasets, this dataset offers extensive temporal coverage, capturing comprehensive seasonal periodic characteristics of cropland. 
Comparative experiments with several classical time-series methods and bi-temporal methods validate the effectiveness of PANet. Experimental results on the LHCD dataset demonstrate that PANet achieves the highest F1 scores of 61.01% and 61.70%. PANet accurately captures CNA information, making it vital for the scientific management and sustainable utilization of limited cropland resources. The LHCD dataset can be downloaded from https://github.com/mss-s/LHCD.
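The dual-driven (intra-annual/inter-annual) temporal decay idea can be illustrated with a toy weighting scheme. The exponential form, the wrap-around seasonal distance, and the rate constants below are illustrative assumptions for exposition, not PANet's actual formulation.

```python
import math

def temporal_weights(obs_days, ref_day, intra_rate=0.01, inter_rate=0.3):
    """Toy dual-decay weighting: each observation is down-weighted both by
    its within-year (seasonal) distance in days and by its between-year
    distance in whole years. Rates are illustrative, not from the paper."""
    weights = []
    for d in obs_days:
        years_apart = abs(d - ref_day) // 365
        days_within = abs(d - ref_day) % 365
        # seasonal distance wraps around the year (late Dec and early Jan are close)
        days_within = min(days_within, 365 - days_within)
        w = math.exp(-intra_rate * days_within) * math.exp(-inter_rate * years_apart)
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]  # normalized to sum to 1
```

Observations near the reference date (and in the same year) receive the largest weight, which is the behavior the abstract describes: emphasizing phenology-sensitive periods while discounting distant years.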
ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 233, pp. 126–143.
Citations: 0
Detecting global ocean subsurface density change with high-resolution via dual-task densely-former
IF 12.2 | CAS Tier 1 (Earth Science) | JCR Q1 (Geography, Physical) | Pub Date: 2026-01-22 | DOI: 10.1016/j.isprsjprs.2026.01.026
Hua Su , Weiqi Xie , Luping You , Sihui Li , Dian Lin , An Wang
High-resolution ocean subsurface density is crucial for studying dynamic processes and stratification within the ocean under recent global ocean warming. This study proposes a novel deep learning-based model, named DDFNet (Dual-task Densely-Former Network), for reconstructing ocean subsurface density, to address the challenges in reconstructing high-resolution and high-reliability global ocean subsurface density. DDFNet employs multi-scale feature extraction, attention mechanisms, and a dual-label design, combining an encoder-decoder backbone network with a global spatial attention module to capture the complex spatiotemporal relationships in ocean data effectively. The model utilizes multisource surface remote sensing data as input and incorporates Argo profile data and ORAS5 reanalysis data as labels. An adaptive weighted loss function dynamically balances the contributions of the two label types, improving reconstruction accuracy and achieving a spatial resolution of 0.25°×0.25°. By constructing dual tasks with in situ observations and reanalysis data for joint learning, the true state of the ocean and the consistency of physical processes are enhanced, improving the model's reconstruction accuracy and physical consistency. Experimental results demonstrate that DDFNet outperforms widely used LightGBM and CNN models, with the reconstructed DDFNet-SD dataset achieving an R² of 0.9863 and an RMSE of 0.2804 kg/m³. The dataset further reveals a declining trend in global ocean subsurface density at a rate of −4.47 × 10⁻⁴ kg/m³ per decade, particularly pronounced in the upper 0–700 m, which is likely associated with global ocean warming and salinity changes. The high-resolution dataset facilitates studies on mesoscale ocean dynamics, stratification variability, and climate change impacts.
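The adaptive weighting of the two label types (Argo in situ profiles and ORAS5 reanalysis) can be sketched with a toy dual-task loss. The inverse-running-mean update rule below is a hypothetical stand-in, since the abstract does not specify DDFNet's exact balancing scheme.

```python
def adaptive_dual_loss(pred_argo, argo_label, pred_oras, oras_label, history):
    """Toy adaptive weighting of two supervision sources: each task's weight
    is inversely proportional to its running-mean loss, so neither label type
    dominates training. The update rule is illustrative, not DDFNet's own."""
    l_argo = sum((p - y) ** 2 for p, y in zip(pred_argo, argo_label)) / len(argo_label)
    l_oras = sum((p - y) ** 2 for p, y in zip(pred_oras, oras_label)) / len(oras_label)
    # exponential running means of each task's loss
    history["argo"] = 0.9 * history["argo"] + 0.1 * l_argo
    history["oras"] = 0.9 * history["oras"] + 0.1 * l_oras
    w_argo = 1.0 / (history["argo"] + 1e-8)
    w_oras = 1.0 / (history["oras"] + 1e-8)
    # normalized convex combination of the two task losses
    return (w_argo * l_argo + w_oras * l_oras) / (w_argo + w_oras)
```

A task whose loss has recently been small is weighted up, keeping the easier label source from being ignored once the harder one dominates the raw sum.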
ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 233, pp. 158–179.
Citations: 0
SuperMapNet for long-range and high-accuracy vectorized HD map construction
IF 12.2 | CAS Tier 1 (Earth Science) | JCR Q1 (Geography, Physical) | Pub Date: 2026-01-20 | DOI: 10.1016/j.isprsjprs.2026.01.023
Ruqin Zhou , Chenguang Dai , Wanshou Jiang , Yongsheng Zhang , Zhenchao Zhang , San Jiang
Vectorized high-definition (HD) map construction is formulated as the task of classifying and localizing typical map elements based on features in a bird's-eye view (BEV). This is essential for autonomous driving systems, providing interpretable environmental structured representations for decision and planning. Remarkable progress has been made in recent years, but several major issues remain: (1) in the generation of the BEV features, single-modality methods suffer from limited perception capability and range, while existing multi-modal fusion approaches underutilize cross-modal synergies and fail to resolve spatial disparities between modalities, resulting in misaligned BEV features with holes; (2) in the classification and localization of map elements, existing methods heavily rely on point-level modeling information while neglecting the information between elements and between point and element, leading to low accuracy with erroneous shapes and element entanglement. To address these limitations, we propose SuperMapNet, a multi-modal framework designed for long-range and high-accuracy vectorized HD map construction. This framework uses both camera images and LiDAR point clouds as input. It first tightly couples semantic information from camera images and geometric information from LiDAR point clouds by a cross-attention-based synergy enhancement module and a flow-based disparity alignment module for long-range BEV feature generation. Subsequently, local information acquired by point queries and global information acquired by element queries are tightly coupled by three-level interactions for high-accuracy classification and localization, where Point2Point interaction captures local geometric consistency between points of the same element, Element2Element interaction learns global semantic relationships between elements, and Point2Element interaction complements element information for its constituent points. Experiments on the nuScenes and Argoverse2 datasets demonstrate high accuracy, surpassing previous state-of-the-art methods (SOTAs) by 14.9%/8.8% and 18.5%/3.1% mAP under the hard/easy settings, respectively, even over double the perception range (up to 120 m along the X-axis and 60 m along the Y-axis). The code is made publicly available at https://github.com/zhouruqin/SuperMapNet.
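The cross-attention coupling of camera and LiDAR BEV features rests on standard scaled dot-product cross-attention, which can be sketched minimally. The function below illustrates the mechanism only (one query vector attending over a handful of key/value vectors); it is not SuperMapNet's actual module.

```python
import math

def cross_attend(query, keys, values):
    """Minimal scaled dot-product cross-attention: a camera-BEV query vector
    gathers information from LiDAR-BEV key/value vectors. A sketch of the
    mechanism only, not the paper's synergy enhancement module."""
    d = len(query)
    # similarity of the query to each key, scaled by sqrt(d)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    attn = [e / z for e in exps]  # softmax over keys
    # attention-weighted sum of the value vectors
    fused = [sum(a * v[i] for a, v in zip(attn, values)) for i in range(len(values[0]))]
    return fused, attn
```

The key whose direction best matches the query dominates the fused output, which is how the camera branch can selectively pull in geometrically aligned LiDAR evidence.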
ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 233, pp. 89–103.
Citations: 0
Roadside lidar-based scene understanding toward intelligent traffic perception: A comprehensive review
IF 12.2 | CAS Tier 1 (Earth Science) | JCR Q1 (Geography, Physical) | Pub Date: 2026-01-20 | DOI: 10.1016/j.isprsjprs.2026.01.012
Jiaxing Zhang , Chengjun Ge , Wen Xiao , Miao Tang , Jon Mills , Benjamin Coifman , Nengcheng Chen
Urban transportation systems are undergoing a paradigm shift with the integration of high-precision sensing technologies and intelligent perception frameworks. Roadside lidar, as a key enabler of infrastructure-based sensing technology, offers robust and precise 3D spatial understanding of dynamic urban scenes. This paper presents a comprehensive review of roadside lidar-based traffic perception, structured around five key modules: (1) sensor placement strategies; (2) multi-lidar point cloud fusion; (3) dynamic traffic information extraction; (4) downstream applications, including trajectory prediction, collision risk assessment, and behavioral analysis; and (5) representative roadside perception benchmark datasets. Despite notable progress, challenges remain in deployment optimization, robust registration under occlusion and dynamic conditions, generalizable object detection and tracking, and effective utilization of heterogeneous multi-modal data. Emerging trends point toward perception-driven infrastructure design, edge-cloud-terminal collaboration, and generalizable models enabled by domain adaptation, self-supervised learning, and foundation-scale datasets.
ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 233, pp. 69–88.
Citations: 0
Empowering tree-scale monitoring over large areas: Individual tree delineation from high-resolution imagery
IF 12.2 | CAS Tier 1 (Earth Science) | JCR Q1 (Geography, Physical) | Pub Date: 2026-01-20 | DOI: 10.1016/j.isprsjprs.2025.12.022
Xinlian Liang , Yinrui Wang , Jun Pan , Janne Heiskanen , Ningning Wang , Siyu Wu , Ilja Vuorinne , Jiaojiao Tian , Jonas Troles , Myriam Cloutier , Stefano Puliti , Aishwarya Chandrasekaran , James Ball , Xiangcheng Mi , Guochun Shen , Kun Song , Guofan Shao , Rasmus Astrup , Yunsheng Wang , Petri Pellikka , Jianya Gong
Accurate individual tree delineation (ITD) is essential for forest monitoring, biodiversity assessment, and ecological modeling. While remote sensing (RS) has significantly advanced forest ITD, challenges persist, especially in complex forest environments. The use of imagery data is compelling for several reasons: the rapid growth in available high-resolution aerial and satellite imagery, the increasing need for image-based analysis where reliable 3D data are unavailable, the widening gap between data supply and processing capability, and the limited validation of state-of-the-art (SOTA) methods across diverse real-world conditions. This study aims to advance ITD research by evaluating SOTA instance segmentation approaches, including both recently developed and established methods. The analysis evaluates ITD algorithm performance using the largest forest instance-segmentation imagery dataset to date and standardized evaluation protocols. This study identifies key factors affecting accuracy, reveals remaining challenges, and outlines future research directions. The findings reveal that ITD accuracy is heavily influenced by image resolution, forest structure, and method design.
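The IoU-threshold matching that underlies standard instance-segmentation metrics (the kind used to score ITD methods) can be sketched as follows. The study evaluates crown masks; axis-aligned boxes are used here purely to keep the matching logic short, and the greedy one-to-one strategy is one common convention, not necessarily the paper's exact protocol.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def match_instances(preds, gts, thr=0.5):
    """Greedy one-to-one matching at an IoU threshold.
    Returns (true positives, false positives, false negatives)."""
    used, tp = set(), 0
    for p in preds:
        best_j, best_iou = -1, thr
        for j, g in enumerate(gts):
            if j not in used and box_iou(p, g) >= best_iou:
                best_j, best_iou = j, box_iou(p, g)
        if best_j >= 0:
            used.add(best_j)
            tp += 1
    return tp, len(preds) - tp, len(gts) - tp
```

Tightening `thr` (e.g., from 0.5 to 0.75) is exactly the "stricter criteria" lever the abstract mentions for strengthening assessment reliability: the same predictions yield fewer matches and a harsher score.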
ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 232, pp. 974–999.
Citations: 0
VectorLLM: Human-like extraction of structured building contours via multimodal LLMs
IF 12.2 | CAS Tier 1 (Earth Science) | JCR Q1 (Geography, Physical) | Pub Date: 2026-01-19 | DOI: 10.1016/j.isprsjprs.2026.01.025
Tao Zhang , Shiqing Wei , Shihao Chen , Wenling Yu , Muying Luo , Shunping Ji
Automatically extracting vectorized building contours from remote sensing imagery is crucial for urban planning, population estimation, and disaster assessment. Current state-of-the-art methods rely on complex multi-stage pipelines involving pixel segmentation, vectorization, and polygon refinement, which limits their scalability and real-world applicability. Inspired by the remarkable reasoning capabilities of Large Language Models (LLMs), we introduce VectorLLM, the first Multi-modal Large Language Model (MLLM) designed for regular building contour extraction from remote sensing images. Unlike existing approaches, VectorLLM performs corner-point by corner-point regression of building contours directly, mimicking human annotators’ labeling process. Our architecture consists of a vision foundation backbone, an MLP connector, and an LLM, enhanced with learnable position embeddings to improve spatial understanding capability. Through comprehensive exploration of training strategies including pretraining, supervised fine-tuning, and direct preference optimization across WHU, WHU-Mix, and CrowdAI datasets, VectorLLM outperforms the previous SOTA methods. Remarkably, VectorLLM exhibits strong zero-shot performance on unseen objects including aircraft, water bodies, and oil tanks, highlighting its potential for unified modeling of diverse remote sensing object contour extraction tasks. Overall, this work establishes a new paradigm for vector extraction in remote sensing, leveraging the topological reasoning capabilities of LLMs to achieve both high accuracy and exceptional generalization. All code and weights will be available at https://github.com/zhang-tao-whu/VectorLLM.
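The corner-point-by-corner-point decoding described above can be illustrated with a toy autoregressive loop. The `next_corner` callable is a hypothetical stand-in for the MLLM's per-step vertex prediction; the interface is invented for exposition and is not VectorLLM's API.

```python
def decode_contour(next_corner, max_corners=32):
    """Toy autoregressive decoding in the spirit of corner-by-corner
    regression: `next_corner` maps the partial polygon so far to the next
    vertex, or None to stop (the model's end-of-contour decision)."""
    contour = []
    for _ in range(max_corners):  # hard cap guards against non-terminating predictors
        c = next_corner(contour)
        if c is None:
            break
        contour.append(c)
    return contour

# Stub predictor that traces a unit square, mimicking how an annotator
# would click four corners and then stop.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
stub = lambda partial: square[len(partial)] if len(partial) < 4 else None
```

Emitting one vertex at a time conditioned on the vertices already produced is what lets the model enforce closed, well-ordered polygons directly, rather than cleaning them up from a pixel mask afterwards.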
Citations: 0
AMS-Former: Adaptive multi-scale transformer for multi-modal image matching
IF 12.2 CAS Tier 1 (Earth Science) Q1 GEOGRAPHY, PHYSICAL Pub Date : 2026-01-19 DOI: 10.1016/j.isprsjprs.2026.01.021
Jiahao Rao , Rui Liu , Jianjun Guan , Xin Tian
Multi-modal image (MMI) matching plays a crucial role in the fusion of multi-source image information. However, due to the significant geometric and modality differences in MMI, existing methods often fail to achieve satisfactory matching performance. To address these challenges, we propose an end-to-end MMI matching approach, named adaptive multi-scale transformer (AMS-Former). First, AMS-Former constructs a multi-scale image matching framework that integrates contextual information across different scales, effectively identifying potential corresponding points and thereby improving matching accuracy. To handle the challenges caused by modality differences, we design a cross-modal feature extraction module with an adaptive modulation strategy. This module effectively couples features from different modalities, enhancing feature representation and improving model robustness under complex modality differences. To further enhance matching performance, we design a suitable loss function for the proposed AMS-Former to guide the optimization of network parameters. Finally, we use a cross-scale mutual supervision strategy to remove incorrect corresponding points and enhance the reliability of the matching results. Extensive experiments on five MMI datasets demonstrate that AMS-Former outperforms state-of-the-art methods, including RIFT, ASS, COFSM, POS-GIFT, Matchformer, SEMLA, TopicFM, and Lightglue. Our code is available at: https://github.com/Henryrjh/AMS_Former.
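The cross-scale mutual supervision step can be pictured with a minimal consistency filter (a hedged sketch under our own assumptions, not the paper's implementation): a coarse-scale correspondence survives only if a fine-scale correspondence exists near the same rescaled location in both images:

```python
# Minimal cross-scale consistency filter. A coarse match (pt_img1, pt_img2)
# is kept only when some fine-scale match lies within `tol` pixels of both
# of its coordinates after upscaling by `scale`. Threshold values are
# illustrative assumptions, not the paper's settings.
import math
from typing import List, Tuple

Match = Tuple[Tuple[float, float], Tuple[float, float]]  # (pt_img1, pt_img2)

def filter_by_cross_scale(
    coarse: List[Match],
    fine: List[Match],
    scale: float = 2.0,
    tol: float = 1.5,
) -> List[Match]:
    kept: List[Match] = []
    for (p1, p2) in coarse:
        up1 = (p1[0] * scale, p1[1] * scale)  # coarse coords -> fine grid
        up2 = (p2[0] * scale, p2[1] * scale)
        for (q1, q2) in fine:
            if math.dist(up1, q1) <= tol and math.dist(up2, q2) <= tol:
                kept.append((p1, p2))
                break
    return kept
```

A coarse match with no fine-scale support is treated as an outlier and dropped, which is the spirit of using one scale to supervise the other.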
Citations: 0
WEGLA-NormGAN: wavelet-enhanced Cycle-GAN with global-local attention for radiometric normalization of remote sensing images
IF 12.2 CAS Tier 1 (Earth Science) Q1 GEOGRAPHY, PHYSICAL Pub Date : 2026-01-19 DOI: 10.1016/j.isprsjprs.2026.01.020
Wenxia Gan , Yu Feng , Jianhao Miao , Xinghua Li , Huanfeng Shen
The diversity of satellite remote sensing images has significantly enhanced the capability to observe surface information on Earth. However, multi-temporal optical remote sensing images acquired from different sensor platforms often exhibit substantial radiometric discrepancies, and it is difficult to obtain overlapping reference images, which poses critical challenges for seamless large-scale mosaicking, including global radiometric inconsistency, unsmooth local transitions, and visible seamlines. Existing traditional and deep learning methods can achieve reasonable performance on paired datasets, but often face challenges in balancing spatial structural integrity with enhanced radiometric consistency and generalizing to unseen images. To address these issues, a wavelet-enhanced radiometric normalization network called WEGLA-NormGAN is proposed to generate radiometrically normalized imagery with sound radiometric consistency and spatial fidelity. This framework integrates frequency-domain and spatial-domain information to achieve consistent multi-scale radiometric feature modeling while ensuring spatial structural fidelity. Firstly, wavelet transform is introduced to effectively decouple radiometric information and structural features from images, explicitly enhancing radiometric feature representation and edge-texture preservation. Secondly, a U-Net architecture with multi-scale modeling advantages is fused with an adaptive attention mechanism incorporating residual structures. This hybrid design employs a statistical alignment strategy to efficiently extract global shallow features and local statistical information, adaptively adjust the dynamic attention of unseen data, and alleviate local distortions, improving radiometric consistency and achieving high-fidelity spatial structure preservation. 
The proposed framework generates radiometrically normalized imagery that harmonizes radiometric consistency with spatial fidelity, while achieving outstanding radiometric normalization even in unseen scenarios. Extensive experiments were conducted on two public datasets and a self-constructed dataset. The results demonstrate that WEGLA-NormGAN outperforms seven state-of-the-art methods in cross-temporal scenarios and five in cross-spatiotemporal scenarios in terms of radiometric consistency, structural fidelity, and robustness. The code is available at https://github.com/WITRS/WeGLA-Norm.git.
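The claimed decoupling can be sketched with a single-level Haar transform (our simplification; the paper's wavelet module is more elaborate): the low-pass band carries the slowly varying radiometric level, the residual carries edges and texture, so shifting only the low-pass band normalizes radiometry while preserving structure:

```python
# Single-level Haar decoupling of radiometry (low-pass band) from structure
# (high-frequency residual). The global mean shift used here is a deliberate
# oversimplification of the network's learned normalization.
import numpy as np

def haar_ll(img: np.ndarray) -> np.ndarray:
    """Low-low (approximation) band of a single-level 2-D Haar transform."""
    return (img[0::2, 0::2] + img[0::2, 1::2]
            + img[1::2, 0::2] + img[1::2, 1::2]) / 4.0

def normalize_radiometry(src: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Shift src's low-frequency level toward ref, preserving detail."""
    ll_src, ll_ref = haar_ll(src), haar_ll(ref)
    offset = ll_ref.mean() - ll_src.mean()           # global radiometric gap
    detail = src - np.kron(ll_src, np.ones((2, 2)))  # high-freq residual
    return np.kron(ll_src + offset, np.ones((2, 2))) + detail

# Toy example: the "reference" is the same scene, globally brighter by 5 units.
src = np.arange(16.0).reshape(4, 4)
ref = src + 5.0
out = normalize_radiometry(src, ref)
```

Because the detail residual has zero mean under the Haar average, the output matches the reference's radiometric level exactly while edges and texture are untouched.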
Citations: 0
Attributing GHG emissions to individual facilities using multi-temporal hyperspectral images: Methodology and applications
IF 12.2 CAS Tier 1 (Earth Science) Q1 GEOGRAPHY, PHYSICAL Pub Date : 2026-01-17 DOI: 10.1016/j.isprsjprs.2026.01.014
Yichi Zhang , Ge Han , Yiyang Huang , Huayi Wang , Hongyuan Zhang , Zhipeng Pei , Yuanxue Pu , Haotian Luo , Jinchun Yi , Tianqi Shi , Siwei Li , Wei Gong
Industrial parks are major sources of greenhouse gas (GHG) emissions and the ultimate entities responsible for implementing mitigation policies. Current satellite remote sensing technologies perform well in reporting localized strong point-source emissions, but face significant challenges in monitoring emissions from multiple densely clustered sources. To address this limitation, we propose an emission allocation framework, EA-MILES, which integrates multi-source hyperspectral data with plume modeling to quantify process-level emissions. Simulation experiments show that, with existing hyperspectral satellites, EA-MILES can estimate emissions for sources with intensities above 80 t CO2/h and 100 kg CH4/h, with biases not exceeding 13.60 % and 17.08 %, respectively. A steel and power production park is selected as a case study, where EA-MILES estimates process-level emissions with uncertainties ranging from 26.33 % to 37.78 %. Estimation results are consistent with inventory values derived from emission factor methods. The top-down Integrated Mass Enhancement method is used for comparison, against which the EA-MILES estimation bias does not exceed 16.84 %. According to Climate TRACE, about 32 % of CO2 and 44 % of CH4 point sources worldwide fall within EA-MILES detection coverage, accounting for over 80 % and 55 % of anthropogenic CO2 and CH4 emissions. Therefore, this study provides a novel satellite-based approach for reporting facility-scale GHG emissions in industrial parks, offering transparent and accurate monitoring data to support mitigation and energy-transition decision-making.
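The allocation idea behind such a framework can be sketched as a linear inverse problem (our own toy setup with made-up numbers, not EA-MILES itself): each source contributes a unit-emission plume footprint, observed enhancements are a linear mix of footprints, and per-source rates follow from least squares with a non-negativity constraint approximated by clipping:

```python
# Toy emission allocation: columns of F are hypothetical unit-emission plume
# footprints (Gaussian decay with distance stands in for a real plume model),
# observed enhancements y = F @ q, and least squares recovers the rates q.
import numpy as np

def footprint(dists: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Toy unit-emission footprint: Gaussian decay with downwind distance."""
    return np.exp(-0.5 * (dists / sigma) ** 2)

def allocate_emissions(F: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares allocation of observed enhancements to sources."""
    q, *_ = np.linalg.lstsq(F, y, rcond=None)
    return np.clip(q, 0.0, None)  # emission rates cannot be negative

# Two sources observed at four pixels; all distances and rates are made up.
d1 = np.array([0.0, 1.0, 3.0, 6.0])   # pixel distances to source 1
d2 = np.array([6.0, 3.0, 1.0, 0.0])   # pixel distances to source 2
F = np.column_stack([footprint(d1), footprint(d2)])
q_true = np.array([80.0, 120.0])      # hypothetical rates, e.g. t CO2/h
y = F @ q_true                        # jointly observed enhancement field
```

In the noise-free toy case the mix is exactly invertible; the real difficulty the paper addresses is that neighboring footprints overlap heavily and observations are noisy, which is where multi-temporal imagery helps condition the inversion.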
Citations: 0
BEDI: a comprehensive benchmark for evaluating embodied agents on UAVs
IF 12.2 CAS Tier 1 (Earth Science) Q1 GEOGRAPHY, PHYSICAL Pub Date : 2026-01-16 DOI: 10.1016/j.isprsjprs.2026.01.013
Mingning Guo , Mengwei Wu , Jiarun He, Shaoxian Li, Haifeng Li, Chao Tao
With the rapid advancement of low-altitude remote sensing and Vision-Language Models (VLMs), Embodied Agents based on Unmanned Aerial Vehicles (UAVs) have shown significant potential in autonomous tasks. However, current evaluation methods for UAV-Embodied Agents (UAV-EAs) remain constrained by the lack of standardized benchmarks, diverse testing scenarios and open system interfaces. To address these challenges, we propose BEDI (Benchmark for Embodied Drone Intelligence), a systematic and standardized benchmark designed for evaluating UAV-EAs. Specifically, we introduce a novel Dynamic Chain-of-Embodied-Task paradigm based on the perception-decision-action loop, which decomposes complex UAV tasks into standardized, measurable subtasks. Building on this paradigm, we design a unified evaluation framework encompassing six core sub-skills: semantic perception, spatial perception, motion control, tool utilization, task planning and action generation. Furthermore, we develop a hybrid testing platform that incorporates a wide range of both virtual and real-world scenarios, enabling a comprehensive evaluation of UAV-EAs across diverse contexts. The platform also offers open and standardized interfaces, allowing researchers to customize tasks and extend scenarios, thereby enhancing flexibility and scalability in the evaluation process. Finally, through empirical evaluations of several state-of-the-art (SOTA) VLMs, we reveal their limitations in embodied UAV tasks, underscoring the critical role of the BEDI benchmark in advancing embodied intelligence research and model optimization. By filling the gap in systematic and standardized evaluation within this field, BEDI facilitates objective model comparison and lays a robust foundation for future development in this field. Our benchmark is now publicly available at https://github.com/lostwolves/BEDI.
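The perception-decision-action decomposition can be mirrored by a small scoring structure (our own illustrative types, not BEDI's API): subtask results are tagged with one of the six core sub-skills listed in the abstract and aggregated into a per-skill profile:

```python
# Illustrative per-skill aggregation for a decomposed UAV task. The six
# skill names come from the abstract; the data structures are hypothetical.
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

SKILLS = ["semantic_perception", "spatial_perception", "motion_control",
          "tool_utilization", "task_planning", "action_generation"]

@dataclass
class SubtaskResult:
    skill: str      # one of SKILLS
    score: float    # normalized to [0, 1]

def skill_profile(results: List[SubtaskResult]) -> Dict[str, float]:
    """Mean score per sub-skill; skills never exercised score 0.0."""
    by_skill: Dict[str, List[float]] = {s: [] for s in SKILLS}
    for r in results:
        by_skill[r.skill].append(r.score)
    return {s: (mean(v) if v else 0.0) for s, v in by_skill.items()}
```

Reporting a profile rather than one scalar is what lets a benchmark like this expose which stage of the perception-decision-action loop a given VLM fails at.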
Citations: 0