Pub Date : 2025-11-19 | DOI: 10.1016/j.isprsjprs.2025.11.004
Pengwei Zhou , Hongche Yin , Guozheng Xu , Xiaosong Wei , Annan Zhou , Jian Yao , Li Li , Huang Jing
The underlying view-graph is generally constructed by matching unordered image pairs, a step that is crucial to the accuracy and efficiency of the structure-from-motion (SfM) process. However, the initial graph often contains redundant and erroneous edges, which arise from incorrect image retrieval and ambiguous structures (e.g., symmetric buildings with identical or opposing facets) and lead to ghosting effects and superimposed reconstruction artifacts. Most contemporary approaches employ bespoke solutions to attain specific reconstruction goals, such as efficiency, precision, or disambiguation. In contrast to these task-specific methods, we propose a probabilistic graph-theoretic framework, termed PGVS, which formulates view-graph selection as a weighted maximum clique optimization problem, achieving sparsification and disambiguation simultaneously. Furthermore, we develop a binary-penalty continuous relaxation technique to derive a solution that is guaranteed to correspond to the optimum of the original problem. In contrast to techniques that verify pose consistency, we introduce a context-aware graph similarity assessment mechanism based on view triplets with a multi-view patch tracking strategy. This approach alleviates the effects of vanishing keypoints and environmental occlusions and reduces the impact of erroneous image correspondences that often undermine the reliability of pose estimation. Moreover, we develop a Bayesian inference framework for edge-level consistency analysis over the context graph, enabling us to estimate the likelihood that each edge reflects a globally coherent match. This probabilistic characterization is then leveraged to construct the adjacency matrix for a weighted maximum clique formulation.
To solve this combinatorial problem, we employ the continuous binary-penalty relaxation, which yields an optimal solution reflecting global consistency with the highest matching affinity and confidence. The resulting selected view-graph constitutes a novel and efficient algorithmic component that can be seamlessly integrated as a preprocessing module into any SfM pipeline, enhancing its adaptability and general applicability. We validate the efficacy of our method on both generic and ambiguous datasets covering a wide spectrum of small, medium, and large scales, each exhibiting distinct statistical characteristics. On generic datasets, our approach significantly reduces reconstruction time by removing redundant edges to sparsify the view-graph while preserving accuracy and mitigating ghosting artifacts. On ambiguous datasets, our method excels at identifying erroneous matches, even under highly challenging conditions, leading to accurate, disambiguated, and unsuperimposed 3D reconstructions. The source code of our approach is publicly available at https://
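The weighted maximum-clique formulation mentioned above can be given some concrete flavor. The sketch below is not the paper's binary-penalty solver; it is the classical Motzkin–Straus continuous relaxation for the unweighted case, solved by replicator dynamics on the probability simplex (the function name `max_clique_support` is a choice made here for illustration):

```python
import numpy as np

def max_clique_support(A, steps=200, tol=1e-12):
    """Replicator dynamics for the Motzkin-Straus max-clique relaxation:
    maximize x^T A x over the probability simplex; for generic starting
    points the mass concentrates on the vertices of a maximum clique."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)          # uniform start on the simplex
    for _ in range(steps):
        Ax = A @ x
        denom = x @ Ax               # current objective value
        if denom < tol:              # no edges left among the support
            break
        x = x * Ax / denom           # multiplicative (replicator) update
    return np.flatnonzero(x > 1e-4)  # support = recovered clique
```

On a graph containing a triangle {0, 1, 2} plus a disjoint edge {3, 4}, the dynamics drive all mass onto the triangle, so the returned support is the maximum clique.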
Title: PGVS: A probabilistic graph-theoretic framework for view-graph selection in structure-from-motion
Journal: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 231, Pages 641-663
Pub Date : 2025-11-17 | DOI: 10.1016/j.isprsjprs.2025.11.014
Nan Jiang , Hai-bo Li , Cong-jiang Li , Yu-xiang Hu , Jia-wen Zhou
Co-registration of UAV photogrammetry data has emerged as a vital technique for aligning multi-temporal aerial datasets due to its operational simplicity and low cost. However, its persistently low geolocation accuracy, resulting from the unavailability of real ground control points (GCPs), remains challenging. To resolve this limitation, we identify and quantify the “Error-Correction Effect (ECE)” in bundle adjustment (BA) and propose a novel method leveraging this phenomenon to estimate horizontal and vertical geographic coordinates of assumed control points (ACPs), significantly enhancing geolocation accuracy in multi-temporal UAV co-registration. Through phenomenological description and formula derivation, we first conduct theoretical analysis of ECE to elucidate its formation mechanism. Field experiments and sensitivity analysis then clarify ECE’s triggering conditions and quantify its characteristics. Subsequently, we develop ECE-based solutions for deriving horizontal and vertical coordinates of ACPs, with parametric sensitivity and computational accuracy validated through field-measured data. Results demonstrate that accuracy primarily correlates with distance from ACPs to the controlled areas. At operational distances of 650–900 m, mean horizontal coordinate errors range from 8–13 mm, while vertical errors range from 10–60 mm. This represents a substantial improvement over conventional co-registration, which exhibits mean errors of 214 mm (horizontal) and 295 mm (vertical) under identical conditions.
Our code is available at https://github.com/meowxu/ECE-fitting.
Title: Bundle adjustment-based co-registration with high geolocation accuracy for UAV photogrammetry
Journal: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 231, Pages 622-640
Pub Date : 2025-11-14 | DOI: 10.1016/j.isprsjprs.2025.11.013
Kaiwen Wang , Yue Pan , Federico Magistri , Lammert Kooistra , Cyrill Stachniss , Wensheng Wang , João Valente
Accurate fruit shape reconstruction under real-world field conditions is essential for high-throughput phenotyping, sensor-based yield estimation, and orchard management. Existing approaches based on 2D imaging or explicit 3D reconstruction often suffer from occlusions, sparse views, and complex scene dynamics as a result of the plant geometries. This paper presents a novel UAV-based monocular 3D panoptic mapping framework for robust and scalable fruit shape completion in orchards. The proposed method integrates (1) Grounded-SAM2 for multi-object tracking and segmentation (MOTS), (2) photogrammetric structure-from-motion for 3D scene reconstruction, and (3) DeepSDF, an implicit neural representation, for completing occluded fruit geometries with a neural network. We furthermore propose a new MOTS evaluation protocol to assess tracking performance without requiring ground truth annotations. Experiments conducted in both controlled laboratory conditions and an operational apple orchard demonstrate the accuracy of our 3D fruit reconstruction at the centimeter level. The Chamfer distance error of the proposed shape completion method using the DeepSDF shape prior reduces this to the millimeter level, and outperforms the traditional method, while Grounded-SAM2 enables robust fruit tracking across challenging viewpoints. The approach is highly scalable and applicable to real-world agricultural scenarios, offering a promising solution to reconstruct complete fruits with visibility higher than 10% for precise 3D fruit phenotyping at a large scale under occluded conditions.
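The Chamfer distance used above to evaluate shape completion has a standard symmetric form; a minimal sketch follows (the paper's exact variant, e.g. squared versus unsquared distances, may differ):

```python
import numpy as np

def chamfer_distance(P, Q):
    """Symmetric Chamfer distance between point sets P (n,3) and Q (m,3):
    mean nearest-neighbour distance from each set to the other, summed."""
    # pairwise Euclidean distances, shape (n, m)
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())
```

For identical point sets the distance is zero; for two single points one unit apart it is 2 (one unit in each direction).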
Title: UAV-based monocular 3D panoptic mapping for fruit shape completion in orchard
Journal: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 231, Pages 608-621
Pub Date : 2025-11-13 | DOI: 10.1016/j.isprsjprs.2025.11.008
Yuanshuo Hao , Timo Pukkala , Xin Liu , Ying Quan , Lihu Dong , Fengri Li
Forest biomass mapping, monitoring and verification are vital to understand carbon cycling and mitigate climate change. Nevertheless, a remote sensing–based framework that upscales biomass estimation from the individual-tree level remains underdeveloped, especially for rigorously quantifying the propagation of associated uncertainties. This study proposed an upscaling framework for aboveground biomass (AGB) mapping and prediction uncertainty estimation using unmanned aerial vehicle laser scanning (UAVLS) data. Tree-level metrics were first generated from delineated point clouds. The per-tree AGB was predicted with a developed allometric equation and then upscaled for coarser-scale prediction. A case study was conducted on larch (Larix olgensis) forests in Northeast China. An AGB allometric equation for remote sensing application purposes was established, adopting the crown width (CW) and tree height (H) as predictors, based on 147 destructive tree samples. Tree- and plot-level (30 m × 30 m) AGB values were estimated with root-mean-squared differences (RMSDs) of 33.84 % and 10.74 %, respectively, relative to field-based AGB estimates, and the prediction accuracy improved as the estimation was aggregated from the tree to the plot scale. Furthermore, this study introduced an analytical framework to characterize the AGB prediction uncertainty considering error propagation throughout the whole upscaling workflow. At the tree level, the total uncertainty in AGB prediction was approximately 38.11 %. The errors associated with the UAVLS-measured CW and H contributed the most to the total uncertainty, at approximately 61.60 %, followed by allometry residual errors, which contributed approximately 37.71 % to the total uncertainty, while model parameters only contributed approximately 0.69 % to the total uncertainty. 
The per-plot error accounted for approximately 11.67 % of the estimated AGB, of which omission, commission, and aggregated tree-level errors accounted for approximately 78.97 %, 7.27 %, and 13.76 %, respectively, of the total variance and generally decreased for the plots with higher AGB. A simulation experiment revealed that the aggregated tree-level errors decreased the most with the spatial resolution over the other errors. This study not only contributes to the upscaling of AGB estimation using UAVLS data but also provides an intuitive framework for quantifying the associated uncertainties.
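The tree-level allometric prediction and error propagation described above can be sketched as follows; the power-law form and every coefficient below are hypothetical placeholders, not the equation fitted from the 147 destructive samples, and the Monte Carlo scheme is a generic stand-in for the study's analytical framework:

```python
import numpy as np

def agb_allometry(cw, h, a=0.05, b=1.8, c=1.2):
    """Hypothetical power-law allometry: AGB = a * CW^b * H^c (kg).
    Coefficients are illustrative only, not the paper's fitted values."""
    return a * cw**b * h**c

def propagate_measurement_error(cw, h, cw_sd, h_sd, n=10000, seed=0):
    """Monte Carlo propagation of UAVLS measurement error in crown width
    (CW) and height (H) through the allometric model; returns the mean
    and standard deviation of the predicted per-tree AGB."""
    rng = np.random.default_rng(seed)
    cw_s = np.clip(rng.normal(cw, cw_sd, n), 0.1, None)
    h_s = np.clip(rng.normal(h, h_sd, n), 0.1, None)
    agb = agb_allometry(cw_s, h_s)
    return float(agb.mean()), float(agb.std())
```

Plot-level totals and their variances would then aggregate these per-tree draws, alongside omission and commission terms as in the abstract.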
Title: Mapping aboveground tree biomass and uncertainty using an upscaling approach: A case study of the larch forests in northeastern China using UAV laser scanning data
Journal: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 231, Pages 595-607
Pub Date : 2025-11-13 | DOI: 10.1016/j.isprsjprs.2025.11.006
Yanlin Wei , Xiaofeng Li , Lingjia Gu , Zhaojun Zheng , Yingjie Shi , YiJing Li , Xingming Zheng , Tao Jiang
Snow cover is a crucial factor in climate change and water resource management, particularly in forested regions, where it affects vegetation carbon stocks by regulating local hydrological and temperature conditions. However, accurate snow depth (SD) estimation in forested regions remains highly challenging because of the dual effects of forest canopy attenuation and radiation. To address this issue, a novel machine learning (ML) framework coupled with a microwave radiative transfer model (RTM), named the RTM-ML SD model, was proposed for SD retrieval in the forested regions of the Northern Hemisphere (NH). The physical mechanisms of forest–snow radiative transfer were incorporated into the ML model by introducing physically constrained brightness temperatures (TBs) that capture the radiative contributions of snow cover and forest canopy within the satellite field of view. Additionally, a spatiotemporal dynamic modeling strategy was implemented to ensure the RTM-ML SD model’s flexibility and stability across different regions and seasons. Independent validations revealed that the retrieved SDs agreed well with in situ ground observations, with high correlation (R = 0.88) and low uncertainties (RMSE = 14.98 cm, MAE = 8.44 cm, and bias = −1.46 cm), which are 8.0 % and 6.1 % lower, respectively, than those obtained without the RTM and the spatiotemporal dynamic modeling strategy. Compared with well-known SD products and inversion algorithms (Chang, AMSR2, ERA5-Land, and Globsnow), the SD inversion accuracy increased by more than 50 %, effectively mitigating SD underestimation in forested regions.
Overall, we conclude that 1) the impact of forest canopies on SD retrieval is complex, and the physical knowledge of forest–snow radiative transfer should be considered in SD estimation, and 2) the RTM-ML SD model exhibits notable temporal and spatial generalizability in the forested regions of the NH, thereby enhancing its capacity to support large-scale environmental monitoring, enable more reliable seasonal water resource forecasting, and inform climate adaptation strategies in snow-dependent regions.
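For reference, the agreement metrics reported above (R, RMSE, MAE, bias) follow standard definitions; a minimal sketch, assuming the common convention that bias is retrieved minus observed:

```python
import numpy as np

def validation_metrics(retrieved, observed):
    """Standard agreement metrics for retrieval validation:
    Pearson R, RMSE, MAE, and bias (retrieved minus observed)."""
    err = retrieved - observed
    return {
        "R": float(np.corrcoef(retrieved, observed)[0, 1]),
        "RMSE": float(np.sqrt(np.mean(err**2))),
        "MAE": float(np.mean(np.abs(err))),
        "bias": float(np.mean(err)),
    }
```

A retrieval shifted by a constant offset from the observations yields R = 1 with bias, MAE, and RMSE all equal to that offset.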
Title: Enhancing snow depth estimation in forested regions of the northern hemisphere: A physically-constrained machine learning approach with spatiotemporal dynamics
Journal: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 231, Pages 576-594
Pub Date : 2025-11-13 | DOI: 10.1016/j.isprsjprs.2025.11.005
Zihuan Guo , Hong Zhang , Ji Ge , Xiao-Ming Li , Peifeng Ma , Haoxuan Duan , Lu Xu , Chao Wang
Compact-polarimetric (CP) synthetic aperture radar (SAR) offers wide swath coverage and relatively rich polarimetric information, making it a promising mode of earth observation. However, compared to quad-polarimetric (QP) data, CP data are still limited in fully characterizing polarimetric scattering mechanisms. Existing methods for reconstructing QP data from CP data face challenges such as the lack of physical properties in the QP covariance matrix and insufficient utilization of polarimetric, terrain, and imaging information. To address these limitations, this paper proposes a multimodal correlation-preserving latent diffusion framework under mathematical-physical constraints for CP to QP SAR data reconstruction (SAR-C2QM). First, we propose a mathematical-physical hard constraint through the Cholesky decomposition method and a two-dimensional phase vector encoding approach, ensuring both the positive semi-definiteness and phase continuity of the reconstructed QP data. Next, SAR image modality and corresponding terrain-imaging information modality are constructed based on multi-source data, providing rich polarimetric and effective terrain-imaging multimodal information for model training. Furthermore, a multimodal correlation-preserving latent diffusion model with a hierarchical bidirectional cross-attention module is designed, enabling the fusion of multimodal and multi-scale features, with mathematical-physical constraints guiding the reconstruction process to achieve high-fidelity QP reconstruction across large-area coverage and complex terrain conditions. Experiments on simulated CP data derived from Gaofen-3 and Radarsat-2 QP data across diverse wave codes demonstrate that the proposed method outperforms existing methods in both accuracy and stability. Specifically, the Wishart distance error between reconstructed and original QP data is reduced by over 12.1%. 
Moreover, the overall accuracy of land cover classification using the reconstructed QP data improves by an average of 15.1% compared to CP data, closely approaching the classification accuracy of the original QP data. The code is available at https://github.com/zihuan21/SAR-C2QM.
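The Cholesky-based positive semi-definiteness guarantee mentioned in the abstract can be illustrated in a few lines: if a model predicts unconstrained parameters of a lower-triangular factor L, then C = L L^H is Hermitian and positive semi-definite by construction. The function name and the exact parameterization below (including the exponential on the diagonal) are choices made here for illustration, not the paper's:

```python
import numpy as np

def covariance_from_cholesky(params, n=3):
    """Build a Hermitian PSD covariance C = L L^H from unconstrained real
    parameters: n entries for the diagonal (made positive via exp) and
    n(n-1)/2 complex entries (real + imaginary parts) below the diagonal."""
    params = np.asarray(params, dtype=float)
    k = n * (n - 1) // 2
    assert params.size == n + 2 * k
    L = np.zeros((n, n), dtype=complex)
    L[np.diag_indices(n)] = np.exp(params[:n])          # positive diagonal
    re, im = params[n:n + k], params[n + k:]
    L[np.tril_indices(n, -1)] = re + 1j * im            # free lower triangle
    return L @ L.conj().T                               # PSD by construction
```

With all parameters zero this yields the 3x3 identity; for any parameter vector the smallest eigenvalue of C is non-negative, which is exactly the hard constraint the reconstruction needs.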
"Compact-pol to quad-pol SAR reconstruction via a joint mathematical-physical-constrained multimodal correlation-preserving latent diffusion framework", Zihuan Guo, Hong Zhang, Ji Ge, Xiao-Ming Li, Peifeng Ma, Haoxuan Duan, Lu Xu, Chao Wang. ISPRS Journal of Photogrammetry and Remote Sensing, vol. 231, pp. 552-575. DOI: 10.1016/j.isprsjprs.2025.11.005. Published 2025-11-13.
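The abstract above states that positive semi-definiteness of the reconstructed QP covariance matrix is enforced through a Cholesky decomposition. As a minimal sketch of that general trick (the function name and parameterization are ours, and the paper's two-dimensional phase-vector encoding is not shown): any matrix built as L L^H from a lower-triangular factor with non-negative diagonal is Hermitian positive semi-definite by construction, so a network can predict unconstrained parameters and still output a physically valid covariance.

```python
import numpy as np

def psd_covariance_from_params(params, n=3):
    """Map unconstrained real parameters to a Hermitian positive
    semi-definite n x n covariance matrix via a Cholesky factor.

    `params` holds n real diagonal entries followed by the real and
    imaginary parts of the n(n-1)/2 strictly-lower-triangular entries.
    (Illustrative parameterization, not the paper's exact encoding.)
    """
    params = np.asarray(params, dtype=float)
    L = np.zeros((n, n), dtype=complex)
    # Softplus keeps the diagonal non-negative and the factor well defined.
    L[np.diag_indices(n)] = np.log1p(np.exp(params[:n]))
    off = params[n:]
    rows, cols = np.tril_indices(n, k=-1)
    L[rows, cols] = off[: len(rows)] + 1j * off[len(rows):]
    return L @ L.conj().T  # L L^H is Hermitian PSD by construction

C = psd_covariance_from_params([0.3, -1.2, 0.7, 0.5, -0.4, 0.1, 0.2, -0.6, 0.9])
print(np.linalg.eigvalsh(C).min() >= -1e-12)  # True: no negative eigenvalues
```

Because the constraint holds for every parameter value, no projection or clipping step is needed after the diffusion model's prediction.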
Pub Date : 2025-11-12 DOI: 10.1016/j.isprsjprs.2025.11.011
Menghao Du, Zhenfeng Shao, Xiongwu Xiao, Jindou Zhang, Duowang Zhu, Jinyang Wang, Timo Balz, Deren Li
Floods are highly destructive natural disasters that threaten both society and the environment. Given the all-weather, all-time imaging capability of synthetic aperture radar (SAR), analyzing flood events using SAR imagery across diverse scenarios is essential for developing high-precision and robust detection models. Existing transformer-based change detection methods achieve high precision, but their high computational cost and large parameter counts call for lightweight designs that preserve detection accuracy. Moreover, existing studies focus on a few specific scenarios, without thorough validation or in-depth analysis of model strengths across diverse flood conditions with imbalanced inundation ratios. To address these challenges, this paper proposes an adaptive window and context-aware attention network (AWCA-Net) for high-precision SAR-based flood change detection under diverse flooding scenarios, yielding a lightweight model without sacrificing detection accuracy. AWCA-Net has three key advantages. Firstly, the neighborhood feature enhancement module with contextual information (NECM) strengthens the discrimination of subtle and heterogeneous flood changes. Secondly, the large kernel grouping attention gate module based on high-low layer feature difference (LGDM) leverages difference-weighted attention to guide the selection of flood-relevant features. Thirdly, the multi-scale convolutional attention module based on adaptive window selection (MSAWM) dynamically adjusts kernel sizes to capture diverse flood change patterns. To better train and evaluate AWCA-Net, we constructed VarFloods, the first large-scale, enriched-diverse benchmark dataset for flood change detection. It spans five continents, covers diverse regions, scenarios, land cover types, causes, years, and inundation ratios, and is provided in both GRD and preprocessed versions. We evaluated AWCA-Net's performance on three datasets. We found that: (1) AWCA-Net achieves the highest precision while maintaining significantly lower computational cost, outperforming other state-of-the-art (SOTA) methods. On the two representative public datasets and the enriched-diverse benchmark dataset (VarFloods-G and VarFloods-P), AWCA-Net improves the IoU by 11.59% to 42.57% over the basic model, 2.16% to 11.48% over an advanced transformer-based model, and 1.31% to 1.68% over the best comparative model, while maintaining a computational cost of only 17.53G, just 8.7% to 60% of existing SOTA models. (2) Difference-guided attention enhances detection in complex background regions, neighborhood-based fusion improves performance in irregular terrains, and adaptive convolution contributes to stable results across diverse flood scenarios. The proposed AWCA-Net also demonstrates strong generalization and stability under diverse flood scenarios with imbalanced inundation ratios. The dataset and code of AWCA-Net will be released publicly.
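The IoU gains quoted above (11.59% to 42.57%, etc.) refer to intersection-over-union between predicted and ground-truth change masks. A minimal sketch of that metric for binary flood-change maps (the function name and the empty-union convention are our choices):

```python
import numpy as np

def change_iou(pred, gt):
    """Intersection-over-Union of two binary flood-change masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:   # both masks empty: define IoU as 1 by convention
        return 1.0
    return np.logical_and(pred, gt).sum() / union

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
print(change_iou(pred, gt))  # 0.5  (2 overlapping pixels / 4 in the union)
```

Because IoU penalizes both missed and spurious change pixels, it is a stricter summary than overall accuracy for scenes with imbalanced inundation ratios, where flooded pixels may be rare.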
"High-precision flood change detection with lightweight SAR transformer network and context-aware attention for enriched-diverse and complex flooding scenarios", Menghao Du, Zhenfeng Shao, Xiongwu Xiao, Jindou Zhang, Duowang Zhu, Jinyang Wang, Timo Balz, Deren Li. ISPRS Journal of Photogrammetry and Remote Sensing, vol. 231, pp. 507-531. DOI: 10.1016/j.isprsjprs.2025.11.011.
Pub Date : 2025-11-12 DOI: 10.1016/j.isprsjprs.2025.10.039
Du Wang , Li-Qin Cao , Yu-Hao Du , Hai-Yang Xiong , Fa-Wang Ye , Yan-Fei Zhong
As thermal infrared remote sensing advances toward higher spectral and spatial resolutions, precise surface parameters are becoming increasingly important for earth observation applications. However, effective retrieval remains fundamentally challenging because spectral quality is degraded by the narrow bandwidths of thermal infrared hyperspectral imagers, atmospheric line absorption interference, and limitations in sensor manufacturing. To address these challenges, this study introduces a Noise-Resilient Atmospheric Compensation with Temperature and Emissivity Separation (NRAC-TES) method, whose noise-resistant capability is mainly achieved through the NRAC module during the atmospheric compensation (AC) stage. The robust retrieval strategy uses data-model correlations constrained by atmospheric parameter features, enabling adaptive adjustment of model parameters to match real-world noisy data and reducing biases caused by spectral noise in at-sensor radiance. The temperature and emissivity separation stage, based on ASTER-TES, incorporates an atmospheric downwelling radiance lookup table, allowing direct retrieval of land surface temperature (LST) and land surface emissivity (LSE) without prior surface or atmospheric information. Ground validation experiments on Hypercam-LW demonstrate that the proposed NRAC-TES achieves an AC accuracy of 1.2 K for ground-leaving radiance, with corresponding errors of 1.2 K and 0.025 for LST and LSE, respectively. Additionally, comparisons with traditional methods such as MODTRAN-TES, ICCAAC-TES, and FSW-TES on airborne datasets from HyTES and TASI highlight the importance and effectiveness of the noise-resilient design incorporated in our approach.
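The atmospheric compensation the abstract describes rests on the standard thermal-infrared radiative-transfer model: the at-sensor radiance is L_sensor = tau * L_ground + L_up, where L_ground = eps * B(lambda, T) + (1 - eps) * L_down combines Planck emission with reflected downwelling radiance. A sketch of that forward model and its inversion follows; the atmospheric values are illustrative, and NRAC's actual noise-resilient estimation of tau and L_up is not reproduced here.

```python
import numpy as np

H = 6.626e-34   # Planck constant, J s
C0 = 2.998e8    # speed of light, m/s
KB = 1.381e-23  # Boltzmann constant, J/K

def planck(wl_um, T):
    """Blackbody spectral radiance B(lambda, T) in W m^-2 sr^-1 um^-1."""
    wl = wl_um * 1e-6  # micrometres -> metres
    B = (2 * H * C0**2 / wl**5) / np.expm1(H * C0 / (wl * KB * T))
    return B * 1e-6    # per metre -> per micrometre

def ground_leaving(L_sensor, tau, L_up):
    """Invert L_sensor = tau * L_ground + L_up for ground-leaving radiance."""
    return (L_sensor - L_up) / tau

# Forward-simulate a 10 um channel with illustrative atmospheric terms,
# then check that atmospheric compensation recovers the ground term.
T, eps = 300.0, 0.97
tau, L_up, L_down = 0.85, 0.6, 0.9
L_g = eps * planck(10.0, T) + (1 - eps) * L_down
L_s = tau * L_g + L_up
print(np.isclose(ground_leaving(L_s, tau, L_up), L_g))  # True
```

Noise in L_s propagates through the division by tau, which is why biased estimates of the atmospheric terms, or spectral noise in the at-sensor radiance, corrupt the subsequent temperature-emissivity separation; the NRAC module targets exactly this stage.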
"Toward noise-resilient retrieval of land surface temperature and emissivity using airborne thermal infrared hyperspectral imagery", Du Wang, Li-Qin Cao, Yu-Hao Du, Hai-Yang Xiong, Fa-Wang Ye, Yan-Fei Zhong. ISPRS Journal of Photogrammetry and Remote Sensing, vol. 231, pp. 532-551. DOI: 10.1016/j.isprsjprs.2025.10.039.
Pub Date : 2025-11-11 DOI: 10.1016/j.isprsjprs.2025.11.010
Ningjing Wang , Xinyu Wang , Yang Pan , Lei Lei , Wanqiang Yao , Yanfei Zhong
Rural roads are vital infrastructure for promoting agricultural modernization and rural revitalization, and their accurate and efficient extraction is of great significance. However, rural road extraction remains highly challenging: these roads are typically extremely narrow, often spanning only a few pixels in high-resolution imagery, which easily leads to attenuation or loss of critical pixels during deep feature encoding. In addition, low contrast, weak textures, and spectral similarity to the surrounding environment further reduce their distinguishability. The combined effect makes extracting narrow and spectrally similar rural roads particularly difficult, compromising both the topological connectivity and structural integrity of road networks. To address these challenges, we propose the Narrow-Road-Aware Network (NANet), a coarse-to-fine two-stage framework. In the positioning stage, NANet strengthens weak road responses at multiple scales while preserving global consistency. In the refinement stage, a foreground-background collaborative mechanism enhances road saliency and suppresses background interference, thereby significantly improving the continuity and completeness of rural road extraction. In addition, we construct the WHU-CR dataset, a large-scale benchmark specifically designed for rural road extraction in China. It covers 14 provinces across China's seven major grain-producing regions and contains 130,412 pairs of 512 × 512 images with annotated masks, providing a solid foundation for developing and evaluating deep learning models for rural road extraction. Experimental results on the WHU-CR and DeepGlobe datasets demonstrate that NANet outperforms state-of-the-art methods, and in large-scale rural road mapping NANet exhibits strong practical applicability and generalization. The dataset and code will be available at: https://rsidea.whu.edu.cn/resource_sharing.htm.
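Benchmarks like WHU-CR's 130,412 pairs of 512 × 512 image/mask patches are typically produced by tiling large scenes into aligned crops. A minimal sketch of that preparation step (our own helper, not the authors' pipeline; a real pipeline might pad edges or use overlapping strides instead of discarding partial tiles):

```python
import numpy as np

def tile_pairs(image, mask, size=512, stride=512):
    """Cut an aligned image/mask pair into size x size training tiles.

    Tiles that do not fit fully inside the scene are discarded here;
    padding or overlapping strides are common alternatives.
    """
    h, w = mask.shape[:2]
    tiles = []
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            tiles.append((image[y:y + size, x:x + size],
                          mask[y:y + size, x:x + size]))
    return tiles

img = np.zeros((1200, 1536, 3), dtype=np.uint8)
msk = np.zeros((1200, 1536), dtype=np.uint8)
print(len(tile_pairs(img, msk)))  # 6 tiles: 2 rows x 3 columns
```

Keeping the image and mask crops aligned in a single call avoids the off-by-one registration errors that are especially damaging for roads only a few pixels wide.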
"Identifying rural roads in remote sensing imagery: From benchmark dataset to coarse-to-fine extraction network—A case study in China", Ningjing Wang, Xinyu Wang, Yang Pan, Lei Lei, Wanqiang Yao, Yanfei Zhong. ISPRS Journal of Photogrammetry and Remote Sensing, vol. 231, pp. 487-506. DOI: 10.1016/j.isprsjprs.2025.11.010.
Pub Date : 2025-11-11 DOI: 10.1016/j.isprsjprs.2025.11.009
Tingzhu Wang, Linlin Wang, Junwei Luo, Kang Wu, Yansheng Li
Scene graph generation (SGG) in remote sensing imagery (RSI) plays a critical role in advancing the understanding of geospatial scenarios. Scene graphs, which capture the relationships between objects within imagery, have significantly enhanced the performance of various RS tasks, such as RS image retrieval and RS image captioning. However, the long-tailed distribution of relationships adversely affects SGG quality, leading to biased relationship predictions. In our previous Sample-Level Bias Prediction (SBP) method, we found that the union region of one object pair (i.e., one sample) contains rich and dedicated contextual information, enabling prediction of a sample-specific bias for correcting the biased relationship prediction. Nevertheless, SBP employs a single model to correct all categories, which suppresses certain relationship categories while enhancing the detection accuracy of others. To address this challenge, we present a Bias-Aware Learning (BAL) method that employs group collaboration to alleviate the burden of a single model correcting all categories. In BAL, we first group the relationship categories and deploy group-specific SBP models to learn and predict the corresponding biases within their relationship groups. We then design a Group-Collaborative Distillation (GCD) strategy to facilitate collaboration among SBP models from different groups, thereby alleviating category suppression and enhancing the overall performance of SGG. Extensive experimental results on the STAR and AUG datasets for RSI demonstrate that our BAL outperforms state-of-the-art methods. Compared to the best model, RPCM, on STAR, RPCM equipped with BAL shows a significant average improvement of 5.08%/5.00%, 3.87%/3.64%, and 0.22%/0.25% on HMR@K for the PredCls, SGCls, and SGDet tasks, respectively. Moreover, we have also validated the effectiveness and generalization of our BAL on the VG dataset for natural imagery. 
The code will be available at https://github.com/Zhuzi24/BAL.
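The grouping step in BAL, assigning each relationship category to a group served by its own SBP model, can be sketched as follows. The grouping criterion here (equal-size bins over the frequency ranking, a common head/body/tail split for long-tailed labels) is our illustrative assumption; the paper's actual criterion may differ, and the example counts are invented.

```python
from collections import Counter

def group_relations(rel_counts, n_groups=3):
    """Split relationship categories into frequency groups (head -> tail).

    `rel_counts` maps category name to training-sample count. Categories
    are ranked by frequency and divided into n_groups near-equal bins, an
    illustrative stand-in for the paper's grouping step.
    """
    ranked = [r for r, _ in Counter(rel_counts).most_common()]
    k, m = divmod(len(ranked), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        end = start + k + (1 if i < m else 0)
        groups.append(ranked[start:end])
        start = end
    return groups

# Invented counts for a long-tailed relationship distribution.
counts = {"over": 900, "near": 740, "parallel to": 120,
          "connect": 60, "intersect": 22, "cover": 5}
print(group_relations(counts))
# [['over', 'near'], ['parallel to', 'connect'], ['intersect', 'cover']]
```

Each group then gets its own bias-prediction model, so correcting abundant head relations no longer has to trade off against rare tail relations within a single set of weights.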
"Bias-aware learning for unbiased scene graph generation in remote sensing imagery", Tingzhu Wang, Linlin Wang, Junwei Luo, Kang Wu, Yansheng Li. ISPRS Journal of Photogrammetry and Remote Sensing, vol. 231, pp. 473-486. DOI: 10.1016/j.isprsjprs.2025.11.009.