Explainable spatiotemporal deep learning for subseasonal super-resolution forecasting of Arctic sea ice concentration during the melting season
Pub Date: 2025-12-01. DOI: 10.1016/j.isprsjprs.2025.11.027
Jianxin He, Yuxin Zhao, Shuo Yang, Woping Wu, Jian Wang, Xiong Deng
Accurate, high-resolution forecasting of Arctic sea-ice concentration (SIC) during the melting season is crucial for climate monitoring and polar navigation, yet remains hindered by the system’s complex, multi-scale, and cross-sphere dynamics. We present MSS-STFormer, an explainable multi-scale spatiotemporal Transformer designed for subseasonal SIC super-resolution forecasting. The model integrates 14 environmental factors spanning the ice, ocean, and atmosphere, and incorporates four specialized modules to enhance spatiotemporal representation and physical consistency. Trained with OSTIA satellite observations and ERA5 reanalysis data, MSS-STFormer achieves high forecasting skill over a 60-day horizon, yielding an RMSE of 0.049, a correlation of 0.9951, an SSIM of 0.9603, and a BACC of 0.9656. Post-hoc explainability methods, Gradient SHAP and LIME, reveal that the model captures a temporally evolving prediction mechanism: early forecasts are dominated by persistence of initial conditions, mid-term phases are governed by atmospheric dynamics such as wind and pressure, and later stages transition to a coupled influence of radiative and dynamic processes. This progression aligns closely with established thermodynamic and dynamic theories of sea-ice evolution, underscoring the model’s ability to identify physically meaningful drivers. The framework demonstrates strong potential for advancing explainable GeoAI in Earth observation, combining predictive accuracy with physical explainability for operational Arctic SIC monitoring and climate applications.
{"title":"Explainable spatiotemporal deep learning for subseasonal super-resolution forecasting of Arctic sea ice concentration during the melting season","authors":"Jianxin He , Yuxin Zhao , Shuo Yang , Woping Wu , Jian Wang , Xiong Deng","doi":"10.1016/j.isprsjprs.2025.11.027","DOIUrl":"10.1016/j.isprsjprs.2025.11.027","url":null,"abstract":"<div><div>Accurate, high-resolution forecasting of Arctic sea-ice concentration (SIC) during the melting season is crucial for climate monitoring and polar navigation, yet remains hindered by the system’s complex, multi-scale, and cross-sphere dynamics. We present MSS-STFormer, an explainable multi-scale spatiotemporal Transformer designed for subseasonal SIC super-resolution forecasting. The model integrates 14 environmental factors spanning the ice, ocean, and atmosphere, and incorporates four specialized modules to enhance spatiotemporal representation and physical consistency. Trained with OSTIA satellite observations and ERA5 reanalysis data, MSS-STFormer achieves high forecasting skill over a 60-day horizon, yielding an RMSE of 0.049, a correlation of 0.9951, an SSIM of 0.9603, and a BACC of 0.9656. Post-hoc explainability methods, Gradient SHAP and LIME—reveal that the model captures a temporally evolving prediction mechanism: early forecasts are dominated by persistence of initial conditions, mid-term phases are governed by atmospheric dynamics such as wind and pressure, and later stages transition to a coupled influence of radiative and dynamic processes. This progression aligns closely with established thermodynamic and dynamic theories of sea-ice evolution, underscoring the model’s ability to identify physically meaningful drivers. The framework demonstrates strong potential for advancing explainable GeoAI in Earth observation, combining predictive accuracy with physical explainability for operational Arctic SIC monitoring and climate applications.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"232 ","pages":"Pages 1-17"},"PeriodicalIF":12.2,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145625097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
National mapping of wetland vegetation leaf area index in China using hybrid model with Sentinel-2 and Landsat-8 data
Pub Date: 2025-12-01. DOI: 10.1016/j.isprsjprs.2025.11.031
Jianing Zhen, Dehua Mao, Yeqiao Wang, Junjie Wang, Chenwei Nie, Shiqi Huo, Hengxing Xiang, Yongxing Ren, Ling Luo, Zongming Wang
Leaf area index (LAI) of wetland vegetation provides vital information on its growth condition, structure, and functioning. Accurately mapping LAI at a broad scale is essential for the conservation and rehabilitation of wetland ecosystems. However, owing to the spatial complexity and periodic inundation of wetland vegetation, retrieving wetland LAI remains a challenging task with significant uncertainty. Here, using 865 in-situ measurements across different wetland biomes in China during 2013–2023, we proposed a hybrid strategy that combines an active learning (AL) technique, the physically based PROSAIL-5B model, and the Random Forest machine learning algorithm to map wetland LAI across China from Sentinel-2 and Landsat-8 imagery. Validation showed that the hybrid approach outperformed purely physically based and empirically based methods, achieving higher accuracy (R2 increased by 0.15–0.40, RMSE decreased by 0.02–0.27, and RRMSE decreased by 3.37–12.78 %). Additionally, three newly developed indices (TBVI5, TBVI3, and TBVI1) exhibited superior potential for LAI inversion across different types of wetland vegetation. Our Sentinel-2-based maps showed finer spatial detail, better consistency, and closer agreement with in-situ observations than the Landsat-8 maps and existing MODIS-based products. In this study, we developed the first national-scale map of wetland vegetation LAI in China. The findings offer insights into the accurate retrieval of wetland vegetation LAI and provide valuable support for the scientific restoration of wetlands and the assessment of their responses to climate change.
{"title":"National mapping of wetland vegetation leaf area index in China using hybrid model with Sentinel-2 and Landsat-8 data","authors":"Jianing Zhen , Dehua Mao , Yeqiao Wang , Junjie Wang , Chenwei Nie , Shiqi Huo , Hengxing Xiang , Yongxing Ren , Ling Luo , Zongming Wang","doi":"10.1016/j.isprsjprs.2025.11.031","DOIUrl":"10.1016/j.isprsjprs.2025.11.031","url":null,"abstract":"<div><div>Leaf area index (LAI) of wetland vegetation provides vital information for its growth condition, structure and functioning. Accurately mapping LAI at a broad scale is essential for conservation and rehabilitation of wetland ecosystem. However, owing to the spatial complexity and periodic inundation characteristics of wetland vegetation, retrieving LAI of wetlands remains a challenging task with significant uncertainty. Here, with 865 in-situ measurements across different wetland biomes in China during 2013–2023, we proposed a hybrid strategy that incorporated active learning (AL) technique, physically-based PROSAIL-5B model, and Random Forest machine learning algorithm to map wetland biomes LAI across China from Sentinel-2 and Landsat-8 imagery. The validation results showed that the hybrid approach outperformed physically-based and empirically-based methods and achieved higher accuracy (R<sup>2</sup> increased by 0.15–0.40, RMSE decreased by 0.02–0.27, and RRMSE reduced by 3.37–12.78 %). Additionally, three indices that we newly-developed (TBVI5, TBVI3 and TBVI1) exhibited superior potential for LAI inversion across different types of wetland vegetation. Our mapping results exhibited spatial details and consistency, and matched with in-situ observations from Sentinel-2 compared to Landsat-8 and the other MODIS-based products. In this study, we developed the first national-scale mapping of wetland vegetation LAI in China. The findings offer insights into accurate retrieval of LAI in wetland vegetation, providing valuable support for the scientific restoration of wetlands and assessing their responses to climate change.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"232 ","pages":"Pages 18-33"},"PeriodicalIF":12.2,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145658151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-supervised despeckling based solely on SAR intensity images: A general strategy
Pub Date: 2025-12-01. DOI: 10.1016/j.isprsjprs.2025.11.025
Liang Chen, Yifei Yin, Hao Shi, Jingfei He, Wei Li
Speckle noise arises from the SAR imaging mechanism and degrades the quality of SAR images, making interpretation difficult. Hence, despeckling is an indispensable step in SAR pre-processing. Supervised learning (SL) has proven to be an effective approach for SAR image despeckling. However, SL methods require both original SAR images and their speckle-free counterparts during training, whereas speckle-free SAR images do not exist in the real world. Even though there are several substitutes for speckle-free images, the domain gap leads to poor performance and adaptability. Self-supervision provides an approach to training without clean references. However, most self-supervised methods introduce additional requirements on speckle modeling or specific data, posing challenges in real-world applications. To address these challenges, we propose a general Self-supervised Despeckling Strategy for SAR images (SDS-SAR) that relies solely on speckled intensity data for training. Firstly, the theoretical feasibility of SAR image despeckling without speckle-free images is established, and a self-supervised despeckling criterion suitable for diverse SAR images is proposed. Subsequently, a Random-Aware sub-SAMpler with Projection correLation Estimation (RA-SAMPLE) is put forth, so that mutually independent training pairs can be derived from actual SAR intensity images. Furthermore, a multi-feature loss function is introduced, consisting of a despeckling term, a regularization term, and a perception term, which balances speckle suppression and texture preservation. Experiments reveal that the proposed method performs comparably to supervised approaches on synthetic data and outperforms them on actual data. Both visual and quantitative evaluations confirm its superiority over state-of-the-art despeckling techniques. Moreover, the results demonstrate that SDS-SAR provides a novel solution for noise suppression in other multiplicative coherent systems. The trained model and dataset will be available at https://github.com/YYF121/SDS-SAR.
{"title":"Self-supervised despeckling based solely on SAR intensity images: A general strategy","authors":"Liang Chen , Yifei Yin , Hao Shi , Jingfei He , Wei Li","doi":"10.1016/j.isprsjprs.2025.11.025","DOIUrl":"10.1016/j.isprsjprs.2025.11.025","url":null,"abstract":"<div><div>Speckle noise is generated along with the SAR imaging mechanism and degrades the quality of SAR images, leading to difficult interpretation. Hence, despeckling is an indispensable step in SAR pre-processing. Fortunately, supervised learning (SL) has proven to be a progressive method for SAR image despeckling. SL methods necessitate the availability of both original SAR images and their speckle-free counterparts during training, whilst speckle-free SAR images do not exist in the real world. Even though there are several substitutes for speckle-free images, the domain gap leads to poor performance and adaptability. Self-supervision provides an approach to training without clean reference. However, most self-supervised methods introduce additional requirements on speckle modeling or specific data, posing challenges in real-world applications. To address these challenges, we propose a general Self-supervised Despeckling Strategy for SAR images (SDS-SAR) that relies solely on speckled intensity data for training. Firstly, the theoretical feasibility of SAR image despeckling without speckle-free images is established. A self-supervised despeckling criteria suitable for diverse SAR images is proposed. Subsequently, a Random-Aware sub-SAMpler with Projection correLation Estimation (RA-SAMPLE) is put forth. Mutually independent training pairs can be derived from actual SAR intensity images. Furthermore, a multi-feature loss function is introduced, consisting of a despeckling term, a regularization term, and a perception term. The performance of speckle suppression and texture preservation is well-balanced. Experiments reveal that the proposed method performs comparably to supervised approaches on synthetic data and outperforms them on actual data. Both visual and quantitative evaluations confirm its superiority over state-of-the-art despeckling techniques. Moreover, the results demonstrates that SDS-SAR provides a novel solution for noise suppression in other multiplicative coherent systems. The trained model and dataset will be available at <span><span>https://github.com/YYF121/SDS-SAR</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"231 ","pages":"Pages 854-873"},"PeriodicalIF":12.2,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145657555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reducing semantic ambiguity in open-vocabulary remote sensing image segmentation via knowledge graph-enhanced class representations
Pub Date: 2025-11-30. DOI: 10.1016/j.isprsjprs.2025.11.029
Wubiao Huang, Huchen Li, Shuai Zhang, Fei Deng
The open-vocabulary semantic segmentation (OVSS) task presents a new challenge for remote sensing image understanding by requiring the recognition of previously unseen or novel classes during inference. However, existing OVSS methods often suffer from severe semantic ambiguity in land cover classification due to inconsistent naming conventions, hierarchical dependencies, and insufficient semantic proximity in the embedding space. To address these issues, we propose KG-OVRSeg, a novel framework that mitigates semantic ambiguity by aggregating structured knowledge from a knowledge graph. This approach significantly enhances intra-class compactness and inter-class separability in the embedding space, thereby fundamentally enhancing class representations. We design a knowledge graph-enhanced class encoder (KGCE) that generates enriched class embeddings by querying hypernym–hyponym and synonym relationships within a localized knowledge graph. These enhanced embeddings are further utilized by a class attention gradual decoder (CAGD), which leverages a class-aware attention mechanism and guidance refinement to guide feature decoding. Extensive experiments on seven publicly available datasets demonstrated that KG-OVRSeg achieves state-of-the-art performance, with a mean mF1 of 51.65% and a mean mIoU of 39.18%, surpassing previous methods by 8.06% mF1 and 6.52% mIoU. Comprehensive ablation and visual analyses confirmed that KGCE significantly improves intra-class semantic compactness and inter-class separability in the embedding space, playing a crucial role in mitigating semantic inconsistency. Our work offers a robust and scalable solution for ambiguity-aware open-vocabulary tasks in remote sensing. The code is publicly available at https://github.com/HuangWBill/KG-OVRSeg.
{"title":"Reducing semantic ambiguity in open-vocabulary remote sensing image segmentation via knowledge graph-enhanced class representations","authors":"Wubiao Huang , Huchen Li , Shuai Zhang , Fei Deng","doi":"10.1016/j.isprsjprs.2025.11.029","DOIUrl":"10.1016/j.isprsjprs.2025.11.029","url":null,"abstract":"<div><div>Open-vocabulary semantic segmentation (OVSS) task presents a new challenge for remote sensing image understanding by requiring the recognition of previously unseen or novel classes during inference. However, existing OVSS methods often suffer from severe semantic ambiguity in land cover classification due to inconsistent naming conventions, hierarchical dependency, and insufficient semantic proximity in the embedding space. To address these issues, we propose KG-OVRSeg, a novel framework that mitigates semantic ambiguity by aggregating structured knowledge from a knowledge graph. This approach significantly enhances intra-class compactness and inter-class separability in the embedding space, thereby fundamentally enhancing class representations. We design a knowledge graph-enhanced class encoder (KGCE) that generates enriched class embeddings by querying hypernym–hyponym and synonym relationships within a localized knowledge graph. These enhanced embeddings are further utilized by a class attention gradual decoder (CAGD), which leverages a class-aware attention mechanism and guidance refinement to guide feature decoding. Extensive experiments on seven publicly available datasets demonstrated that KG-OVRSeg achieves state-of-the-art performance, with a mean mF1 of 51.65% and a mean mIoU of 39.18%, surpassing previous methods by 8.06% mF1 and 6.52% mIoU. Comprehensive ablation and visual analyses confirmed that KGCE significantly improves intra-class semantic compactness and inter-class separability in the embedding space, playing a crucial role in mitigating semantic inconsistency. Our work offers a robust and scalable solution for ambiguity-aware open-vocabulary tasks in remote sensing. The code is publicly available at <span><span>https://github.com/HuangWBill/KG-OVRSeg</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"231 ","pages":"Pages 837-853"},"PeriodicalIF":12.2,"publicationDate":"2025-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145619489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M3FNet: Multi-modal multi-temporal multi-scale data fusion network for tree species composition mapping
Pub Date: 2025-11-29. DOI: 10.1016/j.isprsjprs.2025.11.026
Yuwei Cao, Nicholas C. Coops, Brent A. Murray, Ian Sinclair, Robere-McGugan Geordie
Accurate estimation and mapping of tree species composition (TSC) is crucial for sustainable forest management. Recent advances in Light Detection and Ranging (lidar) technology and the availability of moderate spatial resolution, surface reflectance time series passive optical imagery offer scalable and efficient approaches for automated TSC estimation. In this research, we develop a novel deep learning framework, M3F-Net (Multi-modal, Multi-temporal, and Multi-scale Fusion Network), that integrates multi-temporal Sentinel-2 (S2) imagery and single photon lidar (SPL) data to estimate TSC for nine common species across the 630,000-hectare Romeo Malette Forest in Ontario, Canada. A dual-level alignment strategy combines (i) superpixel-based spatial aggregation to reconcile mismatched resolutions between high-resolution SPL point clouds (>25 pts/m2) and coarser S2 imagery (20 m), and (ii) a grid-based feature alignment that transforms unordered 3D point cloud features into structured 2D representations, enabling seamless integration of spectral and structural information. Within this aligned space, a multi-level Mamba-Fusion module jointly models multi-scale spatial patterns and seasonal dynamics through selective state-space modelling, efficiently capturing long-range dependencies while filtering redundant information. The framework achieves an R2 score of 0.676, outperforming existing point cloud-based methods by 6% in TSC estimation. For leading species classification, our results improve the weighted F1-score by 6%, using either the TSC-based approach or the standalone leading species classification approach. Adding seasonal S2 imagery yielded a further 10% R2 gain over the SPL-only configuration. These results underscore the potential of fusing multi-modal and multi-temporal data with deep learning for scalable, highly accurate TSC estimation, offering a robust tool for large-scale management applications.
{"title":"M3FNet: Multi-modal multi-temporal multi-scale data fusion network for tree species composition mapping","authors":"Yuwei Cao , Nicholas C. Coops , Brent A. Murray , Ian Sinclair , Robere-McGugan Geordie","doi":"10.1016/j.isprsjprs.2025.11.026","DOIUrl":"10.1016/j.isprsjprs.2025.11.026","url":null,"abstract":"<div><div>Accurate estimation and mapping of <strong>t</strong>ree <strong>s</strong>pecies <strong>c</strong>omposition (TSC) is crucial for sustainable forest management. Recent advances in Light Detection and Ranging (lidar) technology and the availability of moderate spatial resolution, surface reflectance time series passive optical imagery offer scalable and efficient approaches for automated TSC estimation. In this research we develop a novel deep learning framework, M3F-Net (Multi-modal, Multi-temporal, and Multi-scale Fusion Network), that integrates multi-temporal Sentinel-2 (S2) imagery and single photon lidar (SPL) data to estimate TSC for nine common species across the 630,000-hectare Romeo Malette Forest in Ontario, Canada. A dual-level alignment strategy combines (i) superpixel-based spatial aggregation to reconcile mismatched resolutions between high-resolution SPL point clouds (>25 pts/m<sup>2</sup>) and coarser S2 imagery (20 m), and (ii) a grid-based feature alignment that transforms unordered 3D point cloud features into structured 2D representations, enabling seamless integration of spectral and structural information. Within this aligned space, a multi-level Mamba-Fusion module jointly models multi-scale spatial patterns and seasonal dynamics through selective state-space modelling, efficiently capturing long-range dependencies while filtering redundant information. The framework achieves an R<sup>2</sup> score of 0.676, outperforming existing point cloud-based methods by 6% in TSC estimation. For leading species classification, our results are 6% better in terms of weighted F1, using either the TSC-based method or the standalone leading species classification method. Addition of seasonal S2 imagery added a 10% R<sup>2</sup> gain compared to the SPL-only mode. These results underscore the potential of fusing multi-modal and multi-temporal data with deep learning for scalable, high-accurate TSC estimation, offering a robust tool for large-scale management applications.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"231 ","pages":"Pages 797-814"},"PeriodicalIF":12.2,"publicationDate":"2025-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145613693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A survey of publicly available multi-temporal point cloud datasets
Pub Date: 2025-11-29. DOI: 10.1016/j.isprsjprs.2025.11.003
Ole Wegen, Willy Scheibel, Rico Richter, Jürgen Döllner
Multi-temporal point clouds, which capture the same acquisition area at different points in time, enable change analysis and forecasting across various disciplines. Publicly available datasets play an important role in the development and evaluation of such approaches by enhancing comparability and reducing the effort required for data acquisition and preparation. However, identifying suitable datasets, assessing their characteristics, and comparing them with similar ones remain challenging and tedious due to the lack of a centralized distribution and documentation platform. In this paper, we provide a comprehensive overview of publicly available multi-temporal point cloud datasets. We evaluate each dataset across 30 different characteristics, grouped into six categories, and highlight current gaps and future challenges. Our analysis shows that, although many datasets are accompanied by extensive documentation, unclear usage terms and unreliable data hosting can limit their accessibility and adoption. In addition to clear correlations between application domains, acquisition methods, and captured scene types, there is also some overlap in point cloud requirements across domains. However, inconsistencies in file formats, data representations, and labeling practices hinder cross-domain and cross-application reuse. In the context of machine learning, we observe a positive trend towards more labeled datasets. Nevertheless, gaps remain due to limited coverage of natural environments and poor geographic diversity. Although there are already many positive examples of accessible datasets, future dataset publications would benefit from standardized review processes and a stronger focus on accessibility and usability across application areas.
{"title":"A survey of publicly available multi-temporal point cloud datasets","authors":"Ole Wegen , Willy Scheibel , Rico Richter , Jürgen Döllner","doi":"10.1016/j.isprsjprs.2025.11.003","DOIUrl":"10.1016/j.isprsjprs.2025.11.003","url":null,"abstract":"<div><div>Multi-temporal point clouds, which capture the same acquisition area at different points in time, enable change analysis and forecasting across various disciplines. Publicly available datasets play an important role in the development and evaluation of such approaches by enhancing comparability and reducing the effort required for data acquisition and preparation. However, identifying suitable datasets, assessing their characteristics, and comparing them with similar ones remains challenging and tedious due to the lack of a centralized distribution and documentation platform. In this paper, we provide a comprehensive overview of publicly available multi-temporal point cloud datasets. We evaluate each dataset across 30 different characteristics, grouped into six categories, and highlight current gaps and future challenges. Our analysis shows that, although many datasets are accompanied by extensive documentation, unclear usage terms and unreliable data hosting can limit their accessibility and adoption. In addition to clear correlations between application domains, acquisition methods, and captured scene types, there is also some overlap in point cloud requirements across domains. However, inconsistencies in file formats, data representations, and labeling practices hinder cross-domain and cross-application reuse. In the context of machine learning, we observe a positive trend towards more labeled datasets. Nevertheless, gaps remain due to limited coverage of natural environments and poor geographic diversity. Although there are already many positive examples of accessible datasets, future dataset publications would benefit from standardized review processes and a stronger focus on accessibility and usability across application areas.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"231 ","pages":"Pages 815-836"},"PeriodicalIF":12.2,"publicationDate":"2025-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145613697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Beyond static imaging: A dynamic decision paradigm for robust array-SAR in diverse sensing scenarios
Pub Date: 2025-11-27. DOI: 10.1016/j.isprsjprs.2025.11.023
Xiangdong Ma, Xu Zhan, Xiaoling Zhang, Yaping Wang, Jun Shi, Shunjun Wei, Tianjiao Zeng
Array synthetic aperture radar (array-SAR) is a popular radar imaging technique for 3D scene sensing, especially for urban areas. Recently, deep learning imaging methods have achieved significant advancements, showing promise for large-scale spatial sensing. However, current methods struggle to generalize because their imaging pipelines are static: key parameters are fixed after training, so performance degrades across varying noise levels, measurement models, and scene distributions. This critical gap remains insufficiently addressed. We address it by recasting array-SAR imaging as a dynamic Markov decision process and introduce a state–sequence–decision framework: a sequence of state transitions in which each state triggers learnable actions, determined by a decision module, that adapt the step size, regularization threshold, and stopping criterion based on the evolving state. We have conducted extensive experiments across a wide range of noise conditions (0–10 dB), measurement models (from ground-based to airborne systems, with 10%–50% sampling ratios), and scene distributions in both near-field and far-field sensing scenarios. Across all these settings, the proposed method consistently outperforms representative baselines, achieving average gains of 5.1 dB in PSNR and 0.35 in SSIM, demonstrating strong robustness across diverse sensing environments.
{"title":"Beyond static imaging: A dynamic decision paradigm for robust array-SAR in diverse sensing scenarios","authors":"Xiangdong Ma , Xu Zhan , Xiaoling Zhang , Yaping Wang , Jun Shi , Shunjun Wei , Tianjiao Zeng","doi":"10.1016/j.isprsjprs.2025.11.023","DOIUrl":"10.1016/j.isprsjprs.2025.11.023","url":null,"abstract":"<div><div>Array synthetic aperture radar (array-SAR) is a popular radar imaging technique for 3D scene sensing, especially for urban area. Recently, deep learning imaging methods have achieved significant advancements, showing promise for large-scale spatial sensing. However current methods struggle with generalization because their imaging pipelines are static—key parameters are fixed after training—so performance degrades across varying noise levels, measurement models, and scene distributions—a critical gap that remains insufficiently addressed. We address this by recasting array-SAR imaging as a dynamic Markov decision process. And we introduce a state–sequence–decision framework: a sequence of state transitions, where each state triggers learnable actions determined by decision that adapt step size, regularization threshold, and stopping based on the evolving state. We have conducted extensive experiments across a wide range of noise conditions (0–10 dB), measurement models (from ground-based to airborne systems, with 10%–50% sampling ratios), and scene distributions in both near-field and far-field sensing scenarios. Across all these settings, the proposed method consistently outperforms representative baselines, achieving average gains of 5.1 dB in PSNR and 0.35 in SSIM, demonstrating strong robustness across diverse sensing environments.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"231 ","pages":"Pages 778-796"},"PeriodicalIF":12.2,"publicationDate":"2025-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145611833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TasselNetV4: A vision foundation model for cross-scene, cross-scale, and cross-species plant counting
Pub Date: 2025-11-27. DOI: 10.1016/j.isprsjprs.2025.11.017
Xiaonan Hu, Xuebing Li, Jinyu Xu, Abdulkadir Duran Adan, Letian Zhou, Xuhui Zhu, Yanan Li, Wei Guo, Shouyang Liu, Wenzhong Liu, Hao Lu
Accurate plant counting provides valuable information for agriculture such as crop yield prediction, plant density assessment, and phenotype quantification. Vision-based approaches are currently the mainstream solution. Prior art typically uses a detection or a regression model to count a specific plant. However, plants have biodiversity, and new cultivars are increasingly bred each year. It is almost impossible to exhaust and build all species-dependent counting models. Inspired by class-agnostic counting (CAC) in computer vision, we argue that it is time to rethink the problem formulation of plant counting, from what plants to count to how to count plants. In contrast to most daily objects with spatial and temporal invariance, plants are dynamic, changing with time and space. Their non-rigid structure often leads to worse performance than counting rigid instances like heads and cars, so current CAC and open-world detection models are suboptimal for counting plants. In this work, we inherit the vein of the TasselNet plant counting model and introduce a new extension, TasselNetV4, shifting from species-specific counting to cross-species counting. TasselNetV4 marries the local counting idea of TasselNet with the extract-and-match paradigm in CAC. It builds upon a plain vision transformer and incorporates novel multi-branch box-aware local counters used to enhance cross-scale robustness. In particular, two challenging datasets, PAC-105 and PAC-Somalia, are harvested. PAC-105 features 105 plant- and organ-level categories from 64 plant species, spanning various scenes. PAC-Somalia, designed for out-of-distribution validation, features 32 unique plant species in Somalia. Extensive experiments against state-of-the-art CAC models show that TasselNetV4 achieves not only superior counting performance but also high efficiency, with a mean absolute error of 16.04, an R2 of 0.92, and up to 121 FPS inference speed on images of 384 × 384 resolution. Our results indicate that TasselNetV4 emerges as a vision foundation model for cross-scene, cross-scale, and cross-species plant counting. To facilitate future plant counting research, we plan to release all the data, annotations, code, and pretrained models at https://github.com/tiny-smart/tasselnetv4.
{"title":"TasselNetV4: A vision foundation model for cross-scene, cross-scale, and cross-species plant counting","authors":"Xiaonan Hu , Xuebing Li , Jinyu Xu , Abdulkadir Duran Adan , Letian Zhou , Xuhui Zhu , Yanan Li , Wei Guo , Shouyang Liu , Wenzhong Liu , Hao Lu","doi":"10.1016/j.isprsjprs.2025.11.017","DOIUrl":"10.1016/j.isprsjprs.2025.11.017","url":null,"abstract":"<div><div>Accurate plant counting provides valuable information for agriculture such as crop yield prediction, plant density assessment, and phenotype quantification. Vision-based approaches are currently the mainstream solution. Prior art typically uses a detection or a regression model to count a specific plant. However, plants have biodiversity, and new cultivars are increasingly bred each year. It is almost impossible to exhaust and build all species-dependent counting models. Inspired by class-agnostic counting (CAC) in computer vision, we argue that it is time to rethink the problem formulation of plant counting, from what plants to count to how to count plants. In contrast to most daily objects with spatial and temporal invariance, plants are dynamic, changing with time and space. Their non-rigid structure often leads to worse performance than counting rigid instances like heads and cars such that current CAC and open-world detection models are suboptimal to count plants. In this work, we inherit the vein of the TasselNet plant counting model and introduce a new extension, TasselNetV4, shifting from species-specific counting to cross-species counting. TasselNetV4 marries the local counting idea of TasselNet with the extract-and-match paradigm in CAC. It builds upon a plain vision transformer and incorporates novel multi-branch box-aware local counters used to enhance cross-scale robustness. In particular, two challenging datasets, PAC-105 and PAC-Somalia, are harvested. PAC-105 features 105 plant- and organ-level categories from 64 plant species, spanning various scenes. PAC-Somalia, specific to out-of-distribution validation, features 32 unique plant species in Somalia. Extensive experiments against state-of-the-art CAC models show that TasselNetV4 achieves not only superior counting performance but also high efficiency, with a mean absolute error of 16.04, an <span><math><msup><mrow><mi>R</mi></mrow><mrow><mn>2</mn></mrow></msup></math></span> of 0.92, and up to 121 FPS inference speed on images of 384 × 384 resolution. Our results indicate that TasselNetV4 emerges to be a vision foundation model for cross-scene, cross-scale, and cross-species plant counting. To facilitate future plant counting research, we plan to release all the data, annotations, code, and pretrained models at <span><span>https://github.com/tiny-smart/tasselnetv4</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"231 ","pages":"Pages 745-760"},"PeriodicalIF":12.2,"publicationDate":"2025-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145609525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spherical target eccentricity correction in photogrammetric applications
Pub Date: 2025-11-26. DOI: 10.1016/j.isprsjprs.2025.11.022
F. Liebold, H.-G. Maas
The perspective projection of a sphere appears as an ellipse in the image, where the ellipse center differs from the projection of the sphere’s center. This eccentricity leads to systematic errors in photogrammetric measurements. For a sphere of 40 mm diameter on a plate at 33 cm distance from the camera’s projection center, 15 cm distance to the nadir point, and a principal distance of 12 mm, the eccentricity can reach more than 20 μm in the image. The publication at hand deals with eccentricity correction terms that can be applied either to the measured image coordinates or through a model adaptation. An overview of existing correction terms in image space is provided, and a new extension of the pinhole camera model for spheres is proposed, which can also be used simultaneously for sphere parameter determination. Furthermore, estimation procedures for the initial values of the sphere radius as well as the principal distance and the principal point from the ellipse measurements in a single image are presented. In experiments, the proposed methods reduced the reprojection error by a factor of three and achieved a relative scale accuracy of 0.2% to 0.3% when using known radii.
{"title":"Spherical target eccentricity correction in photogrammetric applications","authors":"F. Liebold, H.-G. Maas","doi":"10.1016/j.isprsjprs.2025.11.022","DOIUrl":"10.1016/j.isprsjprs.2025.11.022","url":null,"abstract":"<div><div>The perspective projection of a sphere appears as an ellipse in the image where the ellipse center differs from the projection of the sphere’s center. This eccentricity leads to systematic errors in photogrammetric measurements. For a sphere of 40 mm diameter on a plate with 33 cm distance from the camera’s projection center, 15 cm distance to the nadir point and a principal distance of 12 mm, the eccentricity can reach more than 20<!--> <span><math><mi>μ</mi></math></span>m in the image. The publication at hand deals with eccentricity correction terms that can be applied either to the measured image coordinates or through a model adaption. An overview of existing correction terms in image space is provided and a new extension of the pinhole camera model for spheres is proposed which also can be used simultaneously for the sphere parameter determination. Furthermore, estimation procedures for the initial values of the sphere radius as well as the principal distance and the principal point from the ellipse measurements in a single image are presented. In experiments, the proposed methods reduced the reprojection error by a factor of three and achieved a relative scale accuracy of 0.2% to 0.3% when using known radii.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"231 ","pages":"Pages 761-777"},"PeriodicalIF":12.2,"publicationDate":"2025-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145598738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ITS-Net: A platform and sensor agnostic 3D deep learning model for individual tree segmentation using aerial LiDAR data
Pub Date: 2025-11-26. DOI: 10.1016/j.isprsjprs.2025.11.019
Bowen Li, Yong Pang, Daniel Kükenbrink, Luo Wang, Dan Kong, Mauro Marty
Recent advances in aerial Light Detection and Ranging (LiDAR) technologies have revolutionized the capability to characterize individual tree structure, enabling detailed ecological analyses at the tree level. A critical prerequisite for such analysis is accurate individual tree segmentation. However, this task remains challenging due to the complexity of forest environments and the varying quality of point clouds collected by diverse aerial sensors and platforms. Existing methods are mostly designed for a single aerial platform or sensor and struggle with complex forest environments. To address these limitations, we propose ITS-Net, an aerial platform- and sensor-agnostic deep learning model for individual tree segmentation, which integrates three modules designed to enhance its learning capability in complex forest environments. To facilitate and evaluate its platform- and sensor-agnostic capabilities, we constructed AerialTrees, a comprehensive individual tree segmentation dataset that included aerial LiDAR data collected with point densities ranging from 50 to 10,000 pts/m2 using different sensors from ALS and ULS platforms over four climate zones. This dataset also included 2,903 manually labeled individual trees. ITS-Net outperformed state-of-the-art individual tree segmentation methods on AerialTrees, achieving the highest average performance with a detection rate of 94.8 % and an F1-score of 90.9 %. It achieved an F1-score of 88.1 % when tested on the publicly available FOR-instance dataset. ITS-Net also outperformed the state-of-the-art ForAINet method for multi-layered canopy segmentation, exceeding it by 12.3 % in detecting understory vegetation. When directly transferred to the five study sites of the FOR-instance dataset as well as the study sites in Switzerland and Russia, ITS-Net produced accuracies that were reasonably close to those produced by several other algorithms trained over those study sites. These results were achieved without explicit data preprocessing to address differences in LiDAR data characteristics or fine-tuning of the deep learning model parameters, demonstrating ITS-Net’s robustness for segmenting aerial LiDAR point clouds acquired with different sensors from different aerial platforms. As a sensor- and platform-agnostic method, ITS-Net may provide an end-to-end solution to facilitate the use of rapidly evolving aerial LiDAR technology in various forestry applications. The AerialTrees dataset developed through this study is a significant contribution to the very few publicly available labeled LiDAR datasets that are crucial for calibrating, testing, and benchmarking individual tree segmentation algorithms. Our code and data are available at: https://github.com/A8366233/AerialTrees.
{"title":"ITS-Net: A platform and sensor agnostic 3D deep learning model for individual tree segmentation using aerial LiDAR data","authors":"Bowen Li , Yong Pang , Daniel Kükenbrink , Luo Wang , Dan Kong , Mauro Marty","doi":"10.1016/j.isprsjprs.2025.11.019","DOIUrl":"10.1016/j.isprsjprs.2025.11.019","url":null,"abstract":"<div><div>Recent advances in aerial Light Detection and Ranging (LiDAR) technologies have revolutionized the capability to characterize individual tree structure, enabling detailed ecological analyses at the tree level. A critical prerequisite for such analysis is an accurate individual tree segmentation. However, this task remains challenging due to the complexity of forest environments and varying quality of point clouds collected by diverse aerial sensors and platforms. Existing methods are mostly designed for a single aerial platform or sensor and struggle with complex forest environments. To address these limitations, we propose ITS-Net, an aerial platform and sensor-agnostic deep learning model for individual tree segmentation, which integrates three modules designed to enhance its learning capability under complex forest environments. To facilitate and evaluate its platform and sensor-agnostic capabilities, we constructed AerialTrees, a comprehensive individual tree segmentation dataset that included aerial LiDAR data collected with point densities ranging from 50 to 10,000 pts/m<sup>2</sup> using different sensors from ALS and ULS platforms over four climate zones. This dataset also included 2,903 individual trees that had been labeled manually. ITS-Net outperformed state-of-the-art individual tree segmentation methods on AerialTrees, achieving the highest average performance with a detection rate of 94.8 % and an F1-score of 90.9 %. It achieved an F1-score of 88.1 % when tested on the publicly available FOR-instance dataset. ITS-Net also performed better than the state-of-the-art ForAINet method for multi-layered canopy segmentation, outperforming the latter by 12.3 % in detecting understory vegetation. When directly transferred to the five study sites of the FOR-instance dataset as well as the study sites in Switzerland and Russia, ITS-Net produced accuracies that were reasonably close to those produced by several other algorithms trained over those study sites. These results were achieved without requiring efforts to address differences in LiDAR data characteristics through explicit data preprocessing or to fine tune the parameters of the deep learning model, demonstrating ITS-Net’s robustness for segmenting various aerial LiDAR point clouds acquired using different sensors from different aerial platforms. As a sensor and platform agnostic method, ITS-Net may provide an end-to-end solution needed to facilitate the use of rapidly evolving aerial LiDAR technology in various forestry applications. The AerialTrees dataset developed through this study is a significant contribution to the very few publicly available labeled LiDAR datasets that are crucial for calibrating, testing, and benchmarking individual tree segmentation algorithms. 
Our code and data are available at: <span><span>https://github.com/A8366233/AerialTrees</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"231 ","pages":"Pages 719-744"},"PeriodicalIF":12.2,"publicationDate":"2025-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145598737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
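A minimal sketch of how tree-level detection rate and F1-score are commonly computed, by one-to-one matching of predicted and reference tree positions within a distance tolerance. The Hungarian matching and the 2 m tolerance are illustrative assumptions, not necessarily the evaluation protocol used in the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def tree_detection_scores(pred_xy, ref_xy, max_dist=2.0):
    """Match predicted to reference tree positions and report recall and F1."""
    dist = cdist(pred_xy, ref_xy)
    rows, cols = linear_sum_assignment(dist)           # optimal one-to-one matching
    matched = dist[rows, cols] <= max_dist
    tp = int(matched.sum())
    fp = len(pred_xy) - tp                              # unmatched predictions
    fn = len(ref_xy) - tp                               # missed reference trees
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0         # detection rate
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1

rng = np.random.default_rng(0)
ref = rng.uniform(0, 100, (50, 2))                      # reference tree positions (m)
pred = np.vstack([ref[:45] + rng.normal(0, 0.5, (45, 2)),   # 45 detected trees
                  rng.uniform(0, 100, (5, 2))])             # 5 spurious detections
print(tree_detection_scores(pred, ref))
```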