Scattering mechanism-guided zero-shot PolSAR target recognition
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.022
Feng Li, Xiaojing Yang, Liang Zhang, Yanhua Wang, Yuqi Han, Xin Zhang, Yang Li
To address the difficulty of obtaining polarimetric synthetic aperture radar (PolSAR) data for certain categories of targets, we present a zero-shot target recognition method for PolSAR images. Built on a generative model, the method leverages the unique characteristics of PolSAR imagery and incorporates two key modules: a scattering characteristics-guided semantic embedding generation module (SE) and a polarization characteristics-guided distributional correction module (DC). The former ensures the stability of synthetic features for unseen classes by controlling scattering characteristics, while the latter enhances the quality of synthetic features by exploiting polarimetric features, thereby improving zero-shot recognition accuracy. The proposed method is evaluated on the GOTCHA dataset to assess its performance in recognizing unseen classes. The experimental results demonstrate that it achieves SOTA performance in zero-shot PolSAR target recognition, improving the recognition accuracy of unseen categories by nearly 20%. Our code is available at https://github.com/chuyihuan/Zero-shot-PolSAR-target-recognition.
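The abstract outlines a generative zero-shot pattern: synthesize features for unseen classes from class-level semantics, then recognize them with a classifier trained on those synthetic features. The sketch below illustrates only that generic pattern; the module names, dimensions, and the plain reconstruction objective are assumptions for illustration, not the authors' SE and DC design.

```python
# Minimal sketch of conditional feature generation for zero-shot recognition.
# All dimensions and the MSE objective are illustrative assumptions.
import torch
import torch.nn as nn

FEAT_DIM, SEM_DIM, NOISE_DIM = 256, 64, 32

class ConditionalGenerator(nn.Module):
    """Maps (class semantic embedding, noise) -> synthetic target feature."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(SEM_DIM + NOISE_DIM, 512), nn.ReLU(),
            nn.Linear(512, FEAT_DIM),
        )

    def forward(self, sem, z):
        return self.net(torch.cat([sem, z], dim=1))

gen = ConditionalGenerator()
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)

# Placeholder "seen-class" data: real features and their class semantics.
real_feat = torch.randn(128, FEAT_DIM)
seen_sem = torch.randn(128, SEM_DIM)

for _ in range(100):  # toy training loop; a real system would use a GAN/VAE loss
    z = torch.randn(128, NOISE_DIM)
    loss = nn.functional.mse_loss(gen(seen_sem, z), real_feat)
    opt.zero_grad(); loss.backward(); opt.step()

# Synthesize features for an unseen class from its semantic embedding alone;
# any off-the-shelf classifier can then be trained on these synthetic features.
unseen_sem = torch.randn(1, SEM_DIM).repeat(64, 1)
synthetic_unseen = gen(unseen_sem, torch.randn(64, NOISE_DIM))
```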
{"title":"Scattering mechanism-guided zero-shot PolSAR target recognition","authors":"Feng Li , Xiaojing Yang , Liang Zhang , Yanhua Wang , Yuqi Han , Xin Zhang , Yang Li","doi":"10.1016/j.isprsjprs.2024.12.022","DOIUrl":"10.1016/j.isprsjprs.2024.12.022","url":null,"abstract":"<div><div>In response to the challenges posed by the difficulty in obtaining polarimetric synthetic aperture radar (PolSAR) data for certain specific categories of targets, we present a zero-shot target recognition method for PolSAR images. Based on a generative model, the method leverages the unique characteristics of polarimetric SAR images and incorporates two key modules: the scattering characteristics-guided semantic embedding generation module (SE) and the polarization characteristics-guided distributional correction module (DC). The former ensures the stability of synthetic features for unseen classes by controlling scattering characteristics. At the same time, the latter enhances the quality of synthetic features by utilizing polarimetric features, thereby improving the accuracy of zero-shot recognition. The proposed method is evaluated on the GOTCHA dataset to assess its performance in recognizing unseen classes. The experiment results demonstrate that the proposed method achieves SOTA performance in zero-shot PolSAR target recognition (<em>e.g.,</em> improving the recognition accuracy of unseen categories by nearly 20%). Our codes are available at <span><span>https://github.com/chuyihuan/Zero-shot-PolSAR-target-recognition</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 428-439"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142925267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Application of SAR-Optical fusion to extract shoreline position from Cloud-Contaminated satellite images
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2025.01.013
Yongjing Mao, Kristen D. Splinter
Shorelines derived from optical satellite images are increasingly being used for regional- to global-scale analysis of sandy coastline dynamics. The optical satellite record, however, is contaminated by cloud cover, which can substantially reduce the temporal resolution of images available for shoreline analysis. Meanwhile, with the development of deep learning methods, optical images are increasingly fused with Synthetic Aperture Radar (SAR) images, which are unaffected by clouds, to reconstruct cloud-contaminated pixels. Such SAR-Optical fusion methods have proven successful for a range of land surface applications, but the unique characteristics of coastal areas leave their applicability in these dynamic zones unknown.
Herein we apply a deep internal learning (DIL) method to reconstruct cloud-contaminated optical images and explore its applicability to retrieve shorelines obscured by clouds. Our approach uses a mixed sequence of SAR and Gaussian noise images as the prior and the cloudy Modified Normalized Difference Water Index (MNDWI) as the target. The DIL encodes the target with the priors and synthesizes plausible pixels under cloud cover. A unique aspect of our workflow is the inclusion of Gaussian noise in the prior sequence for MNDWI images when no SAR image collected within a 1-day temporal lag is available. A novel loss function for the DIL model is also introduced to optimize image reconstruction near the shoreline. These developments contribute significantly to model accuracy.
The DIL method is tested at four sites with varying tide, wave, and shoreline dynamics. Shorelines derived from the reconstructed and true MNDWI images are compared to quantify the internal accuracy of shoreline reconstruction. For microtidal environments with a mean spring tidal range of less than 2 m, the mean absolute error (MAE) of shoreline reconstruction is less than 7.5 m and the coefficient of determination (R²) exceeds 0.78, regardless of shoreline and wave dynamics. The method is less skilful in macro- and mesotidal environments owing to the larger water level difference between the paired optical and SAR images, resulting in an MAE of 12.59 m and an R² of 0.43. The proposed SAR-Optical fusion method demonstrates substantially better accuracy in retrieving cloud-obscured shoreline positions than interpolation methods relying solely on optical images. Results from our work highlight the great potential of SAR-Optical fusion to derive shorelines even under the cloudiest conditions, thus increasing the temporal resolution of shoreline datasets.
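The reconstruction target above is the MNDWI, a standard water index computed from the green and shortwave-infrared (SWIR) bands as (Green − SWIR) / (Green + SWIR); its cloud-masked pixels are what the DIL model must fill in. A minimal sketch follows, in which the band arrays and cloud mask are placeholders rather than the paper's data pipeline.

```python
# Minimal MNDWI computation with cloud masking; inputs are illustrative.
import numpy as np

def mndwi(green, swir, cloud_mask=None):
    """Return MNDWI in [-1, 1]; cloud-contaminated pixels are set to NaN."""
    green = green.astype(np.float64)
    swir = swir.astype(np.float64)
    denom = green + swir
    safe = np.where(denom == 0, 1.0, denom)            # avoid division by zero
    index = np.where(denom == 0, 0.0, (green - swir) / safe)
    if cloud_mask is not None:
        index = np.where(cloud_mask, np.nan, index)    # the NaN pixels are what DIL reconstructs
    return index

# Example: water pixels (high green, low SWIR) give MNDWI close to +1.
green = np.array([[0.30, 0.05]])
swir = np.array([[0.02, 0.25]])
print(mndwi(green, swir))  # approx [[0.875, -0.667]]
```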
{"title":"Application of SAR-Optical fusion to extract shoreline position from Cloud-Contaminated satellite images","authors":"Yongjing Mao, Kristen D. Splinter","doi":"10.1016/j.isprsjprs.2025.01.013","DOIUrl":"10.1016/j.isprsjprs.2025.01.013","url":null,"abstract":"<div><div>Shorelines derived from optical satellite images are increasingly being used for regional to global scale analysis of sandy coastline dynamics. The optical satellite record, however, is contaminated by cloud cover, which can substantially reduce the temporal resolution of available images for shoreline analysis. Meanwhile, with the development of deep learning methods, optical images are increasingly fused with Synthetic Aperture Radar (SAR) images that are unaffected by clouds to reconstruct the cloud-contaminated pixels. Such SAR-Optical fusion methods have been shown successful for different land surface applications, but the unique characteristics of coastal areas make the applicability of this method unknown in these dynamic zones.</div><div>Herein we apply a deep internal learning (DIL) method to reconstruct cloud-contaminated optical images and explore its applicability to retrieve shorelines obscured by clouds. Our approach uses a mixed sequence of SAR and Gaussian noise images as the prior and the cloudy Modified Normalized Difference Water Index (MNDWI) as the target. The DIL encodes the target with priors and synthesizes plausible pixels under cloud cover. A unique aspect of our workflow is the inclusion of Gaussian noise in the prior sequence for MNDWI images when SAR images collected within a 1-day temporal lag are not available. A novel loss function of DIL model is also introduced to optimize the image reconstruction near the shoreline. These new developments have significant contribution to the model accuracy.</div><div>The DIL method is tested at four different sites with varying tide, wave, and shoreline dynamics. Shorelines derived from the reconstructed and true MNDWI images are compared to quantify the internal accuracy of shoreline reconstruction. For microtidal environments with mean springs tidal range less than 2 m, the mean absolute error (MAE) of shoreline reconstruction is less than 7.5 m with the coefficient of determination (<span><math><mrow><msup><mrow><mi>R</mi></mrow><mn>2</mn></msup></mrow></math></span>) more than 0.78 regardless of shoreline and wave dynamics. The method is less skilful in macro- and mesotidal environments due to the larger water level difference in the paired optical and SAR images, resulting in the MAE of 12.59 m and <span><math><mrow><msup><mrow><mi>R</mi></mrow><mn>2</mn></msup></mrow></math></span> of 0.43. The proposed SAR-Optical fusion method demonstrates substantially better accuracy in retrieving cloud-obscured shoreline positions compared to interpolation methods relying solely on optical images. 
Results from our work highlight the great potential of SAR-Optical fusion to derive shorelines even under the cloudiest conditions, thus increasing the temporal resolution of shoreline datasets.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 563-579"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143035308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Refined change detection in heterogeneous low-resolution remote sensing images for disaster emergency response
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.010
Di Wang, Guorui Ma, Haiming Zhang, Xiao Wang, Yongxian Zhang
Heterogeneous Remote Sensing Image Change Detection (HRSICD) is a significant challenge in remote sensing image processing, with substantial application value in rapid natural disaster response. However, significant differences in imaging modalities often result in poor comparability between features, which limits recognition accuracy. To address this issue, we propose a novel HRSICD method based on image structure relationships and semantic information. First, we employ a Multi-scale Pyramid Convolution Encoder to efficiently extract multi-scale and detailed features. Next, the Cross-domain Feature Alignment Module aligns the structural relationships and semantic features of the heterogeneous images, enhancing the comparability between heterogeneous image features. Finally, the Multi-level Decoder fuses the structural and semantic features, achieving refined identification of changed areas. We validated the proposed method on five publicly available HRSICD datasets. Additionally, zero-shot generalization experiments and real-world applications were conducted to assess its generalization capability. Our method achieved favorable results in all experiments, demonstrating its effectiveness. The code of the proposed method will be made available at https://github.com/Lucky-DW/HRSICD.
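Once the heterogeneous features are aligned and therefore comparable, a change map can in principle be read off from a per-pixel distance between the bi-temporal feature maps. The sketch below illustrates that generic idea only, using cosine distance and a fixed threshold; it is not the paper's Multi-level Decoder, and the feature shapes and threshold are assumptions.

```python
# Generic change map from aligned bi-temporal features via cosine distance.
import numpy as np

def change_map(feat_t1, feat_t2, thresh=0.5):
    """feat_t1, feat_t2: (C, H, W) aligned feature maps; returns a binary (H, W) map."""
    num = np.sum(feat_t1 * feat_t2, axis=0)
    den = np.linalg.norm(feat_t1, axis=0) * np.linalg.norm(feat_t2, axis=0) + 1e-8
    cos_dist = 1.0 - num / den              # 0 = identical direction, 2 = opposite
    return (cos_dist > thresh).astype(np.uint8)

f1 = np.random.rand(16, 64, 64)
f2 = np.random.rand(16, 64, 64)
print(change_map(f1, f2).sum(), "pixels flagged as changed")
```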
{"title":"Refined change detection in heterogeneous low-resolution remote sensing images for disaster emergency response","authors":"Di Wang , Guorui Ma , Haiming Zhang , Xiao Wang , Yongxian Zhang","doi":"10.1016/j.isprsjprs.2024.12.010","DOIUrl":"10.1016/j.isprsjprs.2024.12.010","url":null,"abstract":"<div><div>Heterogeneous Remote Sensing Images Change Detection (HRSICD) is a significant challenge in remote sensing image processing, with substantial application value in rapid natural disaster response. However, significant differences in imaging modalities often result in poor comparability of their features, affecting the recognition accuracy. To address the issue, we propose a novel HRSICD method based on image structure relationships and semantic information. First, we employ a Multi-scale Pyramid Convolution Encoder to efficiently extract the multi-scale and detailed features. Next, the Cross-domain Feature Alignment Module aligns the structural relationships and semantic features of the heterogeneous images, enhancing the comparability between heterogeneous image features. Finally, the Multi-level Decoder fuses the structural and semantic features, achieving refined identification of change areas. We validated the advancement of proposed method on five publicly available HRSICD datasets. Additionally, zero-shot generalization experiments and real-world applications were conducted to assess its generalization capability. Our method achieved favorable results in all experiments, demonstrating its effectiveness. The code of the proposed method will be made available at <span><span>https://github.com/Lucky-DW/HRSICD</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 139-155"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142823148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PylonModeler: A hybrid-driven 3D reconstruction method for power transmission pylons from LiDAR point clouds
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.003
Shaolong Wu, Chi Chen, Bisheng Yang, Zhengfei Yan, Zhiye Wang, Shangzhe Sun, Qin Zou, Jing Fu
As the power grid is an indispensable foundation of modern society, creating a digital twin of the grid is of great importance. Pylons are key components of transmission corridors, and their precise 3D reconstruction is essential for the safe operation of power grids. However, 3D pylon reconstruction from LiDAR point clouds presents numerous challenges owing to data quality and the diversity and complexity of pylon structures. To address these challenges, we introduce PylonModeler: a hybrid-driven method for 3D pylon reconstruction from airborne LiDAR point clouds that enables accurate, robust, and efficient real-time reconstruction. Different strategies are employed to reconstruct and assemble the various structures independently. We propose Pylon Former, a lightweight transformer network, for real-time pylon recognition and decomposition, and subsequently apply a data-driven approach for pylon body reconstruction. Considering structural characteristics, fitting and clustering algorithms are used to reconstruct both external and internal structures. The pylon head is reconstructed using a hybrid approach: a pre-built pylon head parameter model library defines different pylons by a series of parameters, and the coherent point drift (CPD) algorithm is adopted to establish the topological relationships between pylon head structures and to set initial model parameters, which are then refined through optimization for accurate head reconstruction. Finally, the pylon body and head models are combined to complete the reconstruction. We collected an airborne LiDAR dataset containing 3398 pylons across eight types, covering transmission lines at various voltage levels, such as 110 kV, 220 kV, and 500 kV. PylonModeler is validated on this dataset. The average reconstruction time per pylon is 1.10 s, with an average reconstruction accuracy of 0.216 m. In addition, we evaluate PylonModeler on public airborne LiDAR data from Luxembourg. Compared with previous state-of-the-art methods, reconstruction accuracy improves by approximately 26.28%. With this performance, PylonModeler is tens of times faster than current model-driven methods, enabling real-time pylon reconstruction.
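The head reconstruction step aligns a parametric head model from the library to the observed points, for which the paper uses coherent point drift (CPD). As a simpler stand-in that conveys the same model-to-points fitting idea, the sketch below rigidly aligns a template to observed points with the Kabsch algorithm, assuming known correspondences; the point sets are placeholders.

```python
# Rigid template-to-points alignment via the Kabsch algorithm (stand-in for CPD).
import numpy as np

def kabsch_align(template, observed):
    """Return rotation R and translation t so that R @ p + t maps template points onto observed."""
    mu_t, mu_o = template.mean(axis=0), observed.mean(axis=0)
    H = (template - mu_t).T @ (observed - mu_o)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))           # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_o - R @ mu_t
    return R, t

template = np.random.rand(100, 3)                    # placeholder library head model points
true_R = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])                # 90-degree rotation about z
observed = template @ true_R.T + np.array([5.0, 2.0, 0.5])
R, t = kabsch_align(template, observed)
print(np.allclose(R, true_R, atol=1e-6), np.allclose(t, [5.0, 2.0, 0.5], atol=1e-6))
```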
{"title":"PylonModeler: A hybrid-driven 3D reconstruction method for power transmission pylons from LiDAR point clouds","authors":"Shaolong Wu , Chi Chen , Bisheng Yang , Zhengfei Yan , Zhiye Wang , Shangzhe Sun , Qin Zou , Jing Fu","doi":"10.1016/j.isprsjprs.2024.12.003","DOIUrl":"10.1016/j.isprsjprs.2024.12.003","url":null,"abstract":"<div><div>As the power grid is an indispensable foundation of modern society, creating a digital twin of the grid is of great importance. Pylons serve as components in the transmission corridor, and their precise 3D reconstruction is essential for the safe operation of power grids. However, 3D pylon reconstruction from LiDAR point clouds presents numerous challenges due to data quality and the diversity and complexity of pylon structures. To address these challenges, we introduce PylonModeler: a hybrid-driven method for 3D pylon reconstruction using airborne LiDAR point clouds, thereby enabling accurate, robust, and efficient real-time pylon reconstruction. Different strategies are employed to achieve independent reconstructions and assemblies for various structures. We propose Pylon Former, a lightweight transformer network for real-time pylon recognition and decomposition. Subsequently, we apply a data-driven approach for the pylon body reconstruction. Considering structural characteristics, fitting and clustering algorithms are used to reconstruct both external and internal structures. The pylon head is reconstructed using a hybrid approach. A pre-built pylon head parameter model library defines different pylons by a series of parameters. The coherent point drift (CPD) algorithm is adopted to establish the topological relationships between pylon head structures and set initial model parameters, which are refined through optimization for accurate pylon head reconstruction. Finally, the pylon body and head models are combined to complete the reconstruction. We collected an airborne LiDAR dataset, which includes a total of 3398 pylon data across eight types. The dataset consists of transmission lines of various voltage levels, such as 110 kV, 220 kV, and 500 kV. PylonModeler is validated on this dataset. The average reconstruction time of a pylon is 1.10 s, with an average reconstruction accuracy of 0.216 m. In addition, we evaluate the performance of PylonModeler on public airborne LiDAR data from Luxembourg. Compared to previous state-of-the-art methods, reconstruction accuracy improved by approximately 26.28 %. With superior performance, PylonModeler is tens of times faster than the current model-driven methods, enabling real-time pylon reconstruction.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 100-124"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142823151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unwrapping error and fading signal correction on multi-looked InSAR data
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.006
Zhangfeng Ma, Nanxin Wang, Yingbao Yang, Yosuke Aoki, Shengji Wei
Multi-looking, aimed at reducing data size and improving the signal-to-noise ratio, is indispensable for large-scale InSAR data processing. However, the resulting “Fading Signal” caused by multi-looking breaks the phase consistency among triplet interferograms and introduces bias into the estimated displacements. This inconsistency challenges the assumption that only unwrapping errors contribute to triplet phase closure. Therefore, untangling phase unwrapping errors and fading signals in the triplet phase closure is critical to achieving more precise InSAR measurements. To address this challenge, we propose a new method that mitigates both phase unwrapping errors and fading signals. The method consists of two key steps. The first is triplet phase closure-based stacking, which allows the fading signal in each interferogram to be estimated directly. The second is Basis Pursuit Denoising-based unwrapping error correction, which recasts unwrapping error correction as sparse signal recovery. Through these two procedures, the new method can be seamlessly integrated into the traditional InSAR workflow. Additionally, the estimated fading signal can be used directly to derive soil moisture as a by-product of our method. Experimental results for the San Francisco Bay area demonstrate that the new method reduces velocity estimation errors by approximately 9%–19%, effectively addressing phase unwrapping errors and fading signals. It outperforms both the ILP and Lasso methods, which account only for unwrapping errors in the triplet closure. Additionally, the derived by-product, soil moisture, shows strong consistency with most external soil moisture products.
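The triplet phase closure that the method exploits is simply the wrapped sum of the three interferometric phases, φ12 + φ23 − φ13; in the absence of unwrapping errors and fading signals it is zero, so a non-zero closure carries exactly the signals being estimated. A minimal numerical sketch follows, in which the arrays and the injected fading term are placeholders.

```python
# Triplet phase closure of wrapped, multi-looked interferometric phases.
import numpy as np

def wrap(phase):
    """Wrap phase to (-pi, pi]."""
    return np.angle(np.exp(1j * phase))

def triplet_closure(phi_12, phi_23, phi_13):
    """Closure phase of an interferogram triplet; zero for a consistent triplet."""
    return wrap(phi_12 + phi_23 - phi_13)

rng = np.random.default_rng(0)
phi_12 = rng.uniform(-np.pi, np.pi, (256, 256))
phi_23 = rng.uniform(-np.pi, np.pi, (256, 256))
fading = 0.3 * rng.standard_normal((256, 256))        # toy bias introduced by multi-looking
phi_13 = wrap(phi_12 + phi_23) - fading                # consistent triplet except for the fading term
closure = triplet_closure(phi_12, phi_23, phi_13)
print(float(np.abs(closure).mean()))                   # ~0.24 rad: the fading term, not zero
```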
{"title":"Unwrapping error and fading signal correction on multi-looked InSAR data","authors":"Zhangfeng Ma , Nanxin Wang , Yingbao Yang , Yosuke Aoki , Shengji Wei","doi":"10.1016/j.isprsjprs.2024.12.006","DOIUrl":"10.1016/j.isprsjprs.2024.12.006","url":null,"abstract":"<div><div>Multi-looking, aimed at reducing data size and improving the signal-to-noise ratio, is indispensable for large-scale InSAR data processing. However, the resulting “Fading Signal” caused by multi-looking breaks the phase consistency among triplet interferograms and introduces bias into the estimated displacements. This inconsistency challenges the assumption that only unwrapping errors are involved in triplet phase closure. Therefore, untangling phase unwrapping errors and fading signals from triplet phase closure is critical to achieving more precise InSAR measurements. To address this challenge, we propose a new method that mitigates phase unwrapping errors and fading signals. This new method consists of two key steps. The first step is triplet phase closure-based stacking, which allows for the direct estimation of fading signals in each interferogram. The second step is Basis Pursuit Denoising-based unwrapping error correction, which transforms unwrapping error correction into sparse signal recovery. Through these two procedures, the new method can be seamlessly integrated into the traditional InSAR workflow. Additionally, the estimated fading signal can be directly used to derive soil moisture as a by-product of our method. Experimental results on the San Francisco Bay area demonstrate that the new method reduces velocity estimation errors by approximately 9 %–19 %, effectively addressing phase unwrapping errors and fading signals. This performance outperforms both ILP and Lasso methods, which only account for unwrapping errors in the triplet closure. Additionally, the derived by-product, soil moisture, shows strong consistency with most external soil moisture products.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 51-63"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142823154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accurate spaceborne waveform simulation in heterogeneous forests using small-footprint airborne LiDAR point clouds
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.11.020
Yi Li, Guangjian Yan, Weihua Li, Donghui Xie, Hailan Jiang, Linyuan Li, Jianbo Qi, Ronghai Hu, Xihan Mu, Xiao Chen, Shanshan Wei, Hao Tang
Spaceborne light detection and ranging (LiDAR) waveform sensors require accurate signal simulations to facilitate prelaunch calibration, postlaunch validation, and the development of land surface data products. However, accurately simulating spaceborne LiDAR waveforms over heterogeneous forests remains challenging because data-driven methods do not account for complicated pulse transport within heterogeneous canopies, whereas analytical radiative transfer models overly rely on assumptions about canopy structure and distribution. Thus, a comprehensive simulation method is needed to account for both the complexity of pulse transport within canopies and the structural heterogeneity of forests. In this study, we propose a framework for spaceborne LiDAR waveform simulation by integrating a new radiative transfer model – the canopy voxel radiative transfer (CVRT) model – with reconstructed three-dimensional (3D) voxel forest scenes from small-footprint airborne LiDAR (ALS) point clouds. The CVRT model describes the radiative transfer process within canopy voxels and uses fractional crown cover to account for within-voxel heterogeneity, minimizing the need for assumptions about canopy shape and distribution and significantly reducing the number of input parameters. All the parameters for scene construction and model inputs can be obtained from the ALS point clouds. The performance of the proposed framework was assessed by comparing the results to the simulated LiDAR waveforms from DART, Global Ecosystem Dynamics Investigation (GEDI) data over heterogeneous forest stands, and Land, Vegetation, and Ice Sensor (LVIS) data from the National Ecological Observatory Network (NEON) site. The results suggest that compared with existing models, the new framework with the CVRT model achieved improved agreement with both simulated and measured data, with an average R² improvement of approximately 2% to 5% and an average RMSE reduction of approximately 0.5% to 3%. The proposed framework was also highly adaptive and robust to variations in model configurations, input data quality, and environmental attributes. In summary, this work extends current research on accurate and robust large-footprint LiDAR waveform simulations over heterogeneous forest canopies and could help refine product development for emerging spaceborne LiDAR missions.
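At its simplest, a large-footprint return waveform can be viewed as the footprint's vertical energy-return profile convolved with the transmitted pulse; the CVRT model refines how that profile is produced for heterogeneous canopies. The sketch below shows only the generic convolution view, with an assumed toy profile, pulse width, and bin size.

```python
# Toy large-footprint waveform: vertical return profile convolved with a Gaussian pulse.
import numpy as np

dz = 0.15                                  # vertical bin size (m)
z = np.arange(0.0, 40.0, dz)               # height above ground (m)

# Toy vertical return profile: canopy layer around 22 m plus a strong ground peak at 0 m.
profile = np.exp(-0.5 * ((z - 22.0) / 3.0) ** 2)
profile[0] += 2.0

# Gaussian transmit pulse (~1 m FWHM) expressed on the same vertical grid, area-normalized.
sigma = 1.0 / 2.355
t = np.arange(-3.0, 3.0, dz)
pulse = np.exp(-0.5 * (t / sigma) ** 2)
pulse /= pulse.sum()

waveform = np.convolve(profile, pulse, mode="same")    # simulated received waveform
canopy_only = waveform.copy()
canopy_only[z < 5.0] = 0.0                              # suppress the ground return
print(f"canopy peak at ~{z[np.argmax(canopy_only)]:.1f} m above ground")  # close to 22 m
```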
{"title":"Accurate spaceborne waveform simulation in heterogeneous forests using small-footprint airborne LiDAR point clouds","authors":"Yi Li , Guangjian Yan , Weihua Li , Donghui Xie , Hailan Jiang , Linyuan Li , Jianbo Qi , Ronghai Hu , Xihan Mu , Xiao Chen , Shanshan Wei , Hao Tang","doi":"10.1016/j.isprsjprs.2024.11.020","DOIUrl":"10.1016/j.isprsjprs.2024.11.020","url":null,"abstract":"<div><div>Spaceborne light detection and ranging (LiDAR) waveform sensors require accurate signal simulations to facilitate prelaunch calibration, postlaunch validation, and the development of land surface data products. However, accurately simulating spaceborne LiDAR waveforms over heterogeneous forests remains challenging because data-driven methods do not account for complicated pulse transport within heterogeneous canopies, whereas analytical radiative transfer models overly rely on assumptions about canopy structure and distribution. Thus, a comprehensive simulation method is needed to account for both the complexity of pulse transport within canopies and the structural heterogeneity of forests. In this study, we propose a framework for spaceborne LiDAR waveform simulation by integrating a new radiative transfer model – the canopy voxel radiative transfer (CVRT) model – with reconstructed three-dimensional (3D) voxel forest scenes from small-footprint airborne LiDAR (ALS) point clouds. The CVRT model describes the radiative transfer process within canopy voxels and uses fractional crown cover to account for within-voxel heterogeneity, minimizing the need for assumptions about canopy shape and distribution and significantly reducing the number of input parameters. All the parameters for scene construction and model inputs can be obtained from the ALS point clouds. The performance of the proposed framework was assessed by comparing the results to the simulated LiDAR waveforms from DART, Global Ecosystem Dynamics Investigation (GEDI) data over heterogeneous forest stands, and Land, Vegetation, and Ice Sensor (LVIS) data from the National Ecological Observatory Network (NEON) site. The results suggest that compared with existing models, the new framework with the CVRT model achieved improved agreement with both simulated and measured data, with an average R<sup>2</sup> improvement of approximately 2% to 5% and an average RMSE reduction of approximately 0.5% to 3%. The proposed framework was also highly adaptive and robust to variations in model configurations, input data quality, and environmental attributes. In summary, this work extends current research on accurate and robust large-footprint LiDAR waveform simulations over heterogeneous forest canopies and could help refine product development for emerging spaceborne LiDAR missions.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 246-263"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142874574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Underwater image captioning: Challenges, models, and datasets
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.002
Huanyu Li, Hao Wang, Ying Zhang, Li Li, Peng Ren
We delve into the nascent field of underwater image captioning from three perspectives: challenges, models, and datasets. One challenge arises from the disparities between natural images and underwater images, which hinder the use of the former to train models for the latter. Another challenge lies in the limited feature extraction capabilities of current image captioning models, which impede the generation of accurate underwater image captions. The final, and no less significant, challenge is the insufficiency of data available for underwater image captioning, which not only complicates model training but also makes it difficult to evaluate performance effectively. To address these challenges, we make three novel contributions. First, we employ a physics-based degradation technique to transform natural images into degraded images that closely resemble realistic underwater images, and based on these degraded images we develop a meta-learning strategy specifically tailored to underwater tasks. Second, we develop an underwater image captioning model based on scene-object feature fusion. It fuses underwater scene features extracted by ResNeXt and object features localized by YOLOv8, yielding comprehensive features for underwater image captioning. Last but not least, we construct an underwater image captioning dataset covering various underwater scenes, with each image annotated with five accurate captions for comprehensive training and validation. Experimental results on the new dataset validate the effectiveness of our models. The code and datasets are released at https://gitee.com/LHY-CODE/UICM-SOFF.
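Physics-based degradation of the kind mentioned above typically follows the common underwater image formation model I_c = J_c·t_c + B_c·(1 − t_c), with stronger attenuation in the red channel. The sketch below applies that generic model with illustrative coefficients; it is not the paper's exact degradation procedure, and the attenuation, backscatter, and depth values are assumptions.

```python
# Simple underwater-style degradation of a natural RGB image (illustrative parameters).
import numpy as np

def degrade_underwater(img, depth_m=6.0):
    """img: float RGB in [0, 1], shape (H, W, 3). Returns an underwater-looking copy."""
    beta = np.array([0.45, 0.12, 0.08])          # per-channel attenuation (R, G, B), 1/m
    backscatter = np.array([0.05, 0.35, 0.45])   # bluish-green veiling light
    t = np.exp(-beta * depth_m)                  # transmission per channel
    return np.clip(img * t + backscatter * (1.0 - t), 0.0, 1.0)

natural = np.random.rand(64, 64, 3)
underwater_like = degrade_underwater(natural)
print(underwater_like[..., 0].mean(), underwater_like[..., 2].mean())  # red suppressed, blue lifted
```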
{"title":"Underwater image captioning: Challenges, models, and datasets","authors":"Huanyu Li , Hao Wang , Ying Zhang , Li Li , Peng Ren","doi":"10.1016/j.isprsjprs.2024.12.002","DOIUrl":"10.1016/j.isprsjprs.2024.12.002","url":null,"abstract":"<div><div>We delve into the nascent field of underwater image captioning from three perspectives: challenges, models, and datasets. One challenge arises from the disparities between natural images and underwater images, which hinder the use of the former to train models for the latter. Another challenge exists in the limited feature extraction capabilities of current image captioning models, impeding the generation of accurate underwater image captions. The final challenge, albeit not the least significant, revolves around the insufficiency of data available for underwater image captioning. This insufficiency not only complicates the training of models but also poses challenges for evaluating their performance effectively. To address these challenges, we make three novel contributions. First, we employ a physics-based degradation technique to transform natural images into degraded images that closely resemble realistic underwater images. Based on the degraded images, we develop a meta-learning strategy specifically tailored for underwater tasks. Second, we develop an underwater image captioning model based on scene-object feature fusion. It fuses underwater scene features extracted by ResNeXt and object features localized by YOLOv8, yielding comprehensive features for underwater image captioning. Last but not least, we construct an underwater image captioning dataset covering various underwater scenes, with each underwater image annotated with five accurate captions for the purpose of comprehensive training and validation. Experimental results on the new dataset validate the effectiveness of our novel models. The code and datasets are released at <span><span>https://gitee.com/LHY-CODE/UICM-SOFF</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 440-453"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142925268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel airborne TomoSAR 3-D focusing method for accurate ice thickness and glacier volume estimation
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2025.01.011
Ke Wang, Yue Wu, Xiaolan Qiu, Jinbiao Zhu, Donghai Zheng, Songtao Shangguan, Jie Pan, Yuquan Liu, Liming Jiang, Xin Li
High-altitude mountain glaciers are highly responsive to environmental changes. However, their remote locations limit the applicability of traditional mapping methods, such as probing and Ground Penetrating Radar (GPR), in tracking changes in ice thickness and glacier volume. Over the past two decades, airborne Tomographic Synthetic Aperture Radar (TomoSAR) has shown promise for mapping the internal structures of mountain glaciers. Yet, its 3D mapping capabilities are limited by the radar signal’s relatively shallow penetration depth, with bedrock echoes rarely detected beyond 60 meters. Additionally, most TomoSAR studies ignored the air-ice refraction during the image-focusing step, reducing the 3D focusing accuracy for deeper subsurface targets. In this study, we developed a novel algorithm that integrates refraction path calculations into SAR image focusing. We also introduced a new method to construct the 3D TomoSAR cube by stacking InSAR phase coherence images, enabling the retrieval of deep bedrock signals even at low signal-to-noise ratios.
We tested our algorithms on 14 P-band SAR images acquired on April 8, 2023, over Bayi Glacier in the Qilian Mountains on the Qinghai-Tibet Plateau. For the first time, we successfully mapped the ice thickness across an entire mountain glacier using the airborne TomoSAR technique, detecting bedrock signals at depths of up to 120 m. Our ice thickness estimates showed strong agreement with in situ measurements from three GPR transects totaling 3.8 km in length, with root-mean-square errors (RMSE) ranging from 3.18 to 4.66 m. For comparison, we applied the state-of-the-art 3D focusing algorithm used in the AlpTomoSAR campaign for ice thickness estimation, which yielded RMSE values between 5.67 and 5.81 m; our proposed method thus reduces the RMSE by 18% to 44% relative to the AlpTomoSAR algorithm. Based on these measurements, we calculated a total ice volume of 0.121 km³, a decline of approximately 20.92% since the last reported volume in 2009, which was estimated from sparse GPR data. These results demonstrate that the proposed algorithm can effectively map ice thickness, providing a cost-efficient solution for large-scale glacier surveys in high-mountain regions.
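The refraction handling that distinguishes the focusing algorithm can be pictured with a simple two-layer geometry: the ray bends at the air-ice interface following Snell's law and slows down in ice, so the two-way delay to a subsurface target differs from the straight-ray, free-space value. The sketch below works through that geometry with an assumed ice refractive index of about 1.78 (from a relative permittivity of roughly 3.17) and illustrative heights; it is not the paper's focusing algorithm.

```python
# Two-layer (air + ice) propagation delay with refraction at the interface.
import numpy as np

C0 = 299_792_458.0          # speed of light in vacuum (m/s)
N_ICE = 1.78                # assumed refractive index of glacier ice at P-band

def two_way_delay(theta_inc_deg, h_air, d_ice):
    """Two-way travel time for an air path of height h_air and an ice depth d_ice (m)."""
    theta_i = np.radians(theta_inc_deg)
    theta_t = np.arcsin(np.sin(theta_i) / N_ICE)       # refraction angle inside the ice
    path_air = h_air / np.cos(theta_i)
    path_ice = d_ice / np.cos(theta_t)
    return 2.0 * (path_air / C0 + path_ice * N_ICE / C0)

# Ignoring refraction and the slower in-ice speed treats the 100 m of ice like air:
with_refraction = two_way_delay(35.0, h_air=3000.0, d_ice=100.0)
straight_ray = 2.0 * ((3000.0 + 100.0) / np.cos(np.radians(35.0))) / C0
print((with_refraction - straight_ray) * 1e9, "ns of delay difference")
```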
{"title":"A novel airborne TomoSAR 3-D focusing method for accurate ice thickness and glacier volume estimation","authors":"Ke Wang , Yue Wu , Xiaolan Qiu , Jinbiao Zhu , Donghai Zheng , Songtao Shangguan , Jie Pan , Yuquan Liu , Liming Jiang , Xin Li","doi":"10.1016/j.isprsjprs.2025.01.011","DOIUrl":"10.1016/j.isprsjprs.2025.01.011","url":null,"abstract":"<div><div>High-altitude mountain glaciers are highly responsive to environmental changes. However, their remote locations limit the applicability of traditional mapping methods, such as probing and Ground Penetrating Radar (GPR), in tracking changes in ice thickness and glacier volume. Over the past two decades, airborne Tomographic Synthetic Aperture Radar (TomoSAR) has shown promise for mapping the internal structures of mountain glaciers. Yet, its 3D mapping capabilities are limited by the radar signal’s relatively shallow penetration depth, with bedrock echoes rarely detected beyond 60 meters. Additionally, most TomoSAR studies ignored the air-ice refraction during the image-focusing step, reducing the 3D focusing accuracy for deeper subsurface targets. In this study, we developed a novel algorithm that integrates refraction path calculations into SAR image focusing. We also introduced a new method to construct the 3D TomoSAR cube by stacking InSAR phase coherence images, enabling the retrieval of deep bedrock signals even at low signal-to-noise ratios.</div><div>We tested our algorithms on 14 P-band SAR images acquired on April 8, 2023, over Bayi Glacier in the Qilian Mountains, located on the Qinghai-Tibet Plateau. For the first time, we successfully mapped the ice thickness across an entire mountain glacier using the airborne TomoSAR technique, detecting bedrock signals at depths reaching up to 120 m. Our ice thickness estimates showed strong agreement with in situ measurements from three GPR transects totaling 3.8 km in length, with root-mean-square errors (RMSE) ranging from 3.18 to 4.66 m. For comparison, we applied the state-of-the-art 3D focusing algorithm used in the AlpTomoSAR campaign for ice thickness estimation, which resulted in RMSE values between 5.67 and 5.81 m. Our proposed method reduced the RMSE by 18% to 44% relative to the AlpTomoSAR algorithm. Based on these measurements, we calculated a total ice volume of 0.121 km<span><math><msup><mrow></mrow><mrow><mn>3</mn></mrow></msup></math></span>, reflecting a decline of approximately 20.92% since the last reported volume in 2009, which was estimated from sparse GPR data. These results demonstrate that the proposed algorithm can effectively map ice thickness, providing a cost-efficient solution for large-scale glacier surveys in high-mountain regions.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 593-607"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142989646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An interactive fusion attention-guided network for ground surface hot spring fluids segmentation in dual-spectrum UAV images
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2025.01.022
Shi Yi, Mengting Chen, Xuesong Yuan, Si Guo, Jiashuai Wang
Investigating the distribution of ground surface hot spring fluids is crucial for the exploitation and utilization of geothermal resources. The detailed information provided by dual-spectrum images captured by unmanned aerial vehicles (UAVs) flown at low altitudes is beneficial for accurately segmenting ground surface hot spring fluids. However, existing image segmentation methods face significant challenges in hot spring fluid segmentation because fluid boundaries vary frequently and irregularly, while the presence of substances within the fluids leads to segmentation uncertainties. In addition, there is currently no benchmark dataset dedicated to ground surface hot spring fluid segmentation in dual-spectrum UAV images. To this end, in this study a benchmark dataset called the dual-spectrum hot spring fluid segmentation (DHFS) dataset was constructed for segmenting ground surface hot spring fluids in dual-spectrum UAV images. Additionally, a novel interactive fusion attention-guided RGB-Thermal (RGB-T) semantic segmentation network named IFAGNet is proposed for accurately segmenting ground surface hot spring fluids in dual-spectrum UAV images. The proposed IFAGNet consists of two sub-networks that leverage two feature fusion architectures, with a two-stage feature fusion module designed to achieve optimal intermediate feature fusion. Furthermore, IFAGNet utilizes an interactive fusion attention-guided architecture to guide the two sub-networks in further processing the extracted features through complementary information exchange, resulting in a significant boost in hot spring fluid segmentation accuracy. Additionally, two down-up full-scale feature pyramid network (FPN) decoders are developed, one for each sub-network, to fully utilize multi-stage fused features and improve the preservation of detailed information during hot spring fluid segmentation. Moreover, a hybrid consistency learning strategy is implemented to train IFAGNet, combining fully supervised learning with consistency learning between each sub-network and their fused results to further optimize the segmentation accuracy of hot spring fluid in RGB-T UAV images. The optimal IFAGNet model was tested on the proposed DHFS dataset, and the experimental results demonstrate that IFAGNet outperforms existing image segmentation frameworks for hot spring fluid segmentation in dual-spectrum UAV images, achieving a Pixel Accuracy (PA) of 96.1%, Precision of 93.2%, Recall of 85.9%, Intersection over Union (IoU) of 78.3%, and F1-score (F1) of 89.4%, while largely overcoming segmentation uncertainties and maintaining competitive computational efficiency. Ablation studies confirmed the effectiveness of each main innovation in IFAGNet for improving the accuracy of hot spring fluid segmentation. Therefore, the proposed DHFS dataset and IFAGNet lay the foundation for the segmentation of ground surface hot spring fluids in dual-spectrum UAV images.
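For reference, the pixel-level scores reported above (PA, Precision, Recall, IoU, F1) follow directly from the confusion matrix of a binary prediction against the ground-truth mask. The sketch below is generic evaluation code, not part of IFAGNet; the random masks are placeholders.

```python
# Pixel-level segmentation metrics from a binary prediction and ground-truth mask.
import numpy as np

def segmentation_metrics(pred, gt):
    """pred, gt: boolean arrays of the same shape (True = hot spring fluid)."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    return {
        "PA": (tp + tn) / (tp + tn + fp + fn),
        "Precision": precision,
        "Recall": recall,
        "IoU": tp / (tp + fp + fn + 1e-9),
        "F1": 2 * precision * recall / (precision + recall + 1e-9),
    }

pred = np.random.rand(256, 256) > 0.5
gt = np.random.rand(256, 256) > 0.5
print(segmentation_metrics(pred, gt))
```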
{"title":"An interactive fusion attention-guided network for ground surface hot spring fluids segmentation in dual-spectrum UAV images","authors":"Shi Yi , Mengting Chen , Xuesong Yuan , Si Guo , Jiashuai Wang","doi":"10.1016/j.isprsjprs.2025.01.022","DOIUrl":"10.1016/j.isprsjprs.2025.01.022","url":null,"abstract":"<div><div>Investigating the distribution of ground surface hot spring fluids is crucial for the exploitation and utilization of geothermal resources. The detailed information provided by dual-spectrum images captured by unmanned aerial vehicles (UAVs) flew at low altitudes is beneficial to accurately segment ground surface hot spring fluids. However, existing image segmentation methods face significant challenges of hot spring fluids segmentation due to the frequent and irregular variations in fluid boundaries, meanwhile the presence of substances within such fluids lead to segmentation uncertainties. In addition, there is currently no benchmark dataset dedicated to ground surface hot spring fluid segmentation in dual-spectrum UAV images. To this end, in this study, a benchmark dataset called the dual-spectrum hot spring fluid segmentation (DHFS) dataset was constructed for segmenting ground surface hot spring fluids in dual-spectrum UAV images. Additionally, a novel interactive fusion attention-guided RGB-Thermal (RGB-T) semantic segmentation network named IFAGNet was proposed in this study for accurately segmenting ground surface hot spring fluids in dual-spectrum UAV images. The proposed IFAGNet consists of two sub-networks that leverage two feature fusion architectures and the two-stage feature fusion module is designed to achieve optimal intermediate feature fusion. Furthermore, IFAGNet utilizes an interactive fusion attention-guided architecture to guide the two sub-networks further process the extracted features through complementary information exchange, resulting in a significant boost in hot spring fluid segmentation accuracy. Additionally, two down-up full scale feature pyramid network (FPN) decoders are developed for each sub-network to fully utilize multi-stage fused features and improve the preservation of detailed information during hot spring fluid segmentation. Moreover, a hybrid consistency learning strategy is implemented to train the IFAGNet, which combines fully supervised learning with consistency learning between each sub-network and their fusion results to further optimize the segmentation accuracy of hot spring fluid in RGB-T UAV images. The optimal model of the IFAGNet was tested on the proposed DHFS dataset, and the experimental results demonstrated that the IFAGNet outperforms existing image segmentation frameworks in terms of segmentation accuracy for hot spring fluids segmentation in dual-spectrum UAV images which achieved Pixel Accuracy (PA) of 96.1%, Precision of 93.2%, Recall of 85.9%, Intersection over Union (IoU) of 78.3%, and F1-score (F1) of 89.4%, respectively. And overcomes segmentation uncertainties to a great extent, while maintaining competitive computational efficiency. The ablation studies have confirmed the effectiveness of each main innovation in IFAGNet for improving the accuracy of hot spring fluid segmentation. 
Therefore, the proposed DHFS dataset and IFAGNet lay the foundation for segmentation of ","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 661-691"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143035286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Plug-and-play DISep: Separating dense instances for scene-to-pixel weakly-supervised change detection in high-resolution remote sensing images
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2025.01.007
Zhenghui Zhao, Chen Wu, Lixiang Ru, Di Wang, Hongruixuan Chen, Cuiqun Chen
Change Detection (CD) focuses on identifying specific pixel-level landscape changes in multi-temporal remote sensing images. Obtaining pixel-level annotations for CD is generally both time-consuming and labor-intensive. Given this annotation challenge, there has been growing interest in Weakly-Supervised Change Detection (WSCD), which aims to detect pixel-level changes using only scene-level (i.e., image-level) change labels and thereby offers a more cost-effective approach. Despite considerable efforts to precisely locate changed regions, existing WSCD methods often encounter the problem of “instance lumping” under scene-level supervision, particularly in scenarios with a dense distribution of changed instances (i.e., changed objects). In these scenarios, unchanged pixels between changed instances are also mistakenly identified as changed, causing multiple distinct changes to be merged into one. In practical applications, this issue prevents accurate quantification of the number of changes. To address it, we propose a Dense Instance Separation (DISep) method as a plug-and-play solution that refines pixel features from a unified instance perspective under scene-level supervision. Specifically, DISep comprises a three-step iterative training process: (1) Instance Localization: we locate candidate instance regions for changed pixels using high-pass class activation maps. (2) Instance Retrieval: we identify and group these changed pixels into different instance IDs through connectivity searching, and then, based on the assigned instance IDs, extract the corresponding pixel-level features on a per-instance basis. (3) Instance Separation: we introduce a separation loss to enforce intra-instance pixel consistency in the embedding space, thereby ensuring separable instance feature representations. The proposed DISep adds only minimal training cost and no inference cost, and it can be seamlessly integrated to enhance existing WSCD methods. We achieve state-of-the-art performance by enhancing three Transformer-based and four ConvNet-based methods on the LEVIR-CD, WHU-CD, DSIFN-CD, SYSU-CD, and CDD datasets. Additionally, DISep can be used to improve fully-supervised change detection methods. Code is available at https://github.com/zhenghuizhao/Plug-and-Play-DISep-for-Change-Detection.
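The three-step process described above can be pictured with a small sketch: threshold the class activation map, label connected regions to obtain instance IDs, and penalize the spread of each instance's pixel embeddings around its own mean. The threshold, shapes, and loss weighting below are assumptions for illustration, not the paper's exact settings.

```python
# Sketch of instance retrieval by connectivity search plus an intra-instance consistency term.
import numpy as np
from scipy import ndimage

def instance_ids_from_cam(cam, thresh=0.6):
    """Steps 1-2: threshold the class activation map and label connected regions."""
    labeled, _ = ndimage.label(cam > thresh)
    return labeled                                      # 0 = background, 1..K = instance IDs

def separation_loss(embeddings, instance_map):
    """Step 3: mean squared distance of each pixel embedding to its instance mean."""
    loss, count = 0.0, 0
    for inst_id in np.unique(instance_map):
        if inst_id == 0:
            continue
        pix = embeddings[:, instance_map == inst_id]    # (C, N_pixels_in_instance)
        loss += ((pix - pix.mean(axis=1, keepdims=True)) ** 2).mean()
        count += 1
    return loss / max(count, 1)

cam = np.random.rand(64, 64)            # placeholder class activation map
emb = np.random.rand(32, 64, 64)        # placeholder pixel embeddings
ids = instance_ids_from_cam(cam)
print(int(ids.max()), "instances; loss =", separation_loss(emb, ids))
```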
{"title":"Plug-and-play DISep: Separating dense instances for scene-to-pixel weakly-supervised change detection in high-resolution remote sensing images","authors":"Zhenghui Zhao , Chen Wu , Lixiang Ru , Di Wang , Hongruixuan Chen , Cuiqun Chen","doi":"10.1016/j.isprsjprs.2025.01.007","DOIUrl":"10.1016/j.isprsjprs.2025.01.007","url":null,"abstract":"<div><div>Change Detection (CD) focuses on identifying specific pixel-level landscape changes in multi-temporal remote sensing images. The process of obtaining pixel-level annotations for CD is generally both time-consuming and labor-intensive. Faced with this annotation challenge, there has been a growing interest in research on Weakly-Supervised Change Detection (WSCD). WSCD aims to detect pixel-level changes using only scene-level (i.e., image-level) change labels, thereby offering a more cost-effective approach. Despite considerable efforts to precisely locate changed regions, existing WSCD methods often encounter the problem of “instance lumping” under scene-level supervision, particularly in scenarios with a dense distribution of changed instances (i.e., changed objects). In these scenarios, unchanged pixels between changed instances are also mistakenly identified as changed, causing multiple changes to be mistakenly viewed as one. In practical applications, this issue prevents the accurate quantification of the number of changes. To address this issue, we propose a Dense Instance Separation (DISep) method as a plug-and-play solution, refining pixel features from a unified instance perspective under scene-level supervision. Specifically, our DISep comprises a three-step iterative training process: (1) Instance Localization: We locate instance candidate regions for changed pixels using high-pass class activation maps. (2) Instance Retrieval: We identify and group these changed pixels into different instance IDs through connectivity searching. Then, based on the assigned instance IDs, we extract corresponding pixel-level features on a per-instance basis. (3) Instance Separation: We introduce a separation loss to enforce intra-instance pixel consistency in the embedding space, thereby ensuring separable instance feature representations. The proposed DISep adds only minimal training cost and no inference cost. It can be seamlessly integrated to enhance existing WSCD methods. We achieve state-of-the-art performance by enhancing three Transformer-based and four ConvNet-based methods on the LEVIR-CD, WHU-CD, DSIFN-CD, SYSU-CD, and CDD datasets. Additionally, our DISep can be used to improve fully-supervised change detection methods. Code is available at <span><span>https://github.com/zhenghuizhao/Plug-and-Play-DISep-for-Change-Detection</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 770-782"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143072523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}