Scattering mechanism-guided zero-shot PolSAR target recognition
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.022
Feng Li, Xiaojing Yang, Liang Zhang, Yanhua Wang, Yuqi Han, Xin Zhang, Yang Li
To address the difficulty of obtaining polarimetric synthetic aperture radar (PolSAR) data for certain categories of targets, we present a zero-shot target recognition method for PolSAR images. Built on a generative model, the method leverages the unique characteristics of PolSAR imagery and incorporates two key modules: a scattering characteristics-guided semantic embedding generation module (SE) and a polarization characteristics-guided distributional correction module (DC). The former ensures the stability of synthetic features for unseen classes by controlling scattering characteristics, while the latter enhances the quality of synthetic features by exploiting polarimetric features, thereby improving zero-shot recognition accuracy. The proposed method is evaluated on the GOTCHA dataset to assess its performance in recognizing unseen classes. The experimental results demonstrate that it achieves SOTA performance in zero-shot PolSAR target recognition, improving the recognition accuracy of unseen categories by nearly 20%. Our code is available at https://github.com/chuyihuan/Zero-shot-PolSAR-target-recognition.
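The abstract outlines a generative zero-shot pattern: synthesize features for unseen classes from class-level semantics, then recognize them with a classifier trained on those synthetic features. The sketch below illustrates only that generic pattern; the module names, dimensions, and the plain reconstruction objective are assumptions for illustration, not the authors' SE and DC design.

```python
# Minimal sketch of conditional feature generation for zero-shot recognition.
# All dimensions and the MSE objective are illustrative assumptions.
import torch
import torch.nn as nn

FEAT_DIM, SEM_DIM, NOISE_DIM = 256, 64, 32

class ConditionalGenerator(nn.Module):
    """Maps (class semantic embedding, noise) -> synthetic target feature."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(SEM_DIM + NOISE_DIM, 512), nn.ReLU(),
            nn.Linear(512, FEAT_DIM),
        )

    def forward(self, sem, z):
        return self.net(torch.cat([sem, z], dim=1))

gen = ConditionalGenerator()
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)

# Placeholder "seen-class" data: real features and their class semantics.
real_feat = torch.randn(128, FEAT_DIM)
seen_sem = torch.randn(128, SEM_DIM)

for _ in range(100):  # toy training loop; a real system would use a GAN/VAE loss
    z = torch.randn(128, NOISE_DIM)
    loss = nn.functional.mse_loss(gen(seen_sem, z), real_feat)
    opt.zero_grad(); loss.backward(); opt.step()

# Synthesize features for an unseen class from its semantic embedding alone;
# any off-the-shelf classifier can then be trained on these synthetic features.
unseen_sem = torch.randn(1, SEM_DIM).repeat(64, 1)
synthetic_unseen = gen(unseen_sem, torch.randn(64, NOISE_DIM))
```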
{"title":"Scattering mechanism-guided zero-shot PolSAR target recognition","authors":"Feng Li , Xiaojing Yang , Liang Zhang , Yanhua Wang , Yuqi Han , Xin Zhang , Yang Li","doi":"10.1016/j.isprsjprs.2024.12.022","DOIUrl":"10.1016/j.isprsjprs.2024.12.022","url":null,"abstract":"<div><div>In response to the challenges posed by the difficulty in obtaining polarimetric synthetic aperture radar (PolSAR) data for certain specific categories of targets, we present a zero-shot target recognition method for PolSAR images. Based on a generative model, the method leverages the unique characteristics of polarimetric SAR images and incorporates two key modules: the scattering characteristics-guided semantic embedding generation module (SE) and the polarization characteristics-guided distributional correction module (DC). The former ensures the stability of synthetic features for unseen classes by controlling scattering characteristics. At the same time, the latter enhances the quality of synthetic features by utilizing polarimetric features, thereby improving the accuracy of zero-shot recognition. The proposed method is evaluated on the GOTCHA dataset to assess its performance in recognizing unseen classes. The experiment results demonstrate that the proposed method achieves SOTA performance in zero-shot PolSAR target recognition (<em>e.g.,</em> improving the recognition accuracy of unseen categories by nearly 20%). Our codes are available at <span><span>https://github.com/chuyihuan/Zero-shot-PolSAR-target-recognition</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 428-439"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142925267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Application of SAR-Optical fusion to extract shoreline position from Cloud-Contaminated satellite images
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2025.01.013
Yongjing Mao, Kristen D. Splinter
Shorelines derived from optical satellite images are increasingly being used for regional- to global-scale analysis of sandy coastline dynamics. The optical satellite record, however, is contaminated by cloud cover, which can substantially reduce the temporal resolution of images available for shoreline analysis. Meanwhile, with the development of deep learning methods, optical images are increasingly fused with Synthetic Aperture Radar (SAR) images, which are unaffected by clouds, to reconstruct cloud-contaminated pixels. Such SAR-Optical fusion methods have proven successful for a range of land surface applications, but the unique characteristics of coastal areas leave their applicability in these dynamic zones unknown.
Herein we apply a deep internal learning (DIL) method to reconstruct cloud-contaminated optical images and explore its applicability to retrieve shorelines obscured by clouds. Our approach uses a mixed sequence of SAR and Gaussian noise images as the prior and the cloudy Modified Normalized Difference Water Index (MNDWI) as the target. The DIL encodes the target with the priors and synthesizes plausible pixels under cloud cover. A unique aspect of our workflow is the inclusion of Gaussian noise in the prior sequence for MNDWI images when no SAR image collected within a 1-day temporal lag is available. A novel loss function for the DIL model is also introduced to optimize image reconstruction near the shoreline. These developments contribute significantly to model accuracy.
The DIL method is tested at four sites with varying tide, wave, and shoreline dynamics. Shorelines derived from the reconstructed and true MNDWI images are compared to quantify the internal accuracy of shoreline reconstruction. For microtidal environments with a mean spring tidal range of less than 2 m, the mean absolute error (MAE) of shoreline reconstruction is less than 7.5 m and the coefficient of determination (R²) exceeds 0.78, regardless of shoreline and wave dynamics. The method is less skilful in macro- and mesotidal environments owing to the larger water level difference between the paired optical and SAR images, resulting in an MAE of 12.59 m and an R² of 0.43. The proposed SAR-Optical fusion method demonstrates substantially better accuracy in retrieving cloud-obscured shoreline positions than interpolation methods relying solely on optical images. Results from our work highlight the great potential of SAR-Optical fusion to derive shorelines even under the cloudiest conditions, thus increasing the temporal resolution of shoreline datasets.
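The reconstruction target above is the MNDWI, a standard water index computed from the green and shortwave-infrared (SWIR) bands as (Green − SWIR) / (Green + SWIR); its cloud-masked pixels are what the DIL model must fill in. A minimal sketch follows, in which the band arrays and cloud mask are placeholders rather than the paper's data pipeline.

```python
# Minimal MNDWI computation with cloud masking; inputs are illustrative.
import numpy as np

def mndwi(green, swir, cloud_mask=None):
    """Return MNDWI in [-1, 1]; cloud-contaminated pixels are set to NaN."""
    green = green.astype(np.float64)
    swir = swir.astype(np.float64)
    denom = green + swir
    safe = np.where(denom == 0, 1.0, denom)            # avoid division by zero
    index = np.where(denom == 0, 0.0, (green - swir) / safe)
    if cloud_mask is not None:
        index = np.where(cloud_mask, np.nan, index)    # the NaN pixels are what DIL reconstructs
    return index

# Example: water pixels (high green, low SWIR) give MNDWI close to +1.
green = np.array([[0.30, 0.05]])
swir = np.array([[0.02, 0.25]])
print(mndwi(green, swir))  # approx [[0.875, -0.667]]
```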
{"title":"Application of SAR-Optical fusion to extract shoreline position from Cloud-Contaminated satellite images","authors":"Yongjing Mao, Kristen D. Splinter","doi":"10.1016/j.isprsjprs.2025.01.013","DOIUrl":"10.1016/j.isprsjprs.2025.01.013","url":null,"abstract":"<div><div>Shorelines derived from optical satellite images are increasingly being used for regional to global scale analysis of sandy coastline dynamics. The optical satellite record, however, is contaminated by cloud cover, which can substantially reduce the temporal resolution of available images for shoreline analysis. Meanwhile, with the development of deep learning methods, optical images are increasingly fused with Synthetic Aperture Radar (SAR) images that are unaffected by clouds to reconstruct the cloud-contaminated pixels. Such SAR-Optical fusion methods have been shown successful for different land surface applications, but the unique characteristics of coastal areas make the applicability of this method unknown in these dynamic zones.</div><div>Herein we apply a deep internal learning (DIL) method to reconstruct cloud-contaminated optical images and explore its applicability to retrieve shorelines obscured by clouds. Our approach uses a mixed sequence of SAR and Gaussian noise images as the prior and the cloudy Modified Normalized Difference Water Index (MNDWI) as the target. The DIL encodes the target with priors and synthesizes plausible pixels under cloud cover. A unique aspect of our workflow is the inclusion of Gaussian noise in the prior sequence for MNDWI images when SAR images collected within a 1-day temporal lag are not available. A novel loss function of DIL model is also introduced to optimize the image reconstruction near the shoreline. These new developments have significant contribution to the model accuracy.</div><div>The DIL method is tested at four different sites with varying tide, wave, and shoreline dynamics. Shorelines derived from the reconstructed and true MNDWI images are compared to quantify the internal accuracy of shoreline reconstruction. For microtidal environments with mean springs tidal range less than 2 m, the mean absolute error (MAE) of shoreline reconstruction is less than 7.5 m with the coefficient of determination (<span><math><mrow><msup><mrow><mi>R</mi></mrow><mn>2</mn></msup></mrow></math></span>) more than 0.78 regardless of shoreline and wave dynamics. The method is less skilful in macro- and mesotidal environments due to the larger water level difference in the paired optical and SAR images, resulting in the MAE of 12.59 m and <span><math><mrow><msup><mrow><mi>R</mi></mrow><mn>2</mn></msup></mrow></math></span> of 0.43. The proposed SAR-Optical fusion method demonstrates substantially better accuracy in retrieving cloud-obscured shoreline positions compared to interpolation methods relying solely on optical images. 
Results from our work highlight the great potential of SAR-Optical fusion to derive shorelines even under the cloudiest conditions, thus increasing the temporal resolution of shoreline datasets.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 563-579"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143035308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Refined change detection in heterogeneous low-resolution remote sensing images for disaster emergency response
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.010
Di Wang, Guorui Ma, Haiming Zhang, Xiao Wang, Yongxian Zhang
Heterogeneous Remote Sensing Image Change Detection (HRSICD) is a significant challenge in remote sensing image processing, with substantial application value in rapid natural disaster response. However, significant differences in imaging modalities often result in poor comparability between features, which limits recognition accuracy. To address this issue, we propose a novel HRSICD method based on image structure relationships and semantic information. First, we employ a Multi-scale Pyramid Convolution Encoder to efficiently extract multi-scale and detailed features. Next, the Cross-domain Feature Alignment Module aligns the structural relationships and semantic features of the heterogeneous images, enhancing the comparability between heterogeneous image features. Finally, the Multi-level Decoder fuses the structural and semantic features, achieving refined identification of changed areas. We validated the proposed method on five publicly available HRSICD datasets. Additionally, zero-shot generalization experiments and real-world applications were conducted to assess its generalization capability. Our method achieved favorable results in all experiments, demonstrating its effectiveness. The code of the proposed method will be made available at https://github.com/Lucky-DW/HRSICD.
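Once the heterogeneous features are aligned and therefore comparable, a change map can in principle be read off from a per-pixel distance between the bi-temporal feature maps. The sketch below illustrates that generic idea only, using cosine distance and a fixed threshold; it is not the paper's Multi-level Decoder, and the feature shapes and threshold are assumptions.

```python
# Generic change map from aligned bi-temporal features via cosine distance.
import numpy as np

def change_map(feat_t1, feat_t2, thresh=0.5):
    """feat_t1, feat_t2: (C, H, W) aligned feature maps; returns a binary (H, W) map."""
    num = np.sum(feat_t1 * feat_t2, axis=0)
    den = np.linalg.norm(feat_t1, axis=0) * np.linalg.norm(feat_t2, axis=0) + 1e-8
    cos_dist = 1.0 - num / den              # 0 = identical direction, 2 = opposite
    return (cos_dist > thresh).astype(np.uint8)

f1 = np.random.rand(16, 64, 64)
f2 = np.random.rand(16, 64, 64)
print(change_map(f1, f2).sum(), "pixels flagged as changed")
```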
{"title":"Refined change detection in heterogeneous low-resolution remote sensing images for disaster emergency response","authors":"Di Wang , Guorui Ma , Haiming Zhang , Xiao Wang , Yongxian Zhang","doi":"10.1016/j.isprsjprs.2024.12.010","DOIUrl":"10.1016/j.isprsjprs.2024.12.010","url":null,"abstract":"<div><div>Heterogeneous Remote Sensing Images Change Detection (HRSICD) is a significant challenge in remote sensing image processing, with substantial application value in rapid natural disaster response. However, significant differences in imaging modalities often result in poor comparability of their features, affecting the recognition accuracy. To address the issue, we propose a novel HRSICD method based on image structure relationships and semantic information. First, we employ a Multi-scale Pyramid Convolution Encoder to efficiently extract the multi-scale and detailed features. Next, the Cross-domain Feature Alignment Module aligns the structural relationships and semantic features of the heterogeneous images, enhancing the comparability between heterogeneous image features. Finally, the Multi-level Decoder fuses the structural and semantic features, achieving refined identification of change areas. We validated the advancement of proposed method on five publicly available HRSICD datasets. Additionally, zero-shot generalization experiments and real-world applications were conducted to assess its generalization capability. Our method achieved favorable results in all experiments, demonstrating its effectiveness. The code of the proposed method will be made available at <span><span>https://github.com/Lucky-DW/HRSICD</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 139-155"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142823148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PylonModeler: A hybrid-driven 3D reconstruction method for power transmission pylons from LiDAR point clouds
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.003
Shaolong Wu, Chi Chen, Bisheng Yang, Zhengfei Yan, Zhiye Wang, Shangzhe Sun, Qin Zou, Jing Fu
As the power grid is an indispensable foundation of modern society, creating a digital twin of the grid is of great importance. Pylons are key components of transmission corridors, and their precise 3D reconstruction is essential for the safe operation of power grids. However, 3D pylon reconstruction from LiDAR point clouds presents numerous challenges owing to data quality and the diversity and complexity of pylon structures. To address these challenges, we introduce PylonModeler: a hybrid-driven method for 3D pylon reconstruction from airborne LiDAR point clouds that enables accurate, robust, and efficient real-time reconstruction. Different strategies are employed to reconstruct and assemble the various structures independently. We propose Pylon Former, a lightweight transformer network, for real-time pylon recognition and decomposition, and subsequently apply a data-driven approach for pylon body reconstruction. Considering structural characteristics, fitting and clustering algorithms are used to reconstruct both external and internal structures. The pylon head is reconstructed using a hybrid approach: a pre-built pylon head parameter model library defines different pylons by a series of parameters, and the coherent point drift (CPD) algorithm is adopted to establish the topological relationships between pylon head structures and to set initial model parameters, which are then refined through optimization for accurate head reconstruction. Finally, the pylon body and head models are combined to complete the reconstruction. We collected an airborne LiDAR dataset containing 3398 pylons across eight types, covering transmission lines at various voltage levels, such as 110 kV, 220 kV, and 500 kV. PylonModeler is validated on this dataset. The average reconstruction time per pylon is 1.10 s, with an average reconstruction accuracy of 0.216 m. In addition, we evaluate PylonModeler on public airborne LiDAR data from Luxembourg. Compared with previous state-of-the-art methods, reconstruction accuracy improves by approximately 26.28%. With this performance, PylonModeler is tens of times faster than current model-driven methods, enabling real-time pylon reconstruction.
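The head reconstruction step aligns a parametric head model from the library to the observed points, for which the paper uses coherent point drift (CPD). As a simpler stand-in that conveys the same model-to-points fitting idea, the sketch below rigidly aligns a template to observed points with the Kabsch algorithm, assuming known correspondences; the point sets are placeholders.

```python
# Rigid template-to-points alignment via the Kabsch algorithm (stand-in for CPD).
import numpy as np

def kabsch_align(template, observed):
    """Return rotation R and translation t so that R @ p + t maps template points onto observed."""
    mu_t, mu_o = template.mean(axis=0), observed.mean(axis=0)
    H = (template - mu_t).T @ (observed - mu_o)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))           # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_o - R @ mu_t
    return R, t

template = np.random.rand(100, 3)                    # placeholder library head model points
true_R = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])                # 90-degree rotation about z
observed = template @ true_R.T + np.array([5.0, 2.0, 0.5])
R, t = kabsch_align(template, observed)
print(np.allclose(R, true_R, atol=1e-6), np.allclose(t, [5.0, 2.0, 0.5], atol=1e-6))
```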
{"title":"PylonModeler: A hybrid-driven 3D reconstruction method for power transmission pylons from LiDAR point clouds","authors":"Shaolong Wu , Chi Chen , Bisheng Yang , Zhengfei Yan , Zhiye Wang , Shangzhe Sun , Qin Zou , Jing Fu","doi":"10.1016/j.isprsjprs.2024.12.003","DOIUrl":"10.1016/j.isprsjprs.2024.12.003","url":null,"abstract":"<div><div>As the power grid is an indispensable foundation of modern society, creating a digital twin of the grid is of great importance. Pylons serve as components in the transmission corridor, and their precise 3D reconstruction is essential for the safe operation of power grids. However, 3D pylon reconstruction from LiDAR point clouds presents numerous challenges due to data quality and the diversity and complexity of pylon structures. To address these challenges, we introduce PylonModeler: a hybrid-driven method for 3D pylon reconstruction using airborne LiDAR point clouds, thereby enabling accurate, robust, and efficient real-time pylon reconstruction. Different strategies are employed to achieve independent reconstructions and assemblies for various structures. We propose Pylon Former, a lightweight transformer network for real-time pylon recognition and decomposition. Subsequently, we apply a data-driven approach for the pylon body reconstruction. Considering structural characteristics, fitting and clustering algorithms are used to reconstruct both external and internal structures. The pylon head is reconstructed using a hybrid approach. A pre-built pylon head parameter model library defines different pylons by a series of parameters. The coherent point drift (CPD) algorithm is adopted to establish the topological relationships between pylon head structures and set initial model parameters, which are refined through optimization for accurate pylon head reconstruction. Finally, the pylon body and head models are combined to complete the reconstruction. We collected an airborne LiDAR dataset, which includes a total of 3398 pylon data across eight types. The dataset consists of transmission lines of various voltage levels, such as 110 kV, 220 kV, and 500 kV. PylonModeler is validated on this dataset. The average reconstruction time of a pylon is 1.10 s, with an average reconstruction accuracy of 0.216 m. In addition, we evaluate the performance of PylonModeler on public airborne LiDAR data from Luxembourg. Compared to previous state-of-the-art methods, reconstruction accuracy improved by approximately 26.28 %. With superior performance, PylonModeler is tens of times faster than the current model-driven methods, enabling real-time pylon reconstruction.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 100-124"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142823151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unwrapping error and fading signal correction on multi-looked InSAR data
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.006
Zhangfeng Ma, Nanxin Wang, Yingbao Yang, Yosuke Aoki, Shengji Wei
Multi-looking, aimed at reducing data size and improving the signal-to-noise ratio, is indispensable for large-scale InSAR data processing. However, the resulting “Fading Signal” caused by multi-looking breaks the phase consistency among triplet interferograms and introduces bias into the estimated displacements. This inconsistency challenges the assumption that only unwrapping errors contribute to triplet phase closure. Therefore, untangling phase unwrapping errors and fading signals in the triplet phase closure is critical to achieving more precise InSAR measurements. To address this challenge, we propose a new method that mitigates both phase unwrapping errors and fading signals. The method consists of two key steps. The first is triplet phase closure-based stacking, which allows the fading signal in each interferogram to be estimated directly. The second is Basis Pursuit Denoising-based unwrapping error correction, which recasts unwrapping error correction as sparse signal recovery. Through these two procedures, the new method can be seamlessly integrated into the traditional InSAR workflow. Additionally, the estimated fading signal can be used directly to derive soil moisture as a by-product of our method. Experimental results for the San Francisco Bay area demonstrate that the new method reduces velocity estimation errors by approximately 9%–19%, effectively addressing phase unwrapping errors and fading signals. It outperforms both the ILP and Lasso methods, which account only for unwrapping errors in the triplet closure. Additionally, the derived by-product, soil moisture, shows strong consistency with most external soil moisture products.
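The triplet phase closure that the method exploits is simply the wrapped sum of the three interferometric phases, φ12 + φ23 − φ13; in the absence of unwrapping errors and fading signals it is zero, so a non-zero closure carries exactly the signals being estimated. A minimal numerical sketch follows, in which the arrays and the injected fading term are placeholders.

```python
# Triplet phase closure of wrapped, multi-looked interferometric phases.
import numpy as np

def wrap(phase):
    """Wrap phase to (-pi, pi]."""
    return np.angle(np.exp(1j * phase))

def triplet_closure(phi_12, phi_23, phi_13):
    """Closure phase of an interferogram triplet; zero for a consistent triplet."""
    return wrap(phi_12 + phi_23 - phi_13)

rng = np.random.default_rng(0)
phi_12 = rng.uniform(-np.pi, np.pi, (256, 256))
phi_23 = rng.uniform(-np.pi, np.pi, (256, 256))
fading = 0.3 * rng.standard_normal((256, 256))        # toy bias introduced by multi-looking
phi_13 = wrap(phi_12 + phi_23) - fading                # consistent triplet except for the fading term
closure = triplet_closure(phi_12, phi_23, phi_13)
print(float(np.abs(closure).mean()))                   # ~0.24 rad: the fading term, not zero
```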
{"title":"Unwrapping error and fading signal correction on multi-looked InSAR data","authors":"Zhangfeng Ma , Nanxin Wang , Yingbao Yang , Yosuke Aoki , Shengji Wei","doi":"10.1016/j.isprsjprs.2024.12.006","DOIUrl":"10.1016/j.isprsjprs.2024.12.006","url":null,"abstract":"<div><div>Multi-looking, aimed at reducing data size and improving the signal-to-noise ratio, is indispensable for large-scale InSAR data processing. However, the resulting “Fading Signal” caused by multi-looking breaks the phase consistency among triplet interferograms and introduces bias into the estimated displacements. This inconsistency challenges the assumption that only unwrapping errors are involved in triplet phase closure. Therefore, untangling phase unwrapping errors and fading signals from triplet phase closure is critical to achieving more precise InSAR measurements. To address this challenge, we propose a new method that mitigates phase unwrapping errors and fading signals. This new method consists of two key steps. The first step is triplet phase closure-based stacking, which allows for the direct estimation of fading signals in each interferogram. The second step is Basis Pursuit Denoising-based unwrapping error correction, which transforms unwrapping error correction into sparse signal recovery. Through these two procedures, the new method can be seamlessly integrated into the traditional InSAR workflow. Additionally, the estimated fading signal can be directly used to derive soil moisture as a by-product of our method. Experimental results on the San Francisco Bay area demonstrate that the new method reduces velocity estimation errors by approximately 9 %–19 %, effectively addressing phase unwrapping errors and fading signals. This performance outperforms both ILP and Lasso methods, which only account for unwrapping errors in the triplet closure. Additionally, the derived by-product, soil moisture, shows strong consistency with most external soil moisture products.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 51-63"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142823154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accurate spaceborne waveform simulation in heterogeneous forests using small-footprint airborne LiDAR point clouds
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.11.020
Yi Li, Guangjian Yan, Weihua Li, Donghui Xie, Hailan Jiang, Linyuan Li, Jianbo Qi, Ronghai Hu, Xihan Mu, Xiao Chen, Shanshan Wei, Hao Tang
Spaceborne light detection and ranging (LiDAR) waveform sensors require accurate signal simulations to facilitate prelaunch calibration, postlaunch validation, and the development of land surface data products. However, accurately simulating spaceborne LiDAR waveforms over heterogeneous forests remains challenging because data-driven methods do not account for complicated pulse transport within heterogeneous canopies, whereas analytical radiative transfer models overly rely on assumptions about canopy structure and distribution. Thus, a comprehensive simulation method is needed to account for both the complexity of pulse transport within canopies and the structural heterogeneity of forests. In this study, we propose a framework for spaceborne LiDAR waveform simulation by integrating a new radiative transfer model – the canopy voxel radiative transfer (CVRT) model – with reconstructed three-dimensional (3D) voxel forest scenes from small-footprint airborne LiDAR (ALS) point clouds. The CVRT model describes the radiative transfer process within canopy voxels and uses fractional crown cover to account for within-voxel heterogeneity, minimizing the need for assumptions about canopy shape and distribution and significantly reducing the number of input parameters. All the parameters for scene construction and model inputs can be obtained from the ALS point clouds. The performance of the proposed framework was assessed by comparing the results to the simulated LiDAR waveforms from DART, Global Ecosystem Dynamics Investigation (GEDI) data over heterogeneous forest stands, and Land, Vegetation, and Ice Sensor (LVIS) data from the National Ecological Observatory Network (NEON) site. The results suggest that compared with existing models, the new framework with the CVRT model achieved improved agreement with both simulated and measured data, with an average R² improvement of approximately 2% to 5% and an average RMSE reduction of approximately 0.5% to 3%. The proposed framework was also highly adaptive and robust to variations in model configurations, input data quality, and environmental attributes. In summary, this work extends current research on accurate and robust large-footprint LiDAR waveform simulations over heterogeneous forest canopies and could help refine product development for emerging spaceborne LiDAR missions.
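At its simplest, a large-footprint return waveform can be viewed as the footprint's vertical energy-return profile convolved with the transmitted pulse; the CVRT model refines how that profile is produced for heterogeneous canopies. The sketch below shows only the generic convolution view, with an assumed toy profile, pulse width, and bin size.

```python
# Toy large-footprint waveform: vertical return profile convolved with a Gaussian pulse.
import numpy as np

dz = 0.15                                  # vertical bin size (m)
z = np.arange(0.0, 40.0, dz)               # height above ground (m)

# Toy vertical return profile: canopy layer around 22 m plus a strong ground peak at 0 m.
profile = np.exp(-0.5 * ((z - 22.0) / 3.0) ** 2)
profile[0] += 2.0

# Gaussian transmit pulse (~1 m FWHM) expressed on the same vertical grid, area-normalized.
sigma = 1.0 / 2.355
t = np.arange(-3.0, 3.0, dz)
pulse = np.exp(-0.5 * (t / sigma) ** 2)
pulse /= pulse.sum()

waveform = np.convolve(profile, pulse, mode="same")    # simulated received waveform
canopy_only = waveform.copy()
canopy_only[z < 5.0] = 0.0                              # suppress the ground return
print(f"canopy peak at ~{z[np.argmax(canopy_only)]:.1f} m above ground")  # close to 22 m
```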
{"title":"Accurate spaceborne waveform simulation in heterogeneous forests using small-footprint airborne LiDAR point clouds","authors":"Yi Li , Guangjian Yan , Weihua Li , Donghui Xie , Hailan Jiang , Linyuan Li , Jianbo Qi , Ronghai Hu , Xihan Mu , Xiao Chen , Shanshan Wei , Hao Tang","doi":"10.1016/j.isprsjprs.2024.11.020","DOIUrl":"10.1016/j.isprsjprs.2024.11.020","url":null,"abstract":"<div><div>Spaceborne light detection and ranging (LiDAR) waveform sensors require accurate signal simulations to facilitate prelaunch calibration, postlaunch validation, and the development of land surface data products. However, accurately simulating spaceborne LiDAR waveforms over heterogeneous forests remains challenging because data-driven methods do not account for complicated pulse transport within heterogeneous canopies, whereas analytical radiative transfer models overly rely on assumptions about canopy structure and distribution. Thus, a comprehensive simulation method is needed to account for both the complexity of pulse transport within canopies and the structural heterogeneity of forests. In this study, we propose a framework for spaceborne LiDAR waveform simulation by integrating a new radiative transfer model – the canopy voxel radiative transfer (CVRT) model – with reconstructed three-dimensional (3D) voxel forest scenes from small-footprint airborne LiDAR (ALS) point clouds. The CVRT model describes the radiative transfer process within canopy voxels and uses fractional crown cover to account for within-voxel heterogeneity, minimizing the need for assumptions about canopy shape and distribution and significantly reducing the number of input parameters. All the parameters for scene construction and model inputs can be obtained from the ALS point clouds. The performance of the proposed framework was assessed by comparing the results to the simulated LiDAR waveforms from DART, Global Ecosystem Dynamics Investigation (GEDI) data over heterogeneous forest stands, and Land, Vegetation, and Ice Sensor (LVIS) data from the National Ecological Observatory Network (NEON) site. The results suggest that compared with existing models, the new framework with the CVRT model achieved improved agreement with both simulated and measured data, with an average R<sup>2</sup> improvement of approximately 2% to 5% and an average RMSE reduction of approximately 0.5% to 3%. The proposed framework was also highly adaptive and robust to variations in model configurations, input data quality, and environmental attributes. In summary, this work extends current research on accurate and robust large-footprint LiDAR waveform simulations over heterogeneous forest canopies and could help refine product development for emerging spaceborne LiDAR missions.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 246-263"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142874574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Underwater image captioning: Challenges, models, and datasets
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.002
Huanyu Li, Hao Wang, Ying Zhang, Li Li, Peng Ren
We delve into the nascent field of underwater image captioning from three perspectives: challenges, models, and datasets. One challenge arises from the disparities between natural images and underwater images, which hinder the use of the former to train models for the latter. Another challenge lies in the limited feature extraction capabilities of current image captioning models, which impede the generation of accurate underwater image captions. The final, and no less significant, challenge is the insufficiency of data available for underwater image captioning, which not only complicates model training but also makes it difficult to evaluate performance effectively. To address these challenges, we make three novel contributions. First, we employ a physics-based degradation technique to transform natural images into degraded images that closely resemble realistic underwater images, and based on these degraded images we develop a meta-learning strategy specifically tailored to underwater tasks. Second, we develop an underwater image captioning model based on scene-object feature fusion. It fuses underwater scene features extracted by ResNeXt and object features localized by YOLOv8, yielding comprehensive features for underwater image captioning. Last but not least, we construct an underwater image captioning dataset covering various underwater scenes, with each image annotated with five accurate captions for comprehensive training and validation. Experimental results on the new dataset validate the effectiveness of our models. The code and datasets are released at https://gitee.com/LHY-CODE/UICM-SOFF.
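Physics-based degradation of the kind mentioned above typically follows the common underwater image formation model I_c = J_c·t_c + B_c·(1 − t_c), with stronger attenuation in the red channel. The sketch below applies that generic model with illustrative coefficients; it is not the paper's exact degradation procedure, and the attenuation, backscatter, and depth values are assumptions.

```python
# Simple underwater-style degradation of a natural RGB image (illustrative parameters).
import numpy as np

def degrade_underwater(img, depth_m=6.0):
    """img: float RGB in [0, 1], shape (H, W, 3). Returns an underwater-looking copy."""
    beta = np.array([0.45, 0.12, 0.08])          # per-channel attenuation (R, G, B), 1/m
    backscatter = np.array([0.05, 0.35, 0.45])   # bluish-green veiling light
    t = np.exp(-beta * depth_m)                  # transmission per channel
    return np.clip(img * t + backscatter * (1.0 - t), 0.0, 1.0)

natural = np.random.rand(64, 64, 3)
underwater_like = degrade_underwater(natural)
print(underwater_like[..., 0].mean(), underwater_like[..., 2].mean())  # red suppressed, blue lifted
```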
{"title":"Underwater image captioning: Challenges, models, and datasets","authors":"Huanyu Li , Hao Wang , Ying Zhang , Li Li , Peng Ren","doi":"10.1016/j.isprsjprs.2024.12.002","DOIUrl":"10.1016/j.isprsjprs.2024.12.002","url":null,"abstract":"<div><div>We delve into the nascent field of underwater image captioning from three perspectives: challenges, models, and datasets. One challenge arises from the disparities between natural images and underwater images, which hinder the use of the former to train models for the latter. Another challenge exists in the limited feature extraction capabilities of current image captioning models, impeding the generation of accurate underwater image captions. The final challenge, albeit not the least significant, revolves around the insufficiency of data available for underwater image captioning. This insufficiency not only complicates the training of models but also poses challenges for evaluating their performance effectively. To address these challenges, we make three novel contributions. First, we employ a physics-based degradation technique to transform natural images into degraded images that closely resemble realistic underwater images. Based on the degraded images, we develop a meta-learning strategy specifically tailored for underwater tasks. Second, we develop an underwater image captioning model based on scene-object feature fusion. It fuses underwater scene features extracted by ResNeXt and object features localized by YOLOv8, yielding comprehensive features for underwater image captioning. Last but not least, we construct an underwater image captioning dataset covering various underwater scenes, with each underwater image annotated with five accurate captions for the purpose of comprehensive training and validation. Experimental results on the new dataset validate the effectiveness of our novel models. The code and datasets are released at <span><span>https://gitee.com/LHY-CODE/UICM-SOFF</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 440-453"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142925268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel airborne TomoSAR 3-D focusing method for accurate ice thickness and glacier volume estimation
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2025.01.011
Ke Wang, Yue Wu, Xiaolan Qiu, Jinbiao Zhu, Donghai Zheng, Songtao Shangguan, Jie Pan, Yuquan Liu, Liming Jiang, Xin Li
High-altitude mountain glaciers are highly responsive to environmental changes. However, their remote locations limit the applicability of traditional mapping methods, such as probing and Ground Penetrating Radar (GPR), in tracking changes in ice thickness and glacier volume. Over the past two decades, airborne Tomographic Synthetic Aperture Radar (TomoSAR) has shown promise for mapping the internal structures of mountain glaciers. Yet, its 3D mapping capabilities are limited by the radar signal’s relatively shallow penetration depth, with bedrock echoes rarely detected beyond 60 meters. Additionally, most TomoSAR studies ignored the air-ice refraction during the image-focusing step, reducing the 3D focusing accuracy for deeper subsurface targets. In this study, we developed a novel algorithm that integrates refraction path calculations into SAR image focusing. We also introduced a new method to construct the 3D TomoSAR cube by stacking InSAR phase coherence images, enabling the retrieval of deep bedrock signals even at low signal-to-noise ratios.
We tested our algorithms on 14 P-band SAR images acquired on April 8, 2023, over Bayi Glacier in the Qilian Mountains on the Qinghai-Tibet Plateau. For the first time, we successfully mapped the ice thickness across an entire mountain glacier using the airborne TomoSAR technique, detecting bedrock signals at depths of up to 120 m. Our ice thickness estimates showed strong agreement with in situ measurements from three GPR transects totaling 3.8 km in length, with root-mean-square errors (RMSE) ranging from 3.18 to 4.66 m. For comparison, we applied the state-of-the-art 3D focusing algorithm used in the AlpTomoSAR campaign for ice thickness estimation, which yielded RMSE values between 5.67 and 5.81 m; our proposed method thus reduces the RMSE by 18% to 44% relative to the AlpTomoSAR algorithm. Based on these measurements, we calculated a total ice volume of 0.121 km³, a decline of approximately 20.92% since the last reported volume in 2009, which was estimated from sparse GPR data. These results demonstrate that the proposed algorithm can effectively map ice thickness, providing a cost-efficient solution for large-scale glacier surveys in high-mountain regions.
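The refraction handling that distinguishes the focusing algorithm can be pictured with a simple two-layer geometry: the ray bends at the air-ice interface following Snell's law and slows down in ice, so the two-way delay to a subsurface target differs from the straight-ray, free-space value. The sketch below works through that geometry with an assumed ice refractive index of about 1.78 (from a relative permittivity of roughly 3.17) and illustrative heights; it is not the paper's focusing algorithm.

```python
# Two-layer (air + ice) propagation delay with refraction at the interface.
import numpy as np

C0 = 299_792_458.0          # speed of light in vacuum (m/s)
N_ICE = 1.78                # assumed refractive index of glacier ice at P-band

def two_way_delay(theta_inc_deg, h_air, d_ice):
    """Two-way travel time for an air path of height h_air and an ice depth d_ice (m)."""
    theta_i = np.radians(theta_inc_deg)
    theta_t = np.arcsin(np.sin(theta_i) / N_ICE)       # refraction angle inside the ice
    path_air = h_air / np.cos(theta_i)
    path_ice = d_ice / np.cos(theta_t)
    return 2.0 * (path_air / C0 + path_ice * N_ICE / C0)

# Ignoring refraction and the slower in-ice speed treats the 100 m of ice like air:
with_refraction = two_way_delay(35.0, h_air=3000.0, d_ice=100.0)
straight_ray = 2.0 * ((3000.0 + 100.0) / np.cos(np.radians(35.0))) / C0
print((with_refraction - straight_ray) * 1e9, "ns of delay difference")
```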
{"title":"A novel airborne TomoSAR 3-D focusing method for accurate ice thickness and glacier volume estimation","authors":"Ke Wang , Yue Wu , Xiaolan Qiu , Jinbiao Zhu , Donghai Zheng , Songtao Shangguan , Jie Pan , Yuquan Liu , Liming Jiang , Xin Li","doi":"10.1016/j.isprsjprs.2025.01.011","DOIUrl":"10.1016/j.isprsjprs.2025.01.011","url":null,"abstract":"<div><div>High-altitude mountain glaciers are highly responsive to environmental changes. However, their remote locations limit the applicability of traditional mapping methods, such as probing and Ground Penetrating Radar (GPR), in tracking changes in ice thickness and glacier volume. Over the past two decades, airborne Tomographic Synthetic Aperture Radar (TomoSAR) has shown promise for mapping the internal structures of mountain glaciers. Yet, its 3D mapping capabilities are limited by the radar signal’s relatively shallow penetration depth, with bedrock echoes rarely detected beyond 60 meters. Additionally, most TomoSAR studies ignored the air-ice refraction during the image-focusing step, reducing the 3D focusing accuracy for deeper subsurface targets. In this study, we developed a novel algorithm that integrates refraction path calculations into SAR image focusing. We also introduced a new method to construct the 3D TomoSAR cube by stacking InSAR phase coherence images, enabling the retrieval of deep bedrock signals even at low signal-to-noise ratios.</div><div>We tested our algorithms on 14 P-band SAR images acquired on April 8, 2023, over Bayi Glacier in the Qilian Mountains, located on the Qinghai-Tibet Plateau. For the first time, we successfully mapped the ice thickness across an entire mountain glacier using the airborne TomoSAR technique, detecting bedrock signals at depths reaching up to 120 m. Our ice thickness estimates showed strong agreement with in situ measurements from three GPR transects totaling 3.8 km in length, with root-mean-square errors (RMSE) ranging from 3.18 to 4.66 m. For comparison, we applied the state-of-the-art 3D focusing algorithm used in the AlpTomoSAR campaign for ice thickness estimation, which resulted in RMSE values between 5.67 and 5.81 m. Our proposed method reduced the RMSE by 18% to 44% relative to the AlpTomoSAR algorithm. Based on these measurements, we calculated a total ice volume of 0.121 km<span><math><msup><mrow></mrow><mrow><mn>3</mn></mrow></msup></math></span>, reflecting a decline of approximately 20.92% since the last reported volume in 2009, which was estimated from sparse GPR data. These results demonstrate that the proposed algorithm can effectively map ice thickness, providing a cost-efficient solution for large-scale glacier surveys in high-mountain regions.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 593-607"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142989646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An interactive fusion attention-guided network for ground surface hot spring fluids segmentation in dual-spectrum UAV images
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2025.01.022
Shi Yi, Mengting Chen, Xuesong Yuan, Si Guo, Jiashuai Wang
Investigating the distribution of ground surface hot spring fluids is crucial for the exploitation and utilization of geothermal resources. The detailed information provided by dual-spectrum images captured by unmanned aerial vehicles (UAVs) flown at low altitudes is beneficial for accurately segmenting ground surface hot spring fluids. However, existing image segmentation methods face significant challenges in hot spring fluid segmentation because fluid boundaries vary frequently and irregularly, while the presence of substances within the fluids leads to segmentation uncertainties. In addition, there is currently no benchmark dataset dedicated to ground surface hot spring fluid segmentation in dual-spectrum UAV images. To this end, in this study a benchmark dataset called the dual-spectrum hot spring fluid segmentation (DHFS) dataset was constructed for segmenting ground surface hot spring fluids in dual-spectrum UAV images. Additionally, a novel interactive fusion attention-guided RGB-Thermal (RGB-T) semantic segmentation network named IFAGNet is proposed for accurately segmenting ground surface hot spring fluids in dual-spectrum UAV images. The proposed IFAGNet consists of two sub-networks that leverage two feature fusion architectures, with a two-stage feature fusion module designed to achieve optimal intermediate feature fusion. Furthermore, IFAGNet utilizes an interactive fusion attention-guided architecture to guide the two sub-networks in further processing the extracted features through complementary information exchange, resulting in a significant boost in hot spring fluid segmentation accuracy. Additionally, two down-up full-scale feature pyramid network (FPN) decoders are developed, one for each sub-network, to fully utilize multi-stage fused features and improve the preservation of detailed information during hot spring fluid segmentation. Moreover, a hybrid consistency learning strategy is implemented to train IFAGNet, combining fully supervised learning with consistency learning between each sub-network and their fused results to further optimize the segmentation accuracy of hot spring fluid in RGB-T UAV images. The optimal IFAGNet model was tested on the proposed DHFS dataset, and the experimental results demonstrate that IFAGNet outperforms existing image segmentation frameworks for hot spring fluid segmentation in dual-spectrum UAV images, achieving a Pixel Accuracy (PA) of 96.1%, Precision of 93.2%, Recall of 85.9%, Intersection over Union (IoU) of 78.3%, and F1-score (F1) of 89.4%, while largely overcoming segmentation uncertainties and maintaining competitive computational efficiency. Ablation studies confirmed the effectiveness of each main innovation in IFAGNet for improving the accuracy of hot spring fluid segmentation. Therefore, the proposed DHFS dataset and IFAGNet lay the foundation for the segmentation of ground surface hot spring fluids in dual-spectrum UAV images.
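For reference, the pixel-level scores reported above (PA, Precision, Recall, IoU, F1) follow directly from the confusion matrix of a binary prediction against the ground-truth mask. The sketch below is generic evaluation code, not part of IFAGNet; the random masks are placeholders.

```python
# Pixel-level segmentation metrics from a binary prediction and ground-truth mask.
import numpy as np

def segmentation_metrics(pred, gt):
    """pred, gt: boolean arrays of the same shape (True = hot spring fluid)."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    return {
        "PA": (tp + tn) / (tp + tn + fp + fn),
        "Precision": precision,
        "Recall": recall,
        "IoU": tp / (tp + fp + fn + 1e-9),
        "F1": 2 * precision * recall / (precision + recall + 1e-9),
    }

pred = np.random.rand(256, 256) > 0.5
gt = np.random.rand(256, 256) > 0.5
print(segmentation_metrics(pred, gt))
```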
{"title":"An interactive fusion attention-guided network for ground surface hot spring fluids segmentation in dual-spectrum UAV images","authors":"Shi Yi , Mengting Chen , Xuesong Yuan , Si Guo , Jiashuai Wang","doi":"10.1016/j.isprsjprs.2025.01.022","DOIUrl":"10.1016/j.isprsjprs.2025.01.022","url":null,"abstract":"<div><div>Investigating the distribution of ground surface hot spring fluids is crucial for the exploitation and utilization of geothermal resources. The detailed information provided by dual-spectrum images captured by unmanned aerial vehicles (UAVs) flew at low altitudes is beneficial to accurately segment ground surface hot spring fluids. However, existing image segmentation methods face significant challenges of hot spring fluids segmentation due to the frequent and irregular variations in fluid boundaries, meanwhile the presence of substances within such fluids lead to segmentation uncertainties. In addition, there is currently no benchmark dataset dedicated to ground surface hot spring fluid segmentation in dual-spectrum UAV images. To this end, in this study, a benchmark dataset called the dual-spectrum hot spring fluid segmentation (DHFS) dataset was constructed for segmenting ground surface hot spring fluids in dual-spectrum UAV images. Additionally, a novel interactive fusion attention-guided RGB-Thermal (RGB-T) semantic segmentation network named IFAGNet was proposed in this study for accurately segmenting ground surface hot spring fluids in dual-spectrum UAV images. The proposed IFAGNet consists of two sub-networks that leverage two feature fusion architectures and the two-stage feature fusion module is designed to achieve optimal intermediate feature fusion. Furthermore, IFAGNet utilizes an interactive fusion attention-guided architecture to guide the two sub-networks further process the extracted features through complementary information exchange, resulting in a significant boost in hot spring fluid segmentation accuracy. Additionally, two down-up full scale feature pyramid network (FPN) decoders are developed for each sub-network to fully utilize multi-stage fused features and improve the preservation of detailed information during hot spring fluid segmentation. Moreover, a hybrid consistency learning strategy is implemented to train the IFAGNet, which combines fully supervised learning with consistency learning between each sub-network and their fusion results to further optimize the segmentation accuracy of hot spring fluid in RGB-T UAV images. The optimal model of the IFAGNet was tested on the proposed DHFS dataset, and the experimental results demonstrated that the IFAGNet outperforms existing image segmentation frameworks in terms of segmentation accuracy for hot spring fluids segmentation in dual-spectrum UAV images which achieved Pixel Accuracy (PA) of 96.1%, Precision of 93.2%, Recall of 85.9%, Intersection over Union (IoU) of 78.3%, and F1-score (F1) of 89.4%, respectively. And overcomes segmentation uncertainties to a great extent, while maintaining competitive computational efficiency. The ablation studies have confirmed the effectiveness of each main innovation in IFAGNet for improving the accuracy of hot spring fluid segmentation. 
Therefore, the proposed DHFS dataset and IFAGNet lay the foundation for segmentation of ","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 661-691"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143035286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Plug-and-play DISep: Separating dense instances for scene-to-pixel weakly-supervised change detection in high-resolution remote sensing images
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2025.01.007
Zhenghui Zhao, Chen Wu, Lixiang Ru, Di Wang, Hongruixuan Chen, Cuiqun Chen
Change Detection (CD) focuses on identifying specific pixel-level landscape changes in multi-temporal remote sensing images. Obtaining pixel-level annotations for CD is generally both time-consuming and labor-intensive. Given this annotation challenge, there has been growing interest in Weakly-Supervised Change Detection (WSCD), which aims to detect pixel-level changes using only scene-level (i.e., image-level) change labels and thereby offers a more cost-effective approach. Despite considerable efforts to precisely locate changed regions, existing WSCD methods often encounter the problem of “instance lumping” under scene-level supervision, particularly in scenarios with a dense distribution of changed instances (i.e., changed objects). In these scenarios, unchanged pixels between changed instances are also mistakenly identified as changed, causing multiple distinct changes to be merged into one. In practical applications, this issue prevents accurate quantification of the number of changes. To address it, we propose a Dense Instance Separation (DISep) method as a plug-and-play solution that refines pixel features from a unified instance perspective under scene-level supervision. Specifically, DISep comprises a three-step iterative training process: (1) Instance Localization: we locate candidate instance regions for changed pixels using high-pass class activation maps. (2) Instance Retrieval: we identify and group these changed pixels into different instance IDs through connectivity searching, and then, based on the assigned instance IDs, extract the corresponding pixel-level features on a per-instance basis. (3) Instance Separation: we introduce a separation loss to enforce intra-instance pixel consistency in the embedding space, thereby ensuring separable instance feature representations. The proposed DISep adds only minimal training cost and no inference cost, and it can be seamlessly integrated to enhance existing WSCD methods. We achieve state-of-the-art performance by enhancing three Transformer-based and four ConvNet-based methods on the LEVIR-CD, WHU-CD, DSIFN-CD, SYSU-CD, and CDD datasets. Additionally, DISep can be used to improve fully-supervised change detection methods. Code is available at https://github.com/zhenghuizhao/Plug-and-Play-DISep-for-Change-Detection.
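The three-step process described above can be pictured with a small sketch: threshold the class activation map, label connected regions to obtain instance IDs, and penalize the spread of each instance's pixel embeddings around its own mean. The threshold, shapes, and loss weighting below are assumptions for illustration, not the paper's exact settings.

```python
# Sketch of instance retrieval by connectivity search plus an intra-instance consistency term.
import numpy as np
from scipy import ndimage

def instance_ids_from_cam(cam, thresh=0.6):
    """Steps 1-2: threshold the class activation map and label connected regions."""
    labeled, _ = ndimage.label(cam > thresh)
    return labeled                                      # 0 = background, 1..K = instance IDs

def separation_loss(embeddings, instance_map):
    """Step 3: mean squared distance of each pixel embedding to its instance mean."""
    loss, count = 0.0, 0
    for inst_id in np.unique(instance_map):
        if inst_id == 0:
            continue
        pix = embeddings[:, instance_map == inst_id]    # (C, N_pixels_in_instance)
        loss += ((pix - pix.mean(axis=1, keepdims=True)) ** 2).mean()
        count += 1
    return loss / max(count, 1)

cam = np.random.rand(64, 64)            # placeholder class activation map
emb = np.random.rand(32, 64, 64)        # placeholder pixel embeddings
ids = instance_ids_from_cam(cam)
print(int(ids.max()), "instances; loss =", separation_loss(emb, ids))
```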
{"title":"Plug-and-play DISep: Separating dense instances for scene-to-pixel weakly-supervised change detection in high-resolution remote sensing images","authors":"Zhenghui Zhao , Chen Wu , Lixiang Ru , Di Wang , Hongruixuan Chen , Cuiqun Chen","doi":"10.1016/j.isprsjprs.2025.01.007","DOIUrl":"10.1016/j.isprsjprs.2025.01.007","url":null,"abstract":"<div><div>Change Detection (CD) focuses on identifying specific pixel-level landscape changes in multi-temporal remote sensing images. The process of obtaining pixel-level annotations for CD is generally both time-consuming and labor-intensive. Faced with this annotation challenge, there has been a growing interest in research on Weakly-Supervised Change Detection (WSCD). WSCD aims to detect pixel-level changes using only scene-level (i.e., image-level) change labels, thereby offering a more cost-effective approach. Despite considerable efforts to precisely locate changed regions, existing WSCD methods often encounter the problem of “instance lumping” under scene-level supervision, particularly in scenarios with a dense distribution of changed instances (i.e., changed objects). In these scenarios, unchanged pixels between changed instances are also mistakenly identified as changed, causing multiple changes to be mistakenly viewed as one. In practical applications, this issue prevents the accurate quantification of the number of changes. To address this issue, we propose a Dense Instance Separation (DISep) method as a plug-and-play solution, refining pixel features from a unified instance perspective under scene-level supervision. Specifically, our DISep comprises a three-step iterative training process: (1) Instance Localization: We locate instance candidate regions for changed pixels using high-pass class activation maps. (2) Instance Retrieval: We identify and group these changed pixels into different instance IDs through connectivity searching. Then, based on the assigned instance IDs, we extract corresponding pixel-level features on a per-instance basis. (3) Instance Separation: We introduce a separation loss to enforce intra-instance pixel consistency in the embedding space, thereby ensuring separable instance feature representations. The proposed DISep adds only minimal training cost and no inference cost. It can be seamlessly integrated to enhance existing WSCD methods. We achieve state-of-the-art performance by enhancing three Transformer-based and four ConvNet-based methods on the LEVIR-CD, WHU-CD, DSIFN-CD, SYSU-CD, and CDD datasets. Additionally, our DISep can be used to improve fully-supervised change detection methods. Code is available at <span><span>https://github.com/zhenghuizhao/Plug-and-Play-DISep-for-Change-Detection</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 770-782"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143072523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}