
ISPRS Journal of Photogrammetry and Remote Sensing: Latest Publications

B3-CDG: A pseudo-sample diffusion generator for bi-temporal building binary change detection
IF 10.6 | CAS Tier 1, Earth Science | Q1 (GEOGRAPHY, PHYSICAL) | Pub Date: 2024-11-14 | DOI: 10.1016/j.isprsjprs.2024.10.021
Peng Chen, Peixian Li, Bing Wang, Sihai Zhao, Yongliang Zhang, Tao Zhang, Xingcheng Ding
Building change detection (CD) plays a crucial role in urban planning, land resource management, and disaster monitoring. Currently, deep learning has become a key approach in building CD, but challenges persist. Obtaining large-scale, accurately registered bi-temporal images is difficult, and annotation is time-consuming. Therefore, we propose B3-CDG, a bi-temporal building binary CD pseudo-sample generator based on the principle of latent diffusion. This generator treats building change processes as local semantic state transformations. It utilizes textual instructions and mask prompts to generate specific class changes in designated regions of single-temporal images, creating different temporal images with clear semantic transitions. B3-CDG is driven by large-scale pretrained models and utilizes external adapters to guide the model in learning remote sensing image distributions. To generate seamless building boundaries, B3-CDG adopts a simple and effective approach (dilation masks) to compel the model to learn boundary details. In addition, B3-CDG incorporates diffusion guidance and data augmentation to enhance image realism. In the generation experiments, B3-CDG achieved the best performance with the lowest FID (26.40) and the highest IS (4.60) compared to previous baseline methods (such as Inpaint and IAug). This method effectively addresses challenges such as boundary continuity, shadow generation, and vegetation occlusion while ensuring that the generated building roof structures and colors are realistic and diverse. In the application experiments, B3-CDG improved the IOU of the validation model (SFFNet) by 6.34 % and 7.10 % on the LEVIR and WHUCD datasets, respectively. When the real data are extremely limited (using only 5 % of the original data), the improvement further reaches 33.68 % and 32.40 %. Moreover, B3-CDG can enhance the baseline performance of advanced CD models, such as SNUNet and ChangeFormer. Ablation studies further confirm the effectiveness of the B3-CDG design. This study introduces a novel research paradigm for building CD, potentially advancing the field. Source code and datasets will be available at https://github.com/ABCnutter/B3-CDG.
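The dilation-mask idea can be pictured with a minimal sketch (not the authors' code): the binary building mask used to prompt the generator is morphologically dilated, so the model must also synthesize the pixels just outside the footprint and is thereby pushed to learn seamless boundary details. The function name and the dilation radius below are illustrative assumptions.

```python
# Minimal sketch of a dilation mask: expand the building prompt mask outward
# so the generator has to reconstruct the boundary region as well.
import numpy as np
from scipy.ndimage import binary_dilation

def dilated_prompt_mask(building_mask: np.ndarray, radius: int = 3) -> np.ndarray:
    """Dilate a binary building mask by `radius` pixels (hypothetical parameter)."""
    structure = np.ones((2 * radius + 1, 2 * radius + 1), dtype=bool)
    return binary_dilation(building_mask.astype(bool), structure=structure)

# Toy usage: a 64x64 mask with one 10x10 "building".
mask = np.zeros((64, 64), dtype=np.uint8)
mask[20:30, 20:30] = 1
prompt = dilated_prompt_mask(mask)
print(mask.sum(), prompt.sum())  # the dilated prompt covers more pixels than the mask
```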
Citations: 0
Mesh refinement method for multi-view stereo with unary operations
IF 10.6 | CAS Tier 1, Earth Science | Q1 (GEOGRAPHY, PHYSICAL) | Pub Date: 2024-11-12 | DOI: 10.1016/j.isprsjprs.2024.10.023
Jianchen Liu, Shuang Han, Jin Li
3D reconstruction is an important part of the digital city, and high-accuracy 3D modeling methods have been widely studied as an important pathway to visualizing 3D city scenes. However, limited image resolution, noise, and occlusion result in low-quality, overly smoothed features in the mesh model. Therefore, the model needs to be refined to improve the mesh quality and enhance the visual effect. This paper proposes a mesh refinement algorithm that fine-tunes the vertices of the mesh and constrains their evolution direction to the normal vector, reducing their degrees of freedom to one. The evolution of each vertex then involves only one motion-distance parameter along the normal vector, simplifying the derivation of the energy function. Meanwhile, Gaussian curvature is used as a regularization term, which is anisotropic and preserves edge features during the reconstruction process. The mesh refinement algorithm with unary operations fully utilizes the original image information and effectively enriches the local detail features of the mesh model. Comparative experiments on five public datasets show that, for the same number of iterations, the proposed algorithm restores the detailed features of the model better and achieves a better refinement effect than the OpenMVS library refinement algorithm. When fewer iterations are used, the proposed algorithm likewise achieves more desirable results.
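A minimal sketch of the unary parameterisation described above, assuming per-vertex unit normals are available: each vertex may only slide along its normal, so the refinement optimises a single signed distance per vertex instead of three coordinates. The names below are illustrative, not taken from the paper.

```python
# Minimal sketch: vertices move only along their normals, one scalar per vertex.
import numpy as np

def refine_vertices(vertices: np.ndarray, normals: np.ndarray, t: np.ndarray) -> np.ndarray:
    """vertices: (N, 3); normals: (N, 3) unit normals; t: (N,) signed offsets along the normals."""
    return vertices + t[:, None] * normals

# Toy usage: push two vertices of a flat patch 1 mm along +z.
V = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
N = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
print(refine_vertices(V, N, np.array([0.001, 0.001])))
```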
Citations: 0
Fast and accurate SAR geocoding with a plane approximation
IF 10.6 | CAS Tier 1, Earth Science | Q1 (GEOGRAPHY, PHYSICAL) | Pub Date: 2024-11-11 | DOI: 10.1016/j.isprsjprs.2024.10.031
Shaokun Guo, Jie Dong, Yian Wang, Mingsheng Liao
Geocoding is the procedure of finding the mapping between the Synthetic Aperture Radar (SAR) image and the imaged scene. The inverse form of the Range-Doppler (RD) model has been adopted to approximate the geocoding results. However, with advances in SAR imaging geodesy, its imprecise nature becomes more perceptible. The forward RD model gives reliable solutions but is time-consuming and unable to detect geometric distortions. This study proposes a highly optimized forward geocoding method to find the precise ground position of each image sample with a Digital Elevation Model (DEM). By following the intersection of the terrain and the so-called solution surface of an azimuth line, which can be locally approximated by a plane, it produces geo-location results almost identical to the analytical solutions of the RD model. At the same time, the non-unique geocoding solutions and the geometric distortions are determined. Deviations from the employed approximations are assessed, showing that they are highly predictable and lead to negligible range/azimuth residuals. The general robustness is verified by experiments on SAR images of different resolutions covering diversified terrains in the native or zero Doppler geometry. Comparisons with other forward algorithms demonstrate that, with an additional ability to detect geometric distortions, its accuracy and efficiency are comparable to theirs. For a Sentinel-1 IW burst of high topographic relief, the algorithm finishes in 3 s using 16 parallel cores, with an average residual smaller than one millimeter. Its impressive blend of efficiency, accuracy, and geometric distortion detection capabilities makes it ideal for large-scale remote sensing applications.
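For readers unfamiliar with the forward Range-Doppler relations the abstract builds on, the sketch below shows the zero-Doppler condition and range computation for a ground point over a sampled orbit. The toy orbit, timing, and coarse search are simplifying assumptions; the paper's optimized plane-approximation algorithm is not reproduced here.

```python
# Minimal sketch of forward Range-Doppler geocoding in zero-Doppler geometry:
# find the azimuth time where (P - S(t)) is perpendicular to the velocity V(t);
# the slant range is then |P - S(t)|.
import numpy as np

def zero_doppler_solution(P, times, positions, velocities):
    dots = np.einsum('ij,ij->i', P[None, :] - positions, velocities)
    i = int(np.argmin(np.abs(dots)))          # coarse search over the sampled orbit
    return times[i], float(np.linalg.norm(P - positions[i]))

# Toy straight-line "orbit" along x at 7.5 km/s and 700 km height.
t = np.linspace(0.0, 10.0, 1001)
S = np.stack([7500.0 * t, np.zeros_like(t), np.full_like(t, 700e3)], axis=1)
V = np.tile(np.array([7500.0, 0.0, 0.0]), (t.size, 1))
print(zero_doppler_solution(np.array([37500.0, 20e3, 0.0]), t, S, V))  # azimuth time, slant range
```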
Citations: 0
3D point cloud regularization method for uniform mesh generation of mining excavations
IF 10.6 | CAS Tier 1, Earth Science | Q1 (GEOGRAPHY, PHYSICAL) | Pub Date: 2024-11-09 | DOI: 10.1016/j.isprsjprs.2024.10.024
Przemysław Dąbek, Jacek Wodecki, Paulina Kujawa, Adam Wróblewski, Arkadiusz Macek, Radosław Zimroz
Mine excavation systems are usually dozens of kilometers long with varying geometry on a small scale (roughness and shape of the walls) and on a large scale (varying widths of the tunnels, turns, and crossings). In this article, the authors address the problem of analyzing laser scanning data from large mining structures that can be used for various purposes, with a focus on ventilation simulations. Together with the quality of the measurement data (diverse point-cloud density, missing samples, holes induced by obstructions in the field of view, measurement noise), this creates problems that require multi-stage processing of the obtained data. The authors propose a robust methodology to process a single segmented section of the mining system. The presented approach focuses on obtaining a point cloud ready for application in the computational fluid dynamics (CFD) analysis of airflow, with minimal need for additional manual corrections on the generated mesh model. This requires the point cloud to have evenly distributed points and reduced noise (together with removal of objects inside) while keeping the unique geometrical properties and shape of the scanned tunnels. The proposed methodology uses the trajectory of the excavation, obtained either during the measurements or by the skeletonization process explained in the article. Cross-sections obtained on planes perpendicular to the trajectory are processed to equalize the point distribution and to remove measurement noise, holes in the point cloud, and objects inside the excavation. The effects of the proposed algorithm are validated by comparing the processed cloud with the original cloud and by testing within the CFD environment. The algorithm proved highly effective in improving the skewness rate of the obtained mesh and the geometry mapping accuracy (standard deviation below 5 centimeters in the cloud-to-mesh comparison).
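The cross-section step can be pictured with the short sketch below, which keeps only the points lying in a thin slab perpendicular to the local trajectory direction. The slab thickness and names are assumptions, and the per-section resampling and denoising described in the abstract are not shown.

```python
# Minimal sketch: select points within a thin slab perpendicular to the tunnel trajectory.
import numpy as np

def cross_section(points: np.ndarray, center: np.ndarray, direction: np.ndarray,
                  half_thickness: float = 0.05) -> np.ndarray:
    """Keep points whose signed distance along `direction` from `center` is within the slab."""
    d = direction / np.linalg.norm(direction)
    s = (points - center) @ d
    return points[np.abs(s) <= half_thickness]

# Toy tunnel along x: take the section at x = 2.0 m.
pts = np.random.rand(10000, 3) * np.array([10.0, 3.0, 3.0])
print(cross_section(pts, np.array([2.0, 1.5, 1.5]), np.array([1.0, 0.0, 0.0])).shape)
```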
Citations: 0
Generalization in deep learning-based aircraft classification for SAR imagery
IF 10.6 | CAS Tier 1, Earth Science | Q1 (GEOGRAPHY, PHYSICAL) | Pub Date: 2024-11-08 | DOI: 10.1016/j.isprsjprs.2024.10.030
Andrea Pulella, Francescopaolo Sica, Carlos Villamil Lopez, Harald Anglberger, Ronny Hänsch
Automatic Target Recognition (ATR) from Synthetic Aperture Radar (SAR) data covers a wide range of applications. SAR ATR helps to detect and track vehicles and other objects, e.g. in disaster relief and surveillance operations. Aircraft classification covers a significant part of this research area. It differs from other SAR-based ATR tasks, such as ship and ground vehicle detection and classification, in that aircraft are usually static targets, often remaining at the same location and in a given orientation for longer time frames. Today, there is a significant mismatch between the abundance of deep learning-based aircraft classification models and the availability of corresponding datasets. This mismatch has led to models with improved classification performance on specific datasets, but the challenge of generalizing to conditions not present in the training data (which are expected to occur in operational conditions) has not yet been satisfactorily analyzed. This paper aims to evaluate how the classification performance and generalization capabilities of deep learning models are influenced by the diversity of the training dataset. Our goal is to understand the model’s competence and the conditions under which it can achieve proficiency in aircraft classification tasks for high-resolution SAR images while demonstrating generalization capabilities when confronted with novel data that include different geographic locations, environmental conditions, and geometric variations. We address this gap by using manually annotated high-resolution SAR data from TerraSAR-X and TanDEM-X and show how the classification performance changes for different application scenarios requiring different training and evaluation setups. We find that, as expected, the type of aircraft plays a crucial role in the classification problem, since it varies in shape and dimension. However, these aspects are secondary to how the SAR image is acquired, with the acquisition geometry playing the primary role. Therefore, we find that the characteristics of the acquisition are much more relevant for generalization than the complex geometry of the target. We show this for various models selected among the standard classification algorithms.
Citations: 0
Advancing mangrove species mapping: An innovative approach using Google Earth images and a U-shaped network for individual-level Sonneratia apetala detection
IF 10.6 | CAS Tier 1, Earth Science | Q1 (GEOGRAPHY, PHYSICAL) | Pub Date: 2024-11-07 | DOI: 10.1016/j.isprsjprs.2024.10.016
Chuanpeng Zhao, Yubin Li, Mingming Jia, Chengbin Wu, Rong Zhang, Chunying Ren, Zongming Wang
The exotic mangrove species Sonneratia apetala has been colonizing coastal China for several decades, sparking attention and debates from the public and policy-makers about its reproduction, dispersal, and spread. Existing local-scale studies have relied on fine but expensive data sources to map mangrove species, limiting their applicability for detecting S. apetala in large areas due to cost constraints. A previous study utilized freely available Sentinel-2 images to construct a 10-m-resolution S. apetala map in China but did not capture small clusters of S. apetala due to resolution limitations. To precisely detect S. apetala in coastal China, we proposed an approach that integrates freely accessible submeter-resolution Google Earth images to control expenses, a 10-m-resolution S. apetala map to retrieve well-distributed samples, and several U-shaped networks to capture S. apetala in the form of clusters and individuals. Comparisons revealed that the lite U-squared network was most suitable for detecting S. apetala among the five U-shaped networks. The resulting map achieved an overall accuracy of 98.2 % using testing samples and an accuracy of 91.0 % using field sample plots. Statistics indicated that the total area covered by S. apetala in China was 4000.4 ha in 2022, which was 33.4 % greater than that of the 10-m-resolution map. The excess area suggested the presence of a large number of small clusters beyond the discrimination capacity of medium-resolution images. Furthermore, the mechanism of the approach was interpreted using an example-based method that altered image color, shape, orientation, and textures. Comparisons showed that textures were the key feature for identifying S. apetala based on submeter-resolution Google Earth images. The detection accuracy rapidly decreased with the blurring of textures, and images at zoom levels of 20, 19, and 18 were applicable to the trained network. Utilizing the first individual-level map, we estimated the number of mature S. apetala trees to be approximately 2.35 million with a 95 % confidence interval between 2.30 and 2.40 million, providing a basis for managing this exotic mangrove species. This study deepens existing research on S. apetala by providing an approach with a clear mechanism, an individual-level distribution with a much larger area, and an estimation of the number of mature trees. This study advances mangrove species mapping by combining the advantages of freely accessible medium- and high-resolution images: the former provides abundant spectral information to integrate discrete local-scale maps to generate a large-scale map, while the latter offers textural information from submeter-resolution Google Earth images to detect mangrove species in detail.
Citations: 0
HDRSA-Net: Hybrid dynamic residual self-attention network for SAR-assisted optical image cloud and shadow removal
IF 10.6 | CAS Tier 1, Earth Science | Q1 (GEOGRAPHY, PHYSICAL) | Pub Date: 2024-11-07 | DOI: 10.1016/j.isprsjprs.2024.10.026
Jun Pan, Jiangong Xu, Xiaoyu Yu, Guo Ye, Mi Wang, Yumin Chen, Jianshen Ma
Clouds and shadows often contaminate optical remote sensing images, resulting in missing information. Consequently, continuous spatiotemporal monitoring of the Earth’s surface requires the efficient removal of clouds and shadows. Unlike optical satellites, synthetic aperture radar (SAR) has active imaging capabilities in all weather conditions, supplying valuable supplementary information for reconstructing missing regions. Nevertheless, the reconstruction of high-fidelity cloud-free images based on SAR-optical data fusion remains challenging due to differences in imaging mechanisms and the considerable contamination from speckle noise inherent in SAR imagery. To solve the aforementioned challenges, this paper presents a novel hybrid dynamic residual self-attention network (HDRSA-Net), aiming to fully exploit the potential of SAR images in reconstructing missing regions. The proposed HDRSA-Net comprises multiple dynamic interaction residual (DIR) groups organized into an end-to-end trainable deep hierarchical stacked architecture. Specifically, the omni-dimensional dynamic local exploration (ODDLE) module and the sparse global context aggregation (SGCA) module are used to form a local–global feature adaptive extraction and implicit enhancement. A multi-task cooperative optimization loss function is designed to ensure that the results exhibit high spectral fidelity and coherent spatial structures. Additionally, this paper releases a large dataset that can comprehensively evaluate the reconstruction quality under different cloud coverages and various types of ground cover, providing a solid foundation for restoring satisfactory sensory effects and reliable semantic application value. In comparison to the current representative algorithms, the presented approach exhibits effectiveness and advancement in reconstructing missing regions with stability. The project is accessible at: https://github.com/RSIIPAC/LuojiaSET-OSFCR.
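As a generic illustration of the multi-task idea (not the paper's actual loss function), the sketch below combines a pixel-wise term for spectral fidelity with a gradient term for coherent spatial structure; the weights and term choices are hypothetical.

```python
# Minimal sketch of a two-term reconstruction objective: spectral fidelity (L1)
# plus spatial structure (L1 on horizontal/vertical image gradients).
import torch

def reconstruction_loss(pred: torch.Tensor, target: torch.Tensor,
                        w_spec: float = 1.0, w_struct: float = 0.5) -> torch.Tensor:
    """pred, target: (B, C, H, W) predicted cloud-free image and reference."""
    spectral = torch.mean(torch.abs(pred - target))
    dx = lambda x: x[..., :, 1:] - x[..., :, :-1]
    dy = lambda x: x[..., 1:, :] - x[..., :-1, :]
    structural = torch.mean(torch.abs(dx(pred) - dx(target))) + \
                 torch.mean(torch.abs(dy(pred) - dy(target)))
    return w_spec * spectral + w_struct * structural

print(reconstruction_loss(torch.rand(2, 4, 64, 64), torch.rand(2, 4, 64, 64)))
```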
Citations: 0
A multi-view graph neural network for building age prediction
IF 10.6 | CAS Tier 1, Earth Science | Q1 (GEOGRAPHY, PHYSICAL) | Pub Date: 2024-11-07 | DOI: 10.1016/j.isprsjprs.2024.10.011
Yi Wang, Yizhi Zhang, Quanhua Dong, Hao Guo, Yingchun Tao, Fan Zhang
Building age is crucial for inferring building energy consumption and understanding the interactions between human behavior and urban infrastructure. Limited by the challenges of surveys, some machine learning methods have been utilized to predict and fill in missing building age data using building footprint. However, the existing methods lack explicit modeling of spatial effects and semantic relationships between buildings. To alleviate these challenges, we propose a novel multi-view graph neural network called Building Age Prediction Network (BAPN). The features of spatial autocorrelation, spatial heterogeneity and semantic similarity were extracted and integrated using multiple graph convolutional networks. Inspired by the spatial regime model, a heterogeneity-aware graph convolutional network (HGCN) based on spatial grouping is designed to capture the spatial heterogeneity. Systematic experiments on three large-scale building footprint datasets demonstrate that BAPN outperforms existing machine learning and graph learning models, achieving high accuracy ranging from 61% to 80%. Moreover, missing building age data within the Fifth Ring Road of Beijing was filled, validating the feasibility of BAPN. This research offers new insights for filling the intra-city building age gaps and understanding multiple spatial effects essential for sustainable urban planning.
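The multi-view design can be pictured with the sketch below, which applies one graph convolution per view (spatial autocorrelation, spatial heterogeneity, semantic similarity) and concatenates the per-view embeddings. The adjacency matrices, layer sizes, and fusion step are placeholders for illustration, not the BAPN architecture itself.

```python
# Minimal sketch of a multi-view graph convolution: one propagation per view,
# then concatenation of the per-view building embeddings.
import torch

def gcn_layer(A_hat: torch.Tensor, X: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """A_hat: (N, N) row-normalised adjacency; X: (N, F) features; W: (F, H) weights."""
    return torch.relu(A_hat @ X @ W)

N, F, H = 100, 16, 32                        # buildings, input features, hidden size
X = torch.rand(N, F)
views = [torch.eye(N) for _ in range(3)]     # stand-ins for the three view adjacencies
weights = [torch.rand(F, H) for _ in views]
fused = torch.cat([gcn_layer(A, X, W) for A, W in zip(views, weights)], dim=1)
print(fused.shape)                           # (N, 3*H) multi-view embedding per building
```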
Citations: 0
Integrating synthetic datasets with CLIP semantic insights for single image localization advancements
IF 10.6 | CAS Tier 1, Earth Science | Q1 (GEOGRAPHY, PHYSICAL) | Pub Date: 2024-11-06 | DOI: 10.1016/j.isprsjprs.2024.10.027
Dansheng Yao, Mengqi Zhu, Hehua Zhu, Wuqiang Cai, Long Zhou
Accurate localization of pedestrians and mobile robots is critical for navigation, emergency response, and autonomous driving. Traditional localization methods, such as satellite signals, often prove ineffective in certain environments, and acquiring sufficient positional data can be challenging. Single image localization techniques have been developed to address these issues. However, current deep learning frameworks for single image localization that rely on domain adaptation fail to effectively utilize semantically rich high-level features obtained from large-scale pretraining. This paper introduces a novel framework that leverages the Contrastive Language-Image Pre-training model and prompts to enhance feature extraction and domain adaptation through semantic information. The proposed framework generates an integrated score map from scene-specific prompts to guide feature extraction and employs adversarial components to facilitate domain adaptation. Furthermore, a reslink component is incorporated to mitigate the precision loss in high-level features compared to the original data. Experimental results demonstrate that the use of prompts reduces localization errors by 26.4 % in indoor environments and 24.3 % in outdoor settings. The model achieves localization errors as low as 0.75 m and 8.09 degrees indoors, and 4.56 m and 7.68 degrees outdoors. Analysis of prompts from labeled datasets confirms the model’s capability to effectively interpret scene information. The weights of the integrated score map enhance the model’s transparency, thereby improving interpretability. This study underscores the efficacy of integrating semantic information into image localization tasks.
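To make the prompt-scoring idea concrete, the sketch below rates an image against a few scene-specific prompts with a pretrained CLIP model, assuming the Hugging Face `transformers` interface. The prompts, and how such scores would be turned into the integrated score map described above, are illustrative assumptions rather than the paper's implementation.

```python
# Minimal sketch: score scene prompts against an image with a pretrained CLIP model.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = ["a photo of an indoor corridor", "a photo of an outdoor street"]  # hypothetical prompts
image = Image.new("RGB", (224, 224))  # placeholder image; use a real query image in practice
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(prompts, probs[0].tolist())))  # relative relevance of each scene prompt
```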
Citations: 0
Selective weighted least square and piecewise bilinear transformation for accurate satellite DSM generation
IF 10.6 | CAS Tier 1, Earth Science | Q1 (GEOGRAPHY, PHYSICAL) | Pub Date: 2024-11-06 | DOI: 10.1016/j.isprsjprs.2024.11.001
Nazila Mohammadi, Amin Sedaghat
One of the main products of multi-view stereo (MVS) high-resolution satellite (HRS) images in photogrammetry and remote sensing is the digital surface model (DSM). Producing DSMs from MVS HRS images still faces serious challenges for various reasons, such as the complexity of the imaging geometry and exterior orientation model in HRS, as well as the large image dimensions and various geometric and illumination variations. The main motivation for conducting this research is to provide a novel and efficient method that enhances the accuracy and completeness of DSM extraction from HRS images compared to existing recent methods. The proposed method, called Sat-DSM, consists of five main stages. Initially, a very dense set of tie-points is extracted from the images using a tile-based matching method, phase congruency-based feature detectors and descriptors, and a local geometric consistency correspondence method. Then, Rational Polynomial Coefficients (RPC) block adjustment is performed to compensate for the RPC bias errors. After that, a dense matching process is performed to generate 3D point clouds for each pair of input HRS images using a new geometric transformation called PWB (piecewise bilinear) and an accurate area-based matching method called SWLSM (selective weighted least square matching). The key innovations of this research include the introduction of the SWLSM and PWB methods for an accurate dense matching process. The PWB is a novel and simple piecewise geometric transformation model based on superpixel over-segmentation, proposed for accurate registration of each pair of HRS images. The SWLSM matching method is based on a phase congruency measure and a selection strategy to improve the well-known LSM (least square matching) performance. After the dense matching process, the final stage is spatial intersection to generate 3D point clouds, followed by elevation interpolation to produce the DSM. To evaluate the Sat-DSM method, 12 sets of MVS-HRS data from IRS-P5, ZY3-1, ZY3-2, and Worldview-3 sensors were selected from areas with different landscapes, such as urban, mountainous, and agricultural areas. The results indicate the superiority of the proposed Sat-DSM method over four other methods (CATALYST, SGM (semi-global matching), SS-DSM (structural similarity based DSM extraction), and Sat-MVSF) in terms of completeness, RMSE, and MEE. The demo code is available at https://www.researchgate.net/publication/377721674_SatDSM.
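The piecewise bilinear (PWB) idea can be sketched as a per-superpixel least-squares fit of a bilinear mapping to the tie-points falling inside that superpixel. The sketch below shows the fit for a single superpixel with hypothetical names; it is not the authors' Sat-DSM code.

```python
# Minimal sketch: fit x' = a0 + a1*x + a2*y + a3*x*y (and likewise for y')
# to tie-point correspondences by least squares.
import numpy as np

def fit_bilinear(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """src, dst: (N, 2) tie-point coordinates in the two images; returns (2, 4) coefficients."""
    x, y = src[:, 0], src[:, 1]
    A = np.stack([np.ones_like(x), x, y, x * y], axis=1)  # design matrix
    coeffs, *_ = np.linalg.lstsq(A, dst, rcond=None)      # solves both target columns at once
    return coeffs.T

# Toy check: a pure shift of (+2, -1) pixels is recovered exactly.
src = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 5]], dtype=float)
dst = src + np.array([2.0, -1.0])
print(np.round(fit_bilinear(src, dst), 6))
```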
Citations: 0