Neural radiance fields (NeRF) offer the potential to benefit 3D reconstruction tasks, including aerial photogrammetry. However, the scalability and accuracy of the inferred geometry are not well‐documented for large‐scale aerial assets. We aim to provide a thorough assessment of NeRF in 3D reconstruction from aerial images and compare it with three traditional multi‐view stereo (MVS) pipelines. Typical NeRF approaches, however, are not designed for large‐format aerial images, which lead to very high memory consumption (often cost‐prohibitive) and slow convergence when directly applied to aerial assets. Although a few NeRF variants adopt a representation tiling scheme to increase scalability, the random ray‐sampling strategy used during training still hinders their general applicability to aerial assets. To perform an effective evaluation, we propose a new scheme to scale NeRF. In addition to representation tiling, we introduce a location‐specific sampling technique and a multi‐camera tiling (MCT) strategy that reduce RAM consumption during image loading and GPU memory consumption during representation training, and increase the convergence rate within tiles. The MCT method decomposes a large‐frame image into multiple tiled images with different camera models, allowing these small‐frame images to be fed into the training process as needed for specific locations without loss of accuracy. This enables NeRF approaches to run on aerial datasets using affordable computing devices, such as regular workstations. The proposed adaptation can be applied to scale any existing NeRF method. Therefore, in this paper, instead of comparing accuracy across different NeRF variants, we implement our method on top of a representative approach, Mip‐NeRF, and compare it against three traditional photogrammetric MVS pipelines on a typical aerial dataset, using lidar as reference data, to assess NeRF's performance. Both qualitative and quantitative results suggest that the proposed NeRF approach produces better completeness and object details than traditional approaches, although it still falls short in terms of accuracy. The code and datasets are publicly available at https://github.com/GDAOSU/MCT_NERF.
Ningli Xu, Rongjun Qin, Debao Huang & Fabio Remondino (2024). Multi‐tiling neural radiance field (NeRF)—geometric assessment on large‐scale aerial datasets. The Photogrammetric Record. https://doi.org/10.1111/phor.12498
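The multi‐camera tiling idea (splitting one large‐frame image into tiles that each carry their own camera model) can be illustrated for an ideal pinhole camera: each tile keeps the original focal lengths and only shifts the principal point into the tile's local pixel frame, so every 3D point projects to the same physical pixel. A minimal sketch, with function names of our own choosing rather than from the MCT_NERF code:

```python
def tile_cameras(width, height, fx, fy, cx, cy, tile):
    """Split a large-frame pinhole camera into per-tile camera models.

    Each tile keeps the original focal lengths; only the principal point
    is shifted by the tile's pixel offset, so projections agree exactly
    with the full-frame camera.
    """
    cams = []
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            cams.append({
                "window": (x0, y0, min(tile, width - x0), min(tile, height - y0)),
                "fx": fx, "fy": fy,
                "cx": cx - x0,  # principal point in tile coordinates
                "cy": cy - y0,
            })
    return cams

def project(fx, fy, cx, cy, X, Y, Z):
    """Ideal pinhole projection of a camera-frame 3D point."""
    return fx * X / Z + cx, fy * Y / Z + cy
```

A point that lands at full‐frame pixel (u, v) lands at (u − x0, v − y0) in the tile containing it, which is why tiles can be loaded and trained independently without a loss of accuracy.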
Thaisa Aline Correia Garcia, Antonio Maria Garcia Tommaselli, Letícia Ferrari Castanheiro, Mariana Batista Campos
The problem of sequentially estimating the exterior orientation of imaging sensors while reconstructing the three‐dimensional environment in real time is commonly known as visual simultaneous localisation and mapping (vSLAM). Omnidirectional optical sensors have been increasingly used in vSLAM solutions, mainly because they provide a wider view of the scene and allow the extraction of more features. However, dealing with unmodelled points in the hyperhemispherical field poses challenges, mainly due to the complex lens geometry entailed in the image formation process. Rigorous photogrammetric models that appropriately handle the geometry of fisheye lens cameras can overcome these challenges. Thus, this study presents a real‐time vSLAM approach for omnidirectional systems that adapts ORB‐SLAM with a rigorous projection model (equisolid‐angle). The implementation was conducted on the Nvidia Jetson TX2 board, and the approach was evaluated using hyperhemispherical images captured by a dual‐fisheye camera (Ricoh Theta S) embedded in a mobile backpack platform. The trajectory covered a distance of 140 m, with the approach demonstrating accuracy better than 0.12 m at the beginning and metre‐level accuracy at the end of the trajectory. Additionally, we compared the performance of our proposed approach with a generic model for fisheye lens cameras.
Thaisa Aline Correia Garcia, Antonio Maria Garcia Tommaselli, Letícia Ferrari Castanheiro & Mariana Batista Campos (2024). A photogrammetric approach for real‐time visual SLAM applied to an omnidirectional system. The Photogrammetric Record. https://doi.org/10.1111/phor.12494
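The equisolid‐angle model mentioned above maps a ray's incidence angle θ to the radial image distance r = 2f·sin(θ/2). A minimal sketch of that forward projection (distortion terms that a full calibration would add are ignored, and the function name is ours):

```python
import math

def project_equisolid(x, y, z, f, cx, cy):
    """Equisolid-angle fisheye projection of a camera-frame 3D point.

    The angle theta from the optical axis maps to the radial distance
    r = 2 * f * sin(theta / 2), which preserves solid angles: equal
    solid angles in object space cover equal areas in the image.
    """
    theta = math.atan2(math.hypot(x, y), z)  # incidence angle
    phi = math.atan2(y, x)                   # azimuth around the axis
    r = 2.0 * f * math.sin(theta / 2.0)
    return cx + r * math.cos(phi), cy + r * math.sin(phi)
```

A ray at θ = 90° (the rim of a hemispherical view) lands at radius f·√2; a hyperhemispherical lens records content beyond that radius.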
Benjamin Wild, Geert Verhoeven, Rafał Muszyński, Norbert Pfeifer
Graffiti, by their very nature, are ephemeral, sometimes even vanishing before creators finish them. This transience is part of graffiti's allure yet signifies the continuous loss of this often disputed form of cultural heritage. To counteract this, graffiti documentation efforts have steadily increased over the past decade. One of the primary challenges in any documentation endeavour is identifying and recording new creations. Image‐based change detection can greatly help in this process, effectuating more comprehensive documentation, less biased digital safeguarding and improved understanding of graffiti. This paper introduces a novel and largely automated image‐based graffiti change detection method. The methodology uses an incremental structure‐from‐motion approach and synthetic cameras to generate co‐registered graffiti images from different areas. These synthetic images are fed into a hybrid change detection pipeline combining a new pixel‐based change detection method with a feature‐based one. The approach was tested on a large and publicly available reference dataset captured along the Donaukanal (Eng. Danube Canal), one of Vienna's graffiti hotspots. With a precision of 87% and a recall of 77%, the results reveal that the proposed change detection workflow can indicate newly added graffiti in a monitored graffiti‐scape, thus supporting a more comprehensive graffiti documentation.
Benjamin Wild, Geert Verhoeven, Rafał Muszyński & Norbert Pfeifer (2024). Detecting change in graffiti using a hybrid framework. The Photogrammetric Record. https://doi.org/10.1111/phor.12496
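The evaluation above rests on standard precision and recall over detected changes, and the pixel‐based half of a hybrid pipeline can, in its simplest form, threshold intensity differences between co‐registered images. A toy sketch (the paper's actual pixel‐ and feature‐based methods are more elaborate; the function names are ours):

```python
def change_mask(img_a, img_b, thresh=30):
    """Naive pixel-based change detection: flag a pixel as changed when
    the absolute intensity difference between two co-registered greyscale
    images (lists of rows of 0-255 values) exceeds `thresh`."""
    return [[abs(a - b) > thresh for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]

def precision_recall(detected, reference):
    """Precision and recall of a detected change set against a reference set."""
    tp = len(detected & reference)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall
```

In practice the thresholding is what makes pixel‐based detection sensitive to illumination changes, which is why the paper pairs it with a feature‐based method.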
Vision Transformers (ViTs) are exceptional at vision tasks. However, when applied to remote sensing images (RSIs), existing methods often necessitate extensive modifications of ViTs to rival convolutional neural networks (CNNs). This requirement significantly impedes the application of ViTs in geosciences, particularly for researchers who lack the time for comprehensive model redesign. To address this issue, we introduce the concept of quantitative regularization (QR), designed to enhance the performance of ViTs in RSI classification. QR is an effective algorithm that adeptly manages domain discrepancies in RSIs and can be integrated with any ViT in transfer learning. We evaluated the effectiveness of QR using three ViT architectures (vanilla ViT, Swin‐ViT and Next‐ViT) on four datasets: AID30, NWPU45, AFGR50 and UCM21. The results reveal that our Next‐ViT model surpasses 39 other advanced methods published in the past 3 years, maintaining robust performance even with a limited number of training samples. We also found that our ViT and Swin‐ViT achieve significantly higher accuracy and robustness than other methods using the same backbone. Our findings confirm that ViTs can be as effective as CNNs for RSI classification, regardless of dataset size. Our approach exclusively employs open‐source ViTs and easily accessible training strategies. Consequently, we believe that our method can significantly lower the barriers for geoscience researchers intending to use ViTs for RSI applications.
Huaxiang Song, Yuxuan Yuan, Zhiwei Ouyang, Yu Yang & Hui Xiang (2024). Quantitative regularization in robust vision transformer for remote sensing image classification. The Photogrammetric Record. https://doi.org/10.1111/phor.12489
Unmanned aerial vehicle light detection and ranging (UAV‐lidar) and UAV photogrammetry are two widely used surface‐monitoring technologies. Previous studies have used the two technologies interchangeably and ignored their correlation, or have compared them on only a single product. There are, however, few quantitative assessments of the differences between the two techniques in monitoring surface deformation, or predictions of their application prospects. This paper therefore compares the digital elevation models (DEMs) and subsidence basins obtained by the two techniques using Gaussian analysis. The results indicate that the surface DEMs obtained by the two techniques are highly similar: the differences in the z direction between the two DEMs follow a Gaussian distribution with a standard deviation of less than 0.36 m. When comparing the surface subsidence values monitored by the two techniques, UAV‐lidar was found to be more sensitive to small‐scale deformation, with a difference range of 0.23–0.44 m compared with photogrammetry. These conclusions provide valuable guidance for the utilisation of multisource monitoring data.
Xilin Zhan, Xingzhong Zhang, Xiao Wang, Xinpeng Diao & Lizhuan Qi (2024). Comparative analysis of surface deformation monitoring in a mining area based on UAV‐lidar and UAV photogrammetry. The Photogrammetric Record. https://doi.org/10.1111/phor.12490
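The Gaussian analysis described above amounts to fitting a normal distribution to the per‐cell z differences of two co‐registered DEMs; the fitted standard deviation is the reported agreement figure. A minimal sketch under the assumption that the grids are already aligned (outlier handling and georeferencing omitted):

```python
import statistics

def dem_difference_stats(dem_a, dem_b):
    """Mean and sample standard deviation of per-cell z differences
    between two co-registered DEM grids (lists of rows of elevations),
    i.e. the parameters of a Gaussian fitted to the difference values."""
    diffs = [a - b
             for row_a, row_b in zip(dem_a, dem_b)
             for a, b in zip(row_a, row_b)]
    return statistics.mean(diffs), statistics.stdev(diffs)
```

A standard deviation below 0.36 m on such differences is what the paper reports as the agreement between the lidar‐ and photogrammetry‐derived surfaces.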
Baokun Feng, Sheng Nie, Cheng Wang, Jinliang Wang, Xiaohuan Xi, Haoyu Wang, Jieying Lao, Xuebo Yang, Dachao Wang, Yiming Chen, Bo Yang
Accurate and efficient registration of unmanned aerial vehicle light detection and ranging (UAV‐lidar) and terrestrial lidar (T‐lidar) data is crucial for forest structure parameter extraction. This study proposes a novel method based on a starburst pattern for the automatic registration of UAV‐lidar and T‐lidar data in forest scenes. It employs density‐based spatial clustering of applications with noise (DBSCAN) for individual tree identification, constructs starburst patterns separately from both lidar sources, and uses polar‐coordinate rotation and matching to achieve coarse registration. Fine registration is achieved using the iterative closest point (ICP) algorithm. Experimental results demonstrate that the starburst‐pattern‐based method achieves the desired registration accuracy (average coarse registration error of 0.157 m). Further optimisation with ICP yields slight improvements, with an average fine registration error of 0.149 m. Remarkably, the proposed method is insensitive to the number of detected individual trees once it exceeds 10, and tree position error has minimal impact on registration accuracy. Furthermore, our proposed method outperforms two existing methods for T‐lidar and UAV‐lidar registration in forest environments.
Baokun Feng, Sheng Nie, Cheng Wang, Jinliang Wang, Xiaohuan Xi, Haoyu Wang, Jieying Lao, Xuebo Yang, Dachao Wang, Yiming Chen & Bo Yang (2024). A novel method based on a starburst pattern to register UAV and terrestrial lidar point clouds in forest environments. The Photogrammetric Record. https://doi.org/10.1111/phor.12487
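The coarse step above, matching starburst patterns in polar coordinates, can be sketched as follows: each pattern is the set of bearing angles from a centre stem to its neighbouring trees, and the sought rotation is the one that best superimposes the two angle sets. This is a hypothetical brute‐force version of our own devising; the paper's matching and the subsequent ICP refinement are more sophisticated:

```python
import math

TWO_PI = 2.0 * math.pi

def starburst(center, trees):
    """Sorted bearing angles (in [0, 2*pi)) from a centre stem to
    neighbouring tree positions: the 'rays' of a starburst pattern."""
    return sorted(math.atan2(y - center[1], x - center[0]) % TWO_PI
                  for (x, y) in trees if (x, y) != center)

def best_rotation(angles_a, angles_b, step=math.radians(1.0)):
    """Brute-force the rotation aligning two starburst patterns, scoring
    by summed circular distance between the sorted angle lists."""
    def cost(rot):
        rotated = sorted((a + rot) % TWO_PI for a in angles_a)
        return sum(min(abs(r - b), TWO_PI - abs(r - b))
                   for r, b in zip(rotated, angles_b))
    return min((cost(i * step), i * step)
               for i in range(int(TWO_PI / step)))[1]
```

Because the score depends only on angles, the match is insensitive to the radial position error of individual trees, consistent with the paper's observation that tree position error has minimal impact.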
Building change detection has various applications, such as urban management and disaster assessment. Along with the exponential growth of remote sensing data and computing power, an increasing number of deep‐learning‐based building change detection methods have been proposed in recent years. The overwhelming majority of existing methods deal well with change detection of low‐rise buildings. By contrast, high‐rise buildings often present a large disparity in multitemporal high‐resolution remote sensing images, which degrades the performance of existing methods dramatically. To alleviate this problem, we propose a disparity‐aware Siamese network for detecting building changes in bi‐temporal high‐resolution remote sensing images. The proposed network uses a cycle‐alignment module to address the disparity problem at both the image and feature levels. A multi‐task learning framework with a joint semantic segmentation and change detection loss is used to train the entire deep network, including the cycle‐alignment module, in an end‐to‐end manner. Extensive experiments on three publicly available building change detection datasets demonstrate that our method achieves significant improvements on datasets with severe building disparity and, simultaneously, state‐of‐the‐art performance on datasets with minimal building disparity.
Yansheng Li, Xinwei Li, Wei Chen & Yongjun Zhang (2024). A disparity‐aware Siamese network for building change detection in bi‐temporal remote sensing images. The Photogrammetric Record. https://doi.org/10.1111/phor.12495
Erxin Xie, Na Chen, Genwei Zhang, Jiangtao Peng, Weiwei Sun
Transformers have achieved outstanding performance in hyperspectral image classification (HSIC) thanks to their effectiveness in modelling long‐range dependencies. However, most existing algorithms combine convolution with the transformer and use convolution for spatial–spectral information fusion, which cannot adequately learn the spatial–spectral fusion features of hyperspectral images (HSIs). To mine the rich spatial and spectral features, a two‐branch global spatial–spectral fusion transformer (GSSFT) model is designed in this paper, in which a spatial–spectral information fusion (SSIF) module fuses the features of the spectral and spatial branches. For the spatial branch, a local multiscale swin transformer (LMST) module is devised to obtain local–global spatial information about the samples, and a background filtering (BF) module is constructed to reduce the weights of irrelevant pixels. The information learned from the spatial and spectral branches is effectively fused to obtain the final classification results. Extensive experiments conducted on three HSI datasets show that the designed GSSFT method performs well compared with traditional convolutional neural network and transformer‐based methods.
Erxin Xie, Na Chen, Genwei Zhang, Jiangtao Peng & Weiwei Sun (2024). Two‐branch global spatial–spectral fusion transformer network for hyperspectral image classification. The Photogrammetric Record. https://doi.org/10.1111/phor.12491
Xiaohua Tong, Yi Gao, Zhen Ye, Huan Xie, Peng Chen, Haibo Shi, Ziqi Liu, Xianglei Liu, Yusheng Xu, Rong Huang, Shijie Liu
The dynamic measurement of the position and attitude of a long-distance moving object is a common requirement in ground testing for aerospace engineering. Because the object moves from far to near, and because of the limits of camera resolution, multi-binocular cameras must be used for segmented observation at different distances. However, achieving accurate and continuous position and attitude estimation is challenging. This paper therefore proposes a dynamic monitoring technique for long-distance movement based on a multi-binocular videogrammetric system. To handle the constantly changing image scale during motion, a scale-adaptive tracking method for circular targets is presented. Bundle adjustment (BA) with joint segments, using an adaptive-weighting least-squares strategy, is developed to enhance measurement accuracy. The feasibility and reliability of the proposed technique are validated by a ground test of relative measurement for spacecraft rendezvous and docking. The experimental results indicate that the proposed technique can recover the actual motion state of the moving object, with a positioning accuracy of 3.2 mm (root mean square error), providing reliable third-party verification for on-orbit measurement systems in ground testing. Compared with the results of BA with individual segments and of the vision measurement software PhotoModeler, the accuracy is improved by 45% and 30%, respectively.
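The adaptive-weighting least-squares idea can be sketched as iteratively reweighted least squares (IRLS): observations with large residuals receive smaller weights on each pass, so gross errors stop dominating the fit. The snippet below is a generic IRLS sketch with a simple 1/|r| downweighting, not the paper's joint-segment bundle adjustment; the function name and weighting rule are illustrative assumptions.

```python
import numpy as np

def adaptive_weight_lsq(A, b, n_iter=10, eps=1e-6):
    """Iteratively reweighted least squares.

    Observations with larger residuals get smaller weights
    (w = 1/|r|, clipped at 1/eps), a simple stand-in for the
    adaptive-weighting strategy used in joint-segment BA.
    """
    x = np.linalg.lstsq(A, b, rcond=None)[0]  # ordinary LSQ start
    for _ in range(n_iter):
        r = A @ x - b
        w = 1.0 / np.maximum(np.abs(r), eps)  # downweight outliers
        sw = np.sqrt(w)
        x = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)[0]
    return x

# Fit a line to data containing one gross outlier.
t = np.arange(10.0)
y = 2.0 * t + 1.0
y[7] += 50.0  # simulated blunder
A = np.column_stack([t, np.ones_like(t)])
slope, intercept = adaptive_weight_lsq(A, y)
# slope and intercept converge near the true values 2 and 1,
# whereas ordinary least squares would be pulled toward the outlier.
```

In a full bundle adjustment the same reweighting is applied to reprojection residuals across all segments jointly, which is what allows the segmented multi-binocular observations to be fused into one consistent trajectory.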
{"title":"Dynamic measurement of a long-distance moving object using multi-binocular high-speed videogrammetry with adaptive-weighting bundle adjustment","authors":"Xiaohua Tong, Yi Gao, Zhen Ye, Huan Xie, Peng Chen, Haibo Shi, Ziqi Liu, Xianglei Liu, Yusheng Xu, Rong Huang, Shijie Liu","doi":"10.1111/phor.12485","url":"https://doi.org/10.1111/phor.12485","journal":"The Photogrammetric Record","publicationDate":"2024-03-29"}
{"title":"Innsbruck Summer School of Alpine Research: Close‐Range Sensing Techniques in Alpine Terrain","doi":"10.1111/phor.8_12486","url":"https://doi.org/10.1111/phor.8_12486","journal":"The Photogrammetric Record","publicationDate":"2024-03-27"}