Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2025.01.023
Bing Han , Tengteng Qu , Jie Jiang
Owing to the difficulty of exploiting hidden spatio-temporal information, spatio-temporal knowledge graph (KG) reasoning tasks in real geographic environments suffer from low accuracy and poor interpretability. This paper proposes a grid neighborhood-based graph convolutional network (GN-GCN) for spatio-temporal KG reasoning. Building on the discretized encoding of spatio-temporal data with the GeoSOT global grid model, the GN-GCN consists of three parts: a static graph neural network, a neighborhood grid calculation, and a time evolution unit, which learn semantic knowledge, spatial knowledge, and temporal knowledge, respectively. The GN-GCN also improves the training accuracy and efficiency of the model through the multiscale aggregation characteristic of GeoSOT and can visualize the resulting probabilities in a spatio-temporal intentional probabilistic grid map. Compared with existing models (RE-GCN, CyGNet, RE-NET, etc.), the mean reciprocal rank (MRR) of GN-GCN reaches 48.33 and 54.06 on the spatio-temporal entity and relation prediction tasks, improvements of 6.32/18.16% and 6.64/15.67% respectively, achieving state-of-the-art (SOTA) results in spatio-temporal reasoning. The source code of the project is available at https://doi.org/10.18170/DVN/UIS4VC.
{"title":"GN-GCN: Grid neighborhood-based graph convolutional network for spatio-temporal knowledge graph reasoning","authors":"Bing Han , Tengteng Qu , Jie Jiang","doi":"10.1016/j.isprsjprs.2025.01.023","DOIUrl":"10.1016/j.isprsjprs.2025.01.023","url":null,"abstract":"<div><div>Owing to the difficulty of utilizing hidden spatio-temporal information, spatio-temporal knowledge graph (KG) reasoning tasks in real geographic environments have issues of low accuracy and poor interpretability. This paper proposes a grid neighborhood-based graph convolutional network (GN-GCN) for spatio-temporal KG reasoning. Based on the discretized process of encoding spatio-temporal data through the GeoSOT global grid model, the GN-GCN consists of three parts: a static graph neural network, a neighborhood grid calculation, and a time evolution unit, which can learn semantic knowledge, spatial knowledge, and temporal knowledge, respectively. The GN-GCN can also improve the training accuracy and efficiency of the model through the multiscale aggregation characteristic of GeoSOT and can visualize different probabilities in a spatio-temporal intentional probabilistic grid map. Compared with other existing models (RE-GCN, CyGNet, RE-NET, etc.), the mean reciprocal rank (MRR) of GN-GCN reaches 48.33 and 54.06 in spatio-temporal entity and relation prediction tasks, increased by 6.32/18.16% and 6.64/15.67% respectively, which achieves state-of-the-art (SOTA) results in spatio-temporal reasoning. The source code of the project is available at <span><span>https://doi.org/10.18170/DVN/UIS4VC</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 728-739"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143035285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2025.01.017
Zijie Wang , Jizheng Yi , Aibin Chen , Lijiang Chen , Hui Lin , Kai Xu
Very High-Resolution (VHR) urban remote sensing image segmentation is widely used in ecological environmental protection, urban dynamic monitoring, fine urban management, and other related fields. However, the large-scale variation and discrete distribution of objects in VHR images present a significant challenge to accurate segmentation. Existing studies have primarily concentrated on the internal correlations within a single feature, while overlooking the inherent sequential relationships across different feature states. In this paper, a novel Urban Spatial Segmentation Framework (UrbanSSF) is proposed, which fully considers the connections between feature states at different phases. Specifically, the Feature State Interaction (FSI) Mamba, with powerful sequence modeling capabilities, is designed based on state space modules. It effectively facilitates interactions between information across different features. Given the disparate semantic information and spatial details of features at different scales, a Global Semantic Enhancer (GSE) module and a Spatial Interactive Attention (SIA) mechanism are designed. The GSE module operates on the high-level features, while the SIA mechanism processes the middle- and low-level features. To address the computational challenges of large-scale dense feature fusion, a Channel Space Reconstruction (CSR) algorithm is proposed. This algorithm effectively reduces the computational burden while ensuring efficient processing and maintaining accuracy. In addition, the lightweight UrbanSSF-T, the efficient UrbanSSF-S, and the accurate UrbanSSF-L are designed to meet different application requirements in urban scenarios. Comprehensive experiments on the UAVid, ISPRS Vaihingen, and Potsdam datasets validate the superior performance of the UrbanSSF series. In particular, UrbanSSF-L achieves a mean intersection over union of 71.0% on the UAVid dataset. Code is available at https://github.com/KotlinWang/UrbanSSF.
{"title":"Accurate semantic segmentation of very high-resolution remote sensing images considering feature state sequences: From benchmark datasets to urban applications","authors":"Zijie Wang , Jizheng Yi , Aibin Chen , Lijiang Chen , Hui Lin , Kai Xu","doi":"10.1016/j.isprsjprs.2025.01.017","DOIUrl":"10.1016/j.isprsjprs.2025.01.017","url":null,"abstract":"<div><div>Very High-Resolution (VHR) urban remote sensing images segmentation is widely used in ecological environmental protection, urban dynamic monitoring, fine urban management and other related fields. However, the large-scale variation and discrete distribution of objects in VHR images presents a significant challenge to accurate segmentation. The existing studies have primarily concentrated on the internal correlations within a single features, while overlooking the inherent sequential relationships across different feature state. In this paper, a novel Urban Spatial Segmentation Framework (UrbanSSF) is proposed, which fully considers the connections between feature states at different phases. Specifically, the Feature State Interaction (FSI) Mamba with powerful sequence modeling capabilities is designed based on state space modules. It effectively facilitates interactions between the information across different features. Given the disparate semantic information and spatial details of features at different scales, a Global Semantic Enhancer (GSE) module and a Spatial Interactive Attention (SIA) mechanism are designed. The GSE module operates on the high-level features, while the SIA mechanism processes the middle and low-level features. To address the computational challenges of large-scale dense feature fusion, a Channel Space Reconstruction (CSR) algorithm is proposed. This algorithm effectively reduces the computational burden while ensuring efficient processing and maintaining accuracy. In addition, the lightweight UrbanSSF-T, the efficient UrbanSSF-S and the accurate UrbanSSF-L are designed to meet different application requirements in urban scenarios. Comprehensive experiments on the UAVid, ISPRS Vaihingen and Potsdam datasets validate the superior performance of UrbanSSF series. Especially, the UrbanSSF-L achieves a mean intersection over union of 71.0% on the UAVid dataset. Code is available at <span><span>https://github.com/KotlinWang/UrbanSSF</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 824-840"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143072520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.001
Ce Wang, Wanjie Sun
In the realm of remote sensing, images captured by different platforms exhibit significant disparities in spatial resolution. Consequently, effective large scale factor super-resolution (SR) algorithms are vital for maximizing the utilization of low-resolution (LR) satellite data captured from orbit. However, existing methods confront challenges such as semantic inaccuracies and blurry textures in the reconstructed images. To tackle these issues, we introduce a novel framework, the Semantic Guided Diffusion Model (SGDM), designed for large scale factor remote sensing image super-resolution. The framework exploits a pre-trained generative model as a prior to generate perceptually plausible high-resolution (HR) images, thereby constraining the solution space and mitigating texture blurriness. We further enhance the reconstruction by incorporating vector maps, which carry structural and semantic cues that improve the reconstruction fidelity of ground objects. Moreover, pixel-level inconsistencies in paired remote sensing images, stemming from sensor-specific imaging characteristics, may hinder the convergence of the model and the diversity of the generated results. To address this problem, we develop a method to extract sensor-specific imaging characteristics and model their distribution. The proposed model can decouple imaging characteristics from image content, allowing it to generate diverse super-resolution images based on imaging characteristics provided by reference satellite images or sampled from the imaging characteristic probability distributions. To validate and evaluate our approach, we create the Cross-Modal Super-Resolution Dataset (CMSRD). Qualitative and quantitative experiments on CMSRD showcase the superiority and broad applicability of our method. Experimental results on downstream vision tasks also demonstrate the utility of the generated SR images. The dataset and code will be publicly available at https://github.com/wwangcece/SGDM.
{"title":"Semantic guided large scale factor remote sensing image super-resolution with generative diffusion prior","authors":"Ce Wang, Wanjie Sun","doi":"10.1016/j.isprsjprs.2024.12.001","DOIUrl":"10.1016/j.isprsjprs.2024.12.001","url":null,"abstract":"<div><div>In the realm of remote sensing, images captured by different platforms exhibit significant disparities in spatial resolution. Consequently, effective large scale factor super-resolution (SR) algorithms are vital for maximizing the utilization of low-resolution (LR) satellite data captured from orbit. However, existing methods confront challenges such as semantic inaccuracies and blurry textures in the reconstructed images. To tackle these issues, we introduce a novel framework, the Semantic Guided Diffusion Model (SGDM), designed for large scale factor remote sensing image super-resolution. The framework exploits a pre-trained generative model as a prior to generate perceptually plausible high-resolution (HR) images, thereby constraining the solution space and mitigating texture blurriness. We further enhance the reconstruction by incorporating vector maps, which carry structural and semantic cues to enhance the reconstruction fidelity of ground objects. Moreover, pixel-level inconsistencies in paired remote sensing images, stemming from sensor-specific imaging characteristics, may hinder the convergence of the model and the diversity in generated results. To address this problem, we develop a method to extract sensor-specific imaging characteristics and model the distribution of them. The proposed model can decouple imaging characteristics from image content, allowing it to generate diverse super-resolution images based on imaging characteristics provided by reference satellite images or sampled from the imaging characteristic probability distributions. To validate and evaluate our approach, we create the Cross-Modal Super-Resolution Dataset (CMSRD). Qualitative and quantitative experiments on CMSRD showcase the superiority and broad applicability of our method. Experimental results on downstream vision tasks also demonstrate the utilitarian of the generated SR images. The dataset and code will be publicly available at <span><span>https://github.com/wwangcece/SGDM</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 125-138"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142823150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-30 | DOI: 10.1016/j.isprsjprs.2024.11.018
Depeng Ouyang , Yueting Zhang , Jiayi Guo , Guangyao Zhou
Synthetic Aperture Radar tomography (TomoSAR) has garnered significant interest for its capability to achieve three-dimensional resolution along the elevation angle by collecting a stack of SAR images from different cross-track angles. Compressed Sensing (CS) algorithms have been widely introduced into SAR tomography. However, traditional CS-based TomoSAR methods suffer from weak noise resistance, high computational complexity, and insufficient super-resolution capability. Addressing the efficient TomoSAR imaging problem, this paper proposes an end-to-end neural network-based TomoSAR inversion method, named the Multi-Label Classification-based Sparse Imaging Network (MLC-net). MLC-net focuses on the l0-norm optimization problem, completely departing from the iterative framework of traditional compressed sensing methods and overcoming the limitations that the l1-norm optimization problem imposes on signal coherence. Simultaneously, the concept of multi-label classification is introduced for the first time in TomoSAR inversion, enabling MLC-net to accurately invert scenarios with multiple scatterers within the same range-azimuth cell. Additionally, a novel evaluation system for TomoSAR inversion results is introduced, transforming inversion results into a 3D point cloud and utilizing mature evaluation methods for 3D point clouds. Under the new evaluation system, the proposed method outperforms existing methods by more than 30%. Finally, by training solely on simulated data, we conducted extensive experimental testing on both simulated and real data, achieving excellent results that validate the effectiveness, efficiency, and robustness of the proposed method. Specifically, the VQA_PC score improved from 91.085 to 92.713. The code of our network is available at https://github.com/OscarYoungDepend/MLC-net.
Title: "MLC-net: A sparse reconstruction network for TomoSAR imaging based on multi-label classification neural network" (ISPRS Journal of Photogrammetry and Remote Sensing, Volume 220, Pages 85-99).
Pub Date: 2024-11-30 | DOI: 10.1016/j.isprsjprs.2024.11.013
Hongwei Liu, Guoqi Xu, Bo Liu, Yuanxin Li, Shuhang Yang, Jie Tang, Kai Pan, Yanqiu Xing
The accurate positioning of individual trees, the three-dimensional reconstruction of forest environments, and the identification of tree species distribution are crucial aspects of forestry remote sensing. Simultaneous Localization and Mapping (SLAM) algorithms, primarily based on LiDAR or visual technologies, serve as essential tools for outdoor spatial positioning and mapping, overcoming the signal loss caused by tree canopy obstruction in the Global Navigation Satellite System (GNSS). To address these challenges, a semantic SLAM algorithm called LVI-ObjSemantic is proposed, which integrates visual, LiDAR, IMU and deep learning cues at the object level. LVI-ObjSemantic is capable of performing individual tree segmentation, localization and tree species discrimination in forest environments. The proposed Cluster-Block-single and Cluster-Block-global data structures, combined with the deep learning model, effectively reduce cases of missed and false detections. Due to the lack of publicly available forest datasets, we validated the proposed algorithm on eight experimental plots. The experimental results indicate that the average root mean square error (RMSE) of the trajectories across the eight plots is 2.7, 2.8, 1.9 and 2.2 times lower than that of LIO-SAM, FAST-LIO2, LVI-SAM and FAST-LIVO, respectively. Additionally, the mean absolute error in tree localization is 0.12 m. Moreover, the mapping drift of the proposed algorithm is consistently lower than that of the aforementioned comparison algorithms.
{"title":"A real time LiDAR-Visual-Inertial object level semantic SLAM for forest environments","authors":"Hongwei Liu, Guoqi Xu, Bo Liu, Yuanxin Li, Shuhang Yang, Jie Tang, Kai Pan, Yanqiu Xing","doi":"10.1016/j.isprsjprs.2024.11.013","DOIUrl":"10.1016/j.isprsjprs.2024.11.013","url":null,"abstract":"<div><div>The accurate positioning of individual trees, the reconstruction of forest environment in three dimensions and the identification of tree species distribution are crucial aspects of forestry remote sensing. Simultaneous Localization and Mapping (SLAM) algorithms, primarily based on LiDAR or visual technologies, serve as essential tools for outdoor spatial positioning and mapping, overcoming signal loss challenges caused by tree canopy obstruction in the Global Navigation Satellite System (GNSS). To address these challenges, a semantic SLAM algorithm called LVI-ObjSemantic is proposed, which integrates visual, LiDAR, IMU and deep learning at the object level. LVI-ObjSemantic is capable of performing individual tree segmentation, localization and tree spices discrimination tasks in forest environment. The proposed Cluster-Block-single and Cluster-Block-global data structures combined with the deep learning model can effectively reduce the cases of misdetection and false detection. Due to the lack of publicly available forest datasets, we chose to validate the proposed algorithm on eight experimental plots. The experimental results indicate that the average root mean square error (RMSE) of the trajectories across the eight plots is 2.7, 2.8, 1.9 and 2.2 times lower than that of LIO-SAM, FAST-LIO2, LVI-SAM and FAST-LIVO, respectively. Additionally, the mean absolute error in tree localization is 0.12 m. Moreover, the mapping drift of the proposed algorithm is consistently lower than that of the aforementioned comparison algorithms.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"219 ","pages":"Pages 71-90"},"PeriodicalIF":10.6,"publicationDate":"2024-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142746331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-29 | DOI: 10.1016/j.isprsjprs.2024.11.016
Shoujun Jia , Lotte de Vugt , Andreas Mayr , Chun Liu , Martin Rutzinger
3D topographic point cloud change estimation produces fundamental inputs for understanding Earth surface process dynamics. In general, change estimation aims at detecting the largest possible number of points with significance (i.e., difference > uncertainty) and quantifying multiple types of topographic change. However, several complex factors, including the inhomogeneous nature of point cloud data, the high uncertainty in positional changes, and the different types of quantified difference, pose challenges for the reliable detection and quantification of 3D topographic changes. To address these limitations, this paper proposes a graph comparison-based method to estimate 3D topographic change from point clouds. First, a graph with both location and orientation representation is designed to aggregate local neighbors of topographic point clouds, countering the disordered and unstructured nature of the data. Second, the corresponding graphs between two topographic point clouds are identified and compared to quantify the differences and associated uncertainties in both location and orientation features. In particular, the proposed method unites the significant changes derived from both features (i.e., location and orientation) and captures the location difference (i.e., distance) and the orientation difference (i.e., rotation) for each point with significant change. We tested the proposed method in a mountain region (Sellrain, Tyrol, Austria) covered by three airborne laser scanning point cloud pairs with different point densities and complex topographic changes at intervals of four, six, and ten years. Our method detected significant changes in 91.39%-93.03% of the study area, while a state-of-the-art method (i.e., Multiscale Model-to-Model Cloud Comparison, M3C2) identified 36.81%-47.41% significant changes for the same area. Especially for unchanged building roofs, our method measured lower change magnitudes than M3C2. Looking at the case of shallow landslides, our method identified 84 out of a total of 88 reference landslides by analysing change in distance or rotation. Therefore, our method not only detects a large number of significant changes but also quantifies two types of topographic change (i.e., distance and rotation), and is more robust against registration errors. It shows large potential for the estimation and interpretation of topographic changes in natural environments.
{"title":"Location and orientation united graph comparison for topographic point cloud change estimation","authors":"Shoujun Jia , Lotte de Vugt , Andreas Mayr , Chun Liu , Martin Rutzinger","doi":"10.1016/j.isprsjprs.2024.11.016","DOIUrl":"10.1016/j.isprsjprs.2024.11.016","url":null,"abstract":"<div><div>3D topographic point cloud change estimation produces fundamental inputs for understanding Earth surface process dynamics. In general, change estimation aims at detecting the largest possible number of points with significance (<em>i.e.,</em> difference <span><math><mrow><mo>></mo></mrow></math></span> uncertainty) and quantifying multiple types of topographic changes. However, several complex factors, including the inhomogeneous nature of point cloud data, the high uncertainty in positional changes, and the different types of quantifying difference, pose challenges for the reliable detection and quantification of 3D topographic changes. To address these limitations, the paper proposes a graph comparison-based method to estimate 3D topographic change from point clouds. First, a graph with both location and orientation representation is designed to aggregate local neighbors of topographic point clouds against the disordered and unstructured data nature. Second, the corresponding graphs between two topographic point clouds are identified and compared to quantify the differences and associated uncertainties in both location and orientation features. Particularly, the proposed method unites the significant changes derived from both features (<em>i.e.,</em> location and orientation) and captures the location difference (<em>i.e.,</em> distance) and the orientation difference (<em>i.e.,</em> rotation) for each point with significant change. We tested the proposed method in a mountain region (Sellrain, Tyrol, Austria) covered by three airborne laser scanning point cloud pairs with different point densities and complex topographic changes at intervals of four, six, and ten years. Our method detected significant changes in 91.39 % − 93.03 % of the study area, while a state-of-the-art method (<em>i.e.,</em> Multiscale Model-to-Model Cloud Comparison, M3C2) identified 36.81 % − 47.41 % significant changes for the same area. Especially for unchanged building roofs, our method measured lower change magnitudes than M3C2. Looking at the case of shallow landslides, our method identified 84 out of a total of 88 reference landslides by analysing change in distance or rotation. Therefore, our method not only detects a large number of significant changes but also quantifies two types of topographic changes (<em>i.e.,</em> distance and rotation), and is more robust against registration errors. It shows large potential for estimation and interpretation of topographic changes in natural environments.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"219 ","pages":"Pages 52-70"},"PeriodicalIF":10.6,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142746330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-28 | DOI: 10.1016/j.isprsjprs.2024.11.011
Fengyuan Zhuang , Yizhang Liu , Xiaojie Li , Ji Zhou , Riqing Chen , Lifang Wei , Changcai Yang , Jiayi Ma
Correspondence pruning aims to remove false correspondences (outliers) from an initial putative correspondence set. This process holds significant importance and serves as a fundamental step in various applications within the fields of remote sensing and photogrammetry. The presence of noise, illumination changes, and small overlaps in remote sensing images frequently results in a substantial number of outliers within the initial set, thereby rendering correspondence pruning notably challenging. Although the spatial consensus of correspondences has been widely used to determine the correctness of each correspondence, achieving uniform consensus can be challenging due to the uneven distribution of correspondences. Existing works have mainly focused on either local or global consensus, with a very small or a very large perspective, respectively. They often ignore the moderate perspective between local and global consensus, called group consensus, which serves as a buffering organization from local to global consensus, and hence suffer from insufficient aggregation of correspondence consensus. To address this issue, we propose a multi-granularity consensus network (MGCNet) to achieve consensus across regions of different scales, which leverages local, group, and global consensus to accomplish robust and accurate correspondence pruning. Specifically, we introduce a GroupGCN module that randomly divides the initial correspondences into several groups, focuses on group consensus, and acts as a buffer organization from local to global consensus. Additionally, we propose a Multi-level Local Feature Aggregation Module that adapts to the size of the local neighborhood to capture local consensus, and a Multi-order Global Feature Module to enhance the richness of the global consensus. Experimental results demonstrate that MGCNet outperforms state-of-the-art methods on various tasks, highlighting the superiority and strong generalization of our method. In particular, we achieve 3.95% and 8.5% mAP5° improvement without RANSAC on the YFCC100M dataset in known and unknown scenes for pose estimation, compared to the second-best models (MSA-LFC and CLNet). Source code: https://github.com/1211193023/MGCNet.
{"title":"MGCNet: Multi-granularity consensus network for remote sensing image correspondence pruning","authors":"Fengyuan Zhuang , Yizhang Liu , Xiaojie Li , Ji Zhou , Riqing Chen , Lifang Wei , Changcai Yang , Jiayi Ma","doi":"10.1016/j.isprsjprs.2024.11.011","DOIUrl":"10.1016/j.isprsjprs.2024.11.011","url":null,"abstract":"<div><div>Correspondence pruning aims to remove false correspondences (outliers) from an initial putative correspondence set. This process holds significant importance and serves as a fundamental step in various applications within the fields of remote sensing and photogrammetry. The presence of noise, illumination changes, and small overlaps in remote sensing images frequently result in a substantial number of outliers within the initial set, thereby rendering the correspondence pruning notably challenging. Although the spatial consensus of correspondences has been widely used to determine the correctness of each correspondence, achieving uniform consensus can be challenging due to the uneven distribution of correspondences. Existing works have mainly focused on either local or global consensus, with a very small perspective or large perspective, respectively. They often ignore the moderate perspective between local and global consensus, called group consensus, which serves as a buffering organization from local to global consensus, hence leading to insufficient correspondence consensus aggregation. To address this issue, we propose a multi-granularity consensus network (MGCNet) to achieve consensus across regions of different scales, which leverages local, group, and global consensus to accomplish robust and accurate correspondence pruning. Specifically, we introduce a GroupGCN module that randomly divides the initial correspondences into several groups and then focuses on group consensus and acts as a buffer organization from local to global consensus. Additionally, we propose a Multi-level Local Feature Aggregation Module that adapts to the size of the local neighborhood to capture local consensus and a Multi-order Global Feature Module to enhance the richness of the global consensus. Experimental results demonstrate that MGCNet outperforms state-of-the-art methods on various tasks, highlighting the superiority and great generalization of our method. In particular, we achieve 3.95% and 8.5% mAP<span><math><mrow><mn>5</mn><mo>°</mo></mrow></math></span> improvement without RANSAC on the YFCC100M dataset in known and unknown scenes for pose estimation, compared to the second-best models (MSA-LFC and CLNet). Source code: https://github.com/1211193023/MGCNet.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"219 ","pages":"Pages 38-51"},"PeriodicalIF":10.6,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142746329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-26 | DOI: 10.1016/j.isprsjprs.2024.10.029
Yongchuan Cui , Peng Liu , Yan Ma , Lajiao Chen , Mengzhen Xu , Xingyan Guo
Pansharpening is a crucial technique in remote sensing for enhancing spatial resolution by fusing low spatial resolution multispectral (LRMS) images with high spatial resolution panchromatic (PAN) images. Existing deep convolutional networks often face challenges in capturing fine details due to the homogeneous operation of convolutional kernels. In this paper, we propose a novel predictive filtering approach for pansharpening to mitigate spectral distortions and spatial degradations. By obtaining predictive filters through the fusion of LRMS and PAN and conducting filtering operations using unique kernels assigned to each pixel, our method reduces information loss significantly. To learn more effective kernels, we propose an effective fine-grained fusion method for LRMS and PAN features, namely element-wise feature mixing. Specifically, features of LRMS and PAN are exchanged under the guidance of a learned mask. The value of the mask signifies the extent to which each element is mixed. Extensive experimental results demonstrate that the proposed method achieves better performance than state-of-the-art models with fewer parameters and lower computation. Visual comparisons indicate that our model pays more attention to details, which further confirms the effectiveness of the proposed fine-grained fusion method. Codes are available at https://github.com/yc-cui/PreMix.
{"title":"Pansharpening via predictive filtering with element-wise feature mixing","authors":"Yongchuan Cui , Peng Liu , Yan Ma , Lajiao Chen , Mengzhen Xu , Xingyan Guo","doi":"10.1016/j.isprsjprs.2024.10.029","DOIUrl":"10.1016/j.isprsjprs.2024.10.029","url":null,"abstract":"<div><div>Pansharpening is a crucial technique in remote sensing for enhancing spatial resolution by fusing low spatial resolution multispectral (LRMS) images with high spatial panchromatic (PAN) images. Existing deep convolutional networks often face challenges in capturing fine details due to the homogeneous operation of convolutional kernels. In this paper, we propose a novel predictive filtering approach for pansharpening to mitigate spectral distortions and spatial degradations. By obtaining predictive filters through the fusion of LRMS and PAN and conducting filtering operations using unique kernels assigned to each pixel, our method reduces information loss significantly. To learn more effective kernels, we propose an effective fine-grained fusion method for LRMS and PAN features, namely element-wise feature mixing. Specifically, features of LRMS and PAN will be exchanged under the guidance of a learned mask. The value of the mask signifies the extent to which the element will be mixed. Extensive experimental results demonstrate that the proposed method achieves better performances than the state-of-the-art models with fewer parameters and lower computations. Visual comparisons indicate that our model pays more attention to details, which further confirms the effectiveness of the proposed fine-grained fusion method. Codes are available at <span><span>https://github.com/yc-cui/PreMix</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"219 ","pages":"Pages 22-37"},"PeriodicalIF":10.6,"publicationDate":"2024-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142720445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-26 | DOI: 10.1016/j.isprsjprs.2024.11.008
Jingwen Wang , Jose Luis Pancorbo , Miguel Quemada , Jiahua Zhang , Yun Bai , Sha Zhang , Shanxin Guo , Jinsong Chen
Timely and accurate information on crop productivity is essential for characterizing crop growing status and guiding adaptive management practices to ensure food security. Terrestrial biosphere models forced by satellite observations (satellite-TBMs) are viewed as robust tools for understanding large-scale agricultural productivity, with distinct advantages of generalized input data requirement and comprehensive representation of carbon–water-energy exchange mechanisms. However, it remains unclear whether these models can maintain consistent accuracy at field scale and provide useful information for farmers to make site-specific management decisions. This study aims to investigate the capability of a satellite-TBM to estimate crop productivity at the granularity of individual fields using harmonized Sentinel-2 and Landsat-8 time series. Emphasis was placed on evaluating the model performance in: (i) representing crop response to the spatially and temporally varying field management practices, and (ii) capturing the variation in crop growth, biomass and yield under complex interactions among crop genotypes, environment, and management conditions. To achieve the first objective, we conducted on-farm experiments with controlled nitrogen (N) fertilization and irrigation treatments to assess the efficacy of using satellite-retrieved leaf area index (LAI) to reflect the effect of management practices in the TBM. For the second objective, we integrated a yield formation module into the satellite-TBM and compared it with the semi-empirical harvest index (HI) method. The model performance was then evaluated under varying conditions using an extensive dataset consisting of observations from four crop species (i.e., soybean, wheat, rice and maize), 42 cultivars and 58 field-years. Results demonstrated that satellite-retrieved LAI effectively captured the effects of N and water supply on crop growth, showing high sensitivity to both the timing and quantity of these inputs. This allowed for a spatiotemporal representation of management impacts, even without prior knowledge of the specific management schedules. The TBM forced by satellite LAI produced consistent biomass dynamics with ground measurements, showing an overall correlation coefficient (R) of 0.93 and a relative root mean square error (RRMSE) of 31.4 %. However, model performance declined from biomass to yield estimation, with the HI-based method (R = 0.80, RRMSE = 23.7 %) outperforming mechanistic modeling of grain filling (R = 0.43, RRMSE = 43.4 %). Model accuracy for winter wheat was lower than that for summer crops such as rice, maize and soybean, suggesting potential underrepresentation of the overwintering processes. This study illustrates the utility of satellite-TBMs in crop productivity estimation at the field level, and identifies existing uncertainties and limitations for future model developments.
{"title":"Field-scale evaluation of a satellite-based terrestrial biosphere model for estimating crop response to management practices and productivity","authors":"Jingwen Wang , Jose Luis Pancorbo , Miguel Quemada , Jiahua Zhang , Yun Bai , Sha Zhang , Shanxin Guo , Jinsong Chen","doi":"10.1016/j.isprsjprs.2024.11.008","DOIUrl":"10.1016/j.isprsjprs.2024.11.008","url":null,"abstract":"<div><div>Timely and accurate information on crop productivity is essential for characterizing crop growing status and guiding adaptive management practices to ensure food security. Terrestrial biosphere models forced by satellite observations (satellite-TBMs) are viewed as robust tools for understanding large-scale agricultural productivity, with distinct advantages of generalized input data requirement and comprehensive representation of carbon–water-energy exchange mechanisms. However, it remains unclear whether these models can maintain consistent accuracy at field scale and provide useful information for farmers to make site-specific management decisions. This study aims to investigate the capability of a satellite-TBM to estimate crop productivity at the granularity of individual fields using harmonized Sentinel-2 and Landsat-8 time series. Emphasis was placed on evaluating the model performance in: (i) representing crop response to the spatially and temporally varying field management practices, and (ii) capturing the variation in crop growth, biomass and yield under complex interactions among crop genotypes, environment, and management conditions. To achieve the first objective, we conducted on-farm experiments with controlled nitrogen (N) fertilization and irrigation treatments to assess the efficacy of using satellite-retrieved leaf area index (LAI) to reflect the effect of management practices in the TBM. For the second objective, we integrated a yield formation module into the satellite-TBM and compared it with the semi-empirical harvest index (HI) method. The model performance was then evaluated under varying conditions using an extensive dataset consisting of observations from four crop species (i.e., soybean, wheat, rice and maize), 42 cultivars and 58 field-years. Results demonstrated that satellite-retrieved LAI effectively captured the effects of N and water supply on crop growth, showing high sensitivity to both the timing and quantity of these inputs. This allowed for a spatiotemporal representation of management impacts, even without prior knowledge of the specific management schedules. The TBM forced by satellite LAI produced consistent biomass dynamics with ground measurements, showing an overall correlation coefficient (R) of 0.93 and a relative root mean square error (RRMSE) of 31.4 %. However, model performance declined from biomass to yield estimation, with the HI-based method (R = 0.80, RRMSE = 23.7 %) outperforming mechanistic modeling of grain filling (R = 0.43, RRMSE = 43.4 %). Model accuracy for winter wheat was lower than that for summer crops such as rice, maize and soybean, suggesting potential underrepresentation of the overwintering processes. 
This study illustrates the utility of satellite-TBMs in crop productivity estimation at the field level, and identifies existing uncertainties and limitations for future model developments.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"219 ","pages":"Pages 1-21"},"PeriodicalIF":10.6,"publicationDate":"2024-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142720447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
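The semi-empirical harvest index method referenced here simply scales simulated above-ground biomass by a harvest index to obtain grain yield, and the evaluation uses the correlation coefficient R and relative RMSE. A minimal sketch, with the harvest index value and the toy numbers as placeholders:

```python
import numpy as np

def yield_from_harvest_index(biomass, harvest_index=0.45):
    """Semi-empirical yield estimate: grain yield = HI x above-ground biomass."""
    return harvest_index * np.asarray(biomass, dtype=float)

def r_and_rrmse(pred, obs):
    """Pearson correlation R and relative RMSE (RMSE divided by the observed mean, %)."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    r = np.corrcoef(pred, obs)[0, 1]
    rrmse = np.sqrt(np.mean((pred - obs) ** 2)) / np.mean(obs) * 100.0
    return float(r), float(rrmse)

biomass = np.array([9.8, 12.1, 7.5, 14.3])        # t/ha, simulated above-ground biomass
observed_yield = np.array([4.2, 5.6, 3.1, 6.4])   # t/ha, field measurements
print(r_and_rrmse(yield_from_harvest_index(biomass), observed_yield))
```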
Pub Date: 2024-11-26 | DOI: 10.1016/j.isprsjprs.2024.10.028
Zebiao Wu , Patrick Marais , Heinz Rüther
Creating 3D digital models of heritage sites typically involves laser scanning and photogrammetry. Although laser scan-derived point clouds provide detailed geometry, occlusions and hidden areas often lead to gaps. Terrestrial and UAV photography can largely fill these gaps and also enhance definition and accuracy at edges and corners. Historical buildings with complex architectural or decorative details require a systematically planned combination of laser scanning with handheld and UAV photography. High-resolution photography not only enhances the geometry of 3D building models but also improves their texturing. The use of cameras, especially UAV cameras, requires robust viewpoint planning to ensure sufficient coverage of the documented structure whilst minimising viewpoints for efficient image acquisition and processing economy. Determining ideal viewpoints for detailed modelling is challenging. Existing planners, relying on coarse scene proxies, often miss fine structures, significantly restrict the search space of candidate viewpoints and surface targets due to high computational costs, and are sensitive to surface orientation errors, which limits their applicability in complex scenarios. To address these limitations, we propose a strategy for generating sparse viewpoints from point clouds for efficient and accurate UAV-based modelling. Unlike existing planners, our backward visibility approach enables exploration of the camera viewpoint space at low computational cost and does not require surface orientation (normal vector) estimation. We introduce an observability-based planning criterion, a direction diversity-driven reconstructability criterion, which assesses modelling quality by encouraging global diversity in viewing directions, and a coarse-to-fine adaptive viewpoint search approach that builds on these criteria. The approach was validated on a number of complex heritage scenes. It achieves efficient modelling with minimal viewpoints and accurately captures fine structures, like thin spires, that are problematic for other planners. For our test examples, we achieve at least 98% coverage, using significantly fewer viewpoints, and with a consistently high structural similarity across all models.
{"title":"A UAV-based sparse viewpoint planning framework for detailed 3D modelling of cultural heritage monuments","authors":"Zebiao Wu , Patrick Marais , Heinz Rüther","doi":"10.1016/j.isprsjprs.2024.10.028","DOIUrl":"10.1016/j.isprsjprs.2024.10.028","url":null,"abstract":"<div><div>Creating 3D digital models of heritage sites typically involves laser scanning and photogrammetry. Although laser scan-derived point clouds provide detailed geometry, occlusions and hidden areas often lead to gaps. Terrestrial and UAV photography can largely fill these gaps and also enhance definition and accuracy at edges and corners. Historical buildings with complex architectural or decorative details require a systematically planned combination of laser scanning with handheld and UAV photography. High-resolution photography not only enhances the geometry of 3D building models but also improves their texturing. The use of cameras, especially UAV cameras, requires robust viewpoint planning to ensure sufficient coverage of the documented structure whilst minimising viewpoints for efficient image acquisition and processing economy. Determining ideal viewpoints for detailed modelling is challenging. Existing planners, relying on coarse scene proxies, often miss fine structures, significantly restrict the search space of candidate viewpoints and surface targets due to high computational costs, and are sensitive to surface orientation errors, which limits their applicability in complex scenarios. To address these limitations, we propose a strategy for generating sparse viewpoints from point clouds for efficient and accurate UAV-based modelling. Unlike existing planners, our backward visibility approach enables exploration of the camera viewpoint space at low computational cost and does not require surface orientation (normal vector) estimation. We introduce an observability-based planning criterion, a direction diversity-driven reconstructability criterion, which assesses modelling quality by encouraging global diversity in viewing directions, and a coarse-to-fine adaptive viewpoint search approach that builds on these criteria. The approach was validated on a number of complex heritage scenes. It achieves efficient modelling with minimal viewpoints and accurately captures fine structures, like thin spires, that are problematic for other planners. For our test examples, we achieve at least 98% coverage, using significantly fewer viewpoints, and with a consistently high structural similarity across all models.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 555-571"},"PeriodicalIF":10.6,"publicationDate":"2024-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142722525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}