Variational Autoencoder with Gaussian Random Field prior: Application to unsupervised animal detection in aerial images
Hugo Gangloff, Minh-Tan Pham, Luc Courtrai, Sébastien Lefèvre
Pub Date: 2024-10-03 | DOI: 10.1016/j.isprsjprs.2024.09.028 | ISPRS Journal of Photogrammetry and Remote Sensing 218: 600–609
In real-world datasets of aerial images, the objects of interest are often missing, hard to annotate, and of varying appearance. The framework of unsupervised Anomaly Detection (AD) is highly relevant in this context, and Variational Autoencoders (VAEs), a family of popular probabilistic models, are often used. We build on the literature of VAEs for AD to take advantage of the particular textures that appear in natural aerial images. More precisely, we propose a new VAE model with a Gaussian Random Field (GRF) prior (VAE-GRF), which generalizes the classical VAE model, and we provide the procedures and hypotheses required for the model to be tractable. We show that, under some assumptions, the VAE-GRF largely outperforms the traditional VAE and some other probabilistic models developed for AD. Our results suggest that the VAE-GRF could be used as a relevant VAE baseline in place of the traditional VAE, with very limited additional computational cost. We provide competitive results on the MVTec reference dataset for visual inspection, and on two other datasets dedicated to the task of unsupervised animal detection in aerial images.
{"title":"Variational Autoencoder with Gaussian Random Field prior: Application to unsupervised animal detection in aerial images","authors":"Hugo Gangloff , Minh-Tan Pham , Luc Courtrai , Sébastien Lefèvre","doi":"10.1016/j.isprsjprs.2024.09.028","DOIUrl":"10.1016/j.isprsjprs.2024.09.028","url":null,"abstract":"<div><div>In real world datasets of aerial images, the objects of interest are often missing, hard to annotate and of varying aspects. The framework of unsupervised Anomaly Detection (AD) is highly relevant in this context, and Variational Autoencoders (VAEs), a family of popular probabilistic models, are often used. We develop on the literature of VAEs for AD in order to take advantage of the particular textures that appear in natural aerial images. More precisely we propose a new VAE model with a Gaussian Random Field (GRF) prior (VAE-GRF), which generalizes the classical VAE model, and we provide the necessary procedures and hypotheses required for the model to be tractable. We show that, under some assumptions, the VAE-GRF largely outperforms the traditional VAE and some other probabilistic models developed for AD. Our results suggest that the VAE-GRF could be used as a relevant VAE baseline in place of the traditional VAE with very limited additional computational cost. We provide competitive results on the MVTec reference dataset for visual inspection, and two other datasets dedicated to the task of unsupervised animal detection in aerial images.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 600-609"},"PeriodicalIF":10.6,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142426557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
OR-LIM: Observability-aware robust LiDAR-inertial-mapping under high dynamic sensor motion
Yangzi Cong, Chi Chen, Bisheng Yang, Ruofei Zhong, Shangzhe Sun, Yuhang Xu, Zhengfei Yan, Xianghong Zou, Zhigang Tu
Pub Date: 2024-10-03 | DOI: 10.1016/j.isprsjprs.2024.09.036 | ISPRS Journal of Photogrammetry and Remote Sensing 218: 610–627
Light Detection And Ranging (LiDAR) technology has provided an impactful way to capture 3D data. However, consistent mapping in sensing-degenerated and perceptually limited scenes (e.g., multi-story buildings) or under highly dynamic sensor motion (e.g., a rotating platform) remains a significant challenge. In this paper, we present OR-LIM, a novel observability-aware LiDAR-inertial-mapping system. Essentially, it combines a robust real-time LiDAR-inertial-odometry (LIO) module with an efficient surfel-map-smoothing (SMS) module that seamlessly optimizes the sensor poses and scene geometry at the same time. To improve robustness, planar surfels are hierarchically generated and grown from point cloud maps to provide reliable correspondences for fixed-lag optimization. Moreover, the normals of the surfels are analyzed to evaluate the observability of each frame. To maintain global consistency, a factor graph is utilized to integrate information from IMU propagation, the LIO, and the SMS. The system is extensively tested on datasets collected by a low-cost multi-beam LiDAR (MBL) mounted on a rotating platform. Experiments with various sensor motion settings, conducted on complex multi-story buildings and large-scale outdoor scenes, demonstrate the superior performance of our system over multiple state-of-the-art methods. Relative to the collected Terrestrial Laser Scanning (TLS) reference map, point accuracy improves by 3.39–13.6 % (8.71 % on average) outdoors and by 1.89–15.88 % (9.09 % on average) indoors.
{"title":"OR-LIM: Observability-aware robust LiDAR-inertial-mapping under high dynamic sensor motion","authors":"Yangzi Cong , Chi Chen , Bisheng Yang , Ruofei Zhong , Shangzhe Sun , Yuhang Xu , Zhengfei Yan , Xianghong Zou , Zhigang Tu","doi":"10.1016/j.isprsjprs.2024.09.036","DOIUrl":"10.1016/j.isprsjprs.2024.09.036","url":null,"abstract":"<div><div>Light Detection And Ranging (LiDAR) technology has provided an impactful way to capture 3D data. However, consistent mapping in sensing-degenerated and perceptually-limited scenes (e.g. multi-story buildings) or under high dynamic sensor motion (e.g. rotating platform) remains a significant challenge. In this paper, we present OR-LIM, a novel observability-aware LiDAR-inertial-mapping system. Essentially, it combines a robust real-time LiDAR-inertial-odometry (LIO) module with an efficient surfel-map-smoothing (SMS) module that seamlessly optimizes the sensor poses and scene geometry at the same time. To improve robustness, the planar surfels are hierarchically generated and grown from point cloud maps to provide reliable correspondences for fixed-lag optimization. Moreover, the normals of surfels are analyzed for the observability evaluation of each frame. To maintain global consistency, a factor graph is utilized integrating the information from IMU propagation, LIO as well as the SMS. The system is extensively tested on the datasets collected by a low-cost multi-beam LiDAR (MBL) mounted on a rotating platform. The experiments with various settings of sensor motion, conducted on complex multi-story buildings and large-scale outdoor scenes, demonstrate the superior performance of our system over multiple state-of-the-art methods. The improvement of point accuracy reaches 3.39–13.6 % with an average 8.71 % outdoor and correspondingly 1.89–15.88 % with 9.09 % indoor, with reference to the collected Terrestrial Laser Scanning (TLS) map.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 610-627"},"PeriodicalIF":10.6,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142426559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Re-evaluating winter carbon sink in Southern Ocean by recovering MODIS-Aqua chlorophyll-a product at high solar zenith angles
Ke Zhang, Zhaoru Zhang, Jianfeng He, Walker O. Smith, Na Liu, Chengfeng Le
Pub Date: 2024-10-02 | DOI: 10.1016/j.isprsjprs.2024.09.033 | ISPRS Journal of Photogrammetry and Remote Sensing 218: 588–599
Satellite ocean color observations are extensively utilized in global carbon sink evaluation. However, the valid coverage of chlorophyll-a concentration (Chla, mg m−3) measurements from these observations is severely limited during autumn and winter in high-latitude oceans. The high solar zenith angle (SZA) stands as one of the primary contributors to the reduced quality of Chla products in the high-latitude Southern Ocean during these seasons. This study addresses this challenge by employing a random forest-based regression ensemble (RFRE) method to enhance the quality of Moderate Resolution Imaging Spectroradiometer (MODIS) Chla products affected by high-SZA conditions. The RFRE model incorporates the color index (CI), band-ratio index (R), SZA, sensor zenith angle (senz), and Rayleigh-corrected reflectance at 869 nm (Rrc(869)) as predictors. The results indicate that the RFRE model significantly increased MODIS-observed Chla coverage (by 1.03 to 3.24 times) in high-latitude Southern Ocean regions while matching the quality of the standard Chla products. Applying the recovered Chla to re-evaluate the carbon sink in the Southern Ocean showed that the Southern Ocean's ability to absorb carbon dioxide (CO2) in winter has been underestimated (by 5.9–18.6 Tg C year−1) in previous assessments. This study underscores the significance of improving Chla products for a more accurate estimation of the winter carbon sink in the Southern Ocean.
{"title":"Re-evaluating winter carbon sink in Southern Ocean by recovering MODIS-Aqua chlorophyll-a product at high solar zenith angles","authors":"Ke Zhang , Zhaoru Zhang , Jianfeng He , Walker O. Smith , Na Liu , Chengfeng Le","doi":"10.1016/j.isprsjprs.2024.09.033","DOIUrl":"10.1016/j.isprsjprs.2024.09.033","url":null,"abstract":"<div><div>Satellite ocean color observations are extensively utilized in global carbon sink evaluation. However, the valid coverage of chlorophyll-a concentration (Chla, mg m<sup>−3</sup>) measurements from these observations is severely limited during autumn and winter in high latitude oceans. The high solar zenith angle (SZA) stands as one of the primary contributors to the reduced quality of Chla products in the high-latitude Southern Ocean during these seasons. This study addresses this challenge by employing a random forest-based regression ensemble (RFRE) method to enhance the quality of Moderate Resolution Imaging Spectroradiometer (MODIS) Chla products affected by high SZA conditions. The RFRE model incorporates the color index (CI), band-ratio index (R), SZA, sensor zenith angle (senz), and Rayleigh-corrected reflectance at 869 nm (Rrc(869)) as predictors. The results indicate that the RFRE model significantly increased the MODIS observed Chla coverage (1.03 to 3.24 times) in high-latitude Southern Ocean regions to the quality of standard Chla products. By applying the recovered Chla to re-evaluate the carbon sink in South Ocean, results showed that the Southern Ocean’s ability to absorb carbon dioxide (CO<sub>2</sub>) in winter has been underestimated (5.9–18.6 Tg C year<sup>−1</sup>) in previous assessments. This study underscores the significance of improving the Chla products for a more accurate estimation of winter carbon sink in the Southern Ocean.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 588-599"},"PeriodicalIF":10.6,"publicationDate":"2024-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142426556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A new methodology for establishing an SOC content prediction model that is spatiotemporally transferable at multidecadal and intercontinental scales
Xiangtian Meng, Yilin Bao, Chong Luo, Xinle Zhang, Huanjun Liu
Pub Date: 2024-10-02 | DOI: 10.1016/j.isprsjprs.2024.09.038 | ISPRS Journal of Photogrammetry and Remote Sensing 218: 531–550
Quantifying and tracking the soil organic carbon (SOC) content is a key step toward long-term terrestrial ecosystem monitoring. Over the past decade, numerous models have been proposed and have achieved promising results for predicting SOC content. However, many of these studies are confined to specific temporal or spatial contexts, neglecting model transferability. Temporal transferability refers to a model's ability to be applied across different periods, while spatial transferability relates to its applicability across diverse geographic locations. Therefore, developing a new methodology to establish a prediction model with high spatiotemporal transferability for SOC content is critically important. In this study, two large intercontinental study areas were selected, and measured topsoil (0–20 cm) sample data, 27,059 cloudless Landsat 5/8 images, digital elevation models, and climate data were acquired for 3 periods. Based on these data, monthly average climate data, monthly average data reflecting soil properties, and topography data were calculated as original input (OI) variables. We established an innovative multivariate deep learning model with high spatiotemporal transferability, combining the advantages of the attention mechanism, graph neural network, and long short-term memory network (A-GNN-LSTM). Additionally, the spatiotemporal transferability of the A-GNN-LSTM and of commonly used prediction models was compared. Finally, the abilities of the OI variables and of the OI variables processed by feature engineering (FEI) for different SOC prediction models were explored. The results show that 1) the A-GNN-LSTM that used OI as the input was the optimal prediction model (RMSE = 4.86 g kg−1, R2 = 0.81, RPIQ = 2.46, and MAE = 3.78 g kg−1) with the highest spatiotemporal transferability. 2) Compared to the temporal transferability of the GNN, the A-GNN-LSTM demonstrates superior temporal transferability (ΔR2T = −0.10 vs. −0.07). Furthermore, compared to the spatial transferability of the LSTM, the A-GNN-LSTM shows enhanced spatial transferability (ΔR2S = −0.16 vs. −0.09). These findings strongly suggest that the fusion of geospatial context and temporally dependent information, extracted through the integration of GNN and LSTM models, effectively enhances the spatiotemporal transferability of the models. 3) By introducing the attention mechanism, the weights of different input variables could be calculated, increasing the physical interpretability of the deep learning model. The largest weight was assigned to climate data (39.55 %), and the smallest weight was assigned to vegetation (19.96 %). 4) Among the commonly used prediction models, the deep learning model had higher prediction accuracy (RMSE = 6.64 g kg−1, R2 = 0.64, RPIQ = 1.78, and MAE = 4.78 g kg−1) and spatial transferability (ΔRMSES = 1.43 g kg−1, ΔR2S = −0.13, ΔRPIQS = −0.50, and ΔMAES = 1.09 g kg−1), whereas the linear model had higher temporal transferability (ΔRMSET = 1.46 g kg−1, ΔR2T = −0.14, ΔRPIQT = −0.45, and ΔMAET = 1.29 g kg−1). 5) The deep learning models must use OI, while the linear and traditional machine learning models must use FEI, to achieve higher prediction accuracy. This study takes an important step toward integrating multiple deep learning models to establish SOC prediction models with high spatiotemporal transferability.
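A toy PyTorch sketch of the A-GNN-LSTM idea, assuming three input-variable groups (e.g., climate, soil, topography), a simple normalized-adjacency graph convolution over sample neighbors, and an LSTM over the three periods; the layer sizes, graph construction, and single-layer design are illustrative simplifications, not the published architecture. The softmax scores play the role of the attention weights reported for the input variables.

```python
import torch
import torch.nn as nn

class AGNNLSTM(nn.Module):
    """Attention over variable groups -> graph convolution -> LSTM -> SOC."""
    def __init__(self, n_groups=3, d_in=8, d_hid=32):
        super().__init__()
        self.score = nn.Linear(d_in, 1)      # per-group attention score
        self.gcn = nn.Linear(d_in, d_hid)    # used as A_hat @ X @ W
        self.lstm = nn.LSTM(d_hid, d_hid, batch_first=True)
        self.head = nn.Linear(d_hid, 1)      # SOC content (g/kg)

    def forward(self, x, a_hat):
        # x: (N, T, G, d_in) = samples, periods, variable groups, features
        w = torch.softmax(self.score(x), dim=2)    # (N, T, G, 1) group weights
        x = (w * x).sum(dim=2)                     # attention-weighted fusion
        # One graph-convolution step mixes information across neighboring samples.
        x = torch.relu(torch.einsum('ij,jtd->itd', a_hat, self.gcn(x)))
        out, _ = self.lstm(x)                      # temporal dependencies over T
        return self.head(out[:, -1]), w            # predict from the last period

n = 6
a = torch.eye(n) + torch.roll(torch.eye(n), 1, 0) + torch.roll(torch.eye(n), -1, 0)
a_hat = a / a.sum(1, keepdim=True)                 # row-normalized adjacency
pred, weights = AGNNLSTM()(torch.randn(n, 3, 3, 8), a_hat)
print(pred.shape, weights.mean(dim=(0, 1)).squeeze())  # per-group mean weights
```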
Automated localization of dike leakage outlets using UAV-borne thermography and YOLO-based object detectors
Renlian Zhou, Monjee K. Almustafa, Moncef L. Nehdi, Huaizhi Su
Pub Date: 2024-10-02 | DOI: 10.1016/j.isprsjprs.2024.09.039 | ISPRS Journal of Photogrammetry and Remote Sensing 218: 551–573
Leakage-induced soil erosion is a major cause of dike failure, particularly during floods. Timely detection of leakage outlets and notification of dike managers are crucial for ensuring dike safety. However, manual inspection, currently the main approach for identifying leakage outlets, is costly, inefficient, and lacks spatial coverage. To achieve efficient and automatic localization of dike leakage outlets, an innovative strategy combining drones, infrared thermography, and deep learning is presented. Drones are employed to sense the dikes' surface. Real-time images from these drones are sent to a server where well-trained detectors are deployed. Once a leakage outlet is detected, alarm information is remotely sent to dike managers. To realize this strategy, 4 thermal imagers were employed to image leakage outlets on several model dikes and actual dikes. 9,231 hand-labeled thermal images with 13,387 leakage objects were selected for analysis. 19 detectors were trained using transfer learning. The best detector achieved a mean average precision of 95.8 % on the challenging test set. A full-scale embankment was constructed for leakage outlet detection tests. Various field tests confirmed the efficiency of the proposed leakage outlet localization method. In some tough conditions, the trained detector also evidently outperformed manual judgement. Results indicate that under typical circumstances, the localization error of the proposed method is within 5 m, demonstrating its practical reliability. Finally, the influencing factors and limits of the suggested strategy are thoroughly examined.
{"title":"Automated localization of dike leakage outlets using UAV-borne thermography and YOLO-based object detectors","authors":"Renlian Zhou , Monjee K. Almustafa , Moncef L. Nehdi , Huaizhi Su","doi":"10.1016/j.isprsjprs.2024.09.039","DOIUrl":"10.1016/j.isprsjprs.2024.09.039","url":null,"abstract":"<div><div>Leakage-induced soil erosion poses a major threat to dike failure, particularly during floods. Timely detection and notification of leakage outlets to dike management are crucial for ensuring dike safety. However, manual inspection, the current main approach for identifying leakage outlets, is costly, inefficient, and lacks spatial coverage. To achieve efficient and automatic localization of dike leakage outlets, an innovative strategy combining drones, infrared thermography, and deep learning is presented. Drones are employed for dikes’ surface sensing. Real-time images from these drones are sent to a server where well-trained detectors are deployed. Once a leakage outlet is detected, alarming information is remotely sent to dike managers. To realize this strategy, 4 thermal imagers were employed to image leaking outlets of several models and actual dikes. 9,231 hand-labeled thermal images with 13,387 leaking objects were selected for analysis. 19 detectors were trained using transfer learning. The best detector achieved a mean average precision of 95.8 % on the challenging test set. A full-scale embankment was constructed for leakage outlet detection tests. Various field tests confirmed the efficiency of the proposed leakage outlet localization method. In some tough conditions, the trained detector also evidently outperformed manual judgement. Results indicate that under typical circumstances, the localization error of the proposed method is within 5 m, demonstrating its practical reliability. Finally, the influencing factors and limits of the suggested strategy are thoroughly examined.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 551-573"},"PeriodicalIF":10.6,"publicationDate":"2024-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142426647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ASANet: Asymmetric Semantic Aligning Network for RGB and SAR image land cover classification
Pan Zhang, Baochai Peng, Chaoran Lu, Quanjin Huang, Dongsheng Liu
Pub Date: 2024-10-02 | DOI: 10.1016/j.isprsjprs.2024.09.025 | ISPRS Journal of Photogrammetry and Remote Sensing 218: 574–587
Synthetic Aperture Radar (SAR) images have proven to be a valuable cue for multimodal Land Cover Classification (LCC) when combined with RGB images. Most existing studies on cross-modal fusion assume that consistent feature information is necessary between the two modalities, and as a result, they construct networks without adequately addressing the unique characteristics of each modality. In this paper, we propose a novel architecture, named the Asymmetric Semantic Aligning Network (ASANet), which introduces asymmetry at the feature level to address the issue that multi-modal architectures frequently fail to fully utilize complementary features. The core of this network is the Semantic Focusing Module (SFM), which explicitly calculates differential weights for each modality to account for the modality-specific features. Furthermore, ASANet incorporates a Cascade Fusion Module (CFM), which delves deeper into channel and spatial representations to efficiently select features from the two modalities for fusion. Through the collaborative effort of these two modules, the proposed ASANet effectively learns feature correlations between the two modalities and eliminates noise caused by feature differences. Comprehensive experiments demonstrate that ASANet achieves excellent performance on three multimodal datasets. Additionally, we have established a new RGB-SAR multimodal dataset, on which our ASANet outperforms other mainstream methods with improvements ranging from 1.21% to 17.69%. The ASANet runs at 48.7 frames per second (FPS) when the input image is 256 × 256 pixels.
{"title":"ASANet: Asymmetric Semantic Aligning Network for RGB and SAR image land cover classification","authors":"Pan Zhang , Baochai Peng , Chaoran Lu , Quanjin Huang , Dongsheng Liu","doi":"10.1016/j.isprsjprs.2024.09.025","DOIUrl":"10.1016/j.isprsjprs.2024.09.025","url":null,"abstract":"<div><div>Synthetic Aperture Radar (SAR) images have proven to be a valuable cue for multimodal Land Cover Classification (LCC) when combined with RGB images. Most existing studies on cross-modal fusion assume that consistent feature information is necessary between the two modalities, and as a result, they construct networks without adequately addressing the unique characteristics of each modality. In this paper, we propose a novel architecture, named the Asymmetric Semantic Aligning Network (ASANet), which introduces asymmetry at the feature level to address the issue that multi-modal architectures frequently fail to fully utilize complementary features. The core of this network is the Semantic Focusing Module (SFM), which explicitly calculates differential weights for each modality to account for the modality-specific features. Furthermore, ASANet incorporates a Cascade Fusion Module (CFM), which delves deeper into channel and spatial representations to efficiently select features from the two modalities for fusion. Through the collaborative effort of these two modules, the proposed ASANet effectively learns feature correlations between the two modalities and eliminates noise caused by feature differences. Comprehensive experiments demonstrate that ASANet achieves excellent performance on three multimodal datasets. Additionally, we have established a new RGB-SAR multimodal dataset, on which our ASANet outperforms other mainstream methods with improvements ranging from 1.21% to 17.69%. The ASANet runs at 48.7 frames per second (FPS) when the input image is 256 × 256 pixels.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 574-587"},"PeriodicalIF":10.6,"publicationDate":"2024-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142426648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VNI-Net: Vector neurons-based rotation-invariant descriptor for LiDAR place recognition
Gengxuan Tian, Junqiao Zhao, Yingfeng Cai, Fenglin Zhang, Xufei Wang, Chen Ye, Sisi Zlatanova, Tiantian Feng
Pub Date: 2024-10-01 | DOI: 10.1016/j.isprsjprs.2024.09.011 | ISPRS Journal of Photogrammetry and Remote Sensing 218: 506–517
Despite the emergence of various LiDAR-based place recognition methods, the challenge of place recognition failure due to rotation remains critical. Existing studies have attempted to address this limitation through specific training strategies involving data augmentation and rotation-invariant networks. However, augmenting 3D rotations (SO(3)) is impractical for the former, while the latter primarily focuses on the reduced problem of 2D rotation (SO(2)) invariance. Existing methods targeting SO(3) rotation invariance suffer from limited discriminative capability. In this paper, we propose a novel approach (VNI-Net) based on the Vector Neurons Network (VNN) to achieve SO(3) rotation invariance. Our method begins by extracting rotation-equivariant features from neighboring points and projecting these low-dimensional features into a high-dimensional space using the VNN. We then compute both Euclidean and cosine distances in the rotation-equivariant feature space to obtain rotation-invariant features. Finally, we aggregate these features using generalized-mean (GeM) pooling to generate the global descriptor. To mitigate the significant information loss associated with formulating rotation-invariant features, we propose computing distances between features at different layers within the Euclidean-space neighborhood. This approach significantly enhances the discriminability of the descriptors while maintaining computational efficiency. We conduct experiments across multiple publicly available datasets captured with vehicle-mounted, drone-mounted, and handheld LiDAR sensors. VNI-Net outperforms baseline methods by up to 15.3% on datasets with rotation, while achieving results comparable to state-of-the-art place recognition methods on datasets with less rotation. Our code is open-sourced at https://github.com/tiev-tongji/VNI-Net.
{"title":"VNI-Net: Vector neurons-based rotation-invariant descriptor for LiDAR place recognition","authors":"Gengxuan Tian , Junqiao Zhao , Yingfeng Cai , Fenglin Zhang , Xufei Wang , Chen Ye , Sisi Zlatanova , Tiantian Feng","doi":"10.1016/j.isprsjprs.2024.09.011","DOIUrl":"10.1016/j.isprsjprs.2024.09.011","url":null,"abstract":"<div><div>Despite the emergence of various LiDAR-based place recognition methods, the challenge of place recognition failure due to rotation remains critical. Existing studies have attempted to address this limitation through specific training strategies involving data augment and rotation-invariant networks. However, augmenting 3D rotations (<span><math><mrow><mi>SO</mi><mrow><mo>(</mo><mn>3</mn><mo>)</mo></mrow></mrow></math></span>) is impractical for the former, while the latter primarily focuses on the reduced problem of 2D rotation (<span><math><mrow><mi>SO</mi><mrow><mo>(</mo><mn>2</mn><mo>)</mo></mrow></mrow></math></span>) invariance. Existing methods targeting <span><math><mrow><mi>SO</mi><mrow><mo>(</mo><mn>3</mn><mo>)</mo></mrow></mrow></math></span> rotation invariance suffer from limitations in discriminative capability. In this paper, we propose a novel approach (VNI-Net) based on the Vector Neurons Network (VNN) to achieve <span><math><mrow><mi>SO</mi><mrow><mo>(</mo><mn>3</mn><mo>)</mo></mrow></mrow></math></span> rotation invariance. Our method begins by extracting rotation-equivariant features from neighboring points and projecting these low-dimensional features into a high-dimensional space using VNN. We then compute both Euclidean and cosine distances in the rotation-equivariant feature space to obtain rotation-invariant features. Finally, we aggregate these features using generalized-mean (GeM) pooling to generate the global descriptor. To mitigate the significant information loss associated with formulating rotation-invariant features, we propose computing distances between features at different layers within the Euclidean space neighborhood. This approach significantly enhances the discriminability of the descriptors while maintaining computational efficiency. We conduct experiments across multiple publicly available datasets captured with vehicle-mounted, drone-mounted LiDAR sensors and handheld. VNI-Net outperforms baseline methods by up to 15.3% in datasets with rotation, while achieving comparable results with state-of-the-art place recognition methods in datasets with less rotation. Our code is open-sourced at <span><span>https://github.com/tiev-tongji/VNI-Net</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 506-517"},"PeriodicalIF":10.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142357945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A boundary-aware point clustering approach in Euclidean and embedding spaces for roof plane segmentation
Li Li, Qingqing Li, Guozheng Xu, Pengwei Zhou, Jingmin Tu, Jie Li, Mingming Li, Jian Yao
Pub Date: 2024-10-01 | DOI: 10.1016/j.isprsjprs.2024.09.030 | ISPRS Journal of Photogrammetry and Remote Sensing 218: 518–530
Roof plane segmentation from airborne light detection and ranging (LiDAR) point clouds is an important technology for three-dimensional (3D) building model reconstruction. One of the key issues of plane segmentation is how to design powerful features that can accurately distinguish adjacent planar patches. The quality of point features directly determines the accuracy of roof plane segmentation. Most existing approaches use handcrafted features, such as point-to-plane distance, normal vector, etc., to extract roof planes. However, the discriminative ability of these features is relatively low, especially in boundary areas. To solve this problem, we propose a boundary-aware point clustering approach in Euclidean and embedding spaces constructed by a multi-task deep network for roof plane segmentation. We design a three-branch multi-task network to predict semantic labels, point offsets, and deep embedding features. In the first branch, we classify the input data as non-roof, boundary, and plane points. In the second branch, we predict point offsets that shift each point towards its respective instance center. In the third branch, we constrain points of the same plane instance to have similar embeddings. We aim to ensure that points of the same plane instance are as close as possible in both Euclidean and embedding spaces. However, although the deep network has strong feature representation ability, it is still hard to accurately distinguish points near plane instance boundaries. Therefore, we first robustly group plane points into many clusters in the Euclidean and embedding spaces to find candidate planes. Then, we assign the remaining boundary points to their closest clusters to generate the final complete roof planes. In this way, we can effectively reduce the influence of unreliable boundary points. In addition, to train the network and evaluate the performance of our approach, we prepared a synthetic dataset and two real datasets. The experiments conducted on these datasets show that the proposed approach significantly outperforms existing state-of-the-art approaches in both qualitative evaluation and quantitative metrics. To facilitate future research, we will make the datasets and source code of our approach publicly available at https://github.com/Li-Li-Whu/DeepRoofPlane.
{"title":"A boundary-aware point clustering approach in Euclidean and embedding spaces for roof plane segmentation","authors":"Li Li , Qingqing Li , Guozheng Xu , Pengwei Zhou , Jingmin Tu , Jie Li , Mingming Li , Jian Yao","doi":"10.1016/j.isprsjprs.2024.09.030","DOIUrl":"10.1016/j.isprsjprs.2024.09.030","url":null,"abstract":"<div><div>Roof plane segmentation from airborne light detection and ranging (LiDAR) point clouds is an important technology for three-dimensional (3D) building model reconstruction. One of the key issues of plane segmentation is how to design powerful features that can exactly distinguish adjacent planar patches. The quality of point feature directly determines the accuracy of roof plane segmentation. Most of existing approaches use handcrafted features, such as point-to-plane distance, normal vector, etc., to extract roof planes. However, the abilities of these features are relatively low, especially in boundary areas. To solve this problem, we propose a boundary-aware point clustering approach in Euclidean and embedding spaces constructed by a multi-task deep network for roof plane segmentation. We design a three-branch multi-task network to predict semantic labels, point offsets and extract deep embedding features. In the first branch, we classify the input data as non-roof, boundary and plane points. In the second branch, we predict point offsets for shifting each point towards its respective instance center. In the third branch, we constrain that points of the same plane instance should have the similar embeddings. We aim to ensure that points of the same plane instance are close as much as possible in both Euclidean and embedding spaces. However, although deep network has strong feature representative ability, it is still hard to accurately distinguish points near the plane instance boundary. Therefore, we first robustly group plane points into many clusters in Euclidean and embedding spaces to find candidate planes. Then, we assign the rest boundary points to their closest clusters to generate the final complete roof planes. In this way, we can effectively reduce the influence of unreliable boundary points. In addition, to train the network and evaluate the performance of our approach, we prepare a synthetic dataset and two real datasets. The experiments conducted on synthetic and real datasets show that the proposed approach significantly outperforms the existing state-of-the-art approaches in both qualitative evaluation and quantitative metrics. To facilitate future research, we will make datasets and source code of our approach publicly available at <span><span>https://github.com/Li-Li-Whu/DeepRoofPlane</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 518-530"},"PeriodicalIF":10.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142357926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using difference features effectively: A multi-task network for exploring change areas and change moments in time series remote sensing images
Jialu Li, Chen Wu
Pub Date: 2024-10-01 | DOI: 10.1016/j.isprsjprs.2024.09.029 | ISPRS Journal of Photogrammetry and Remote Sensing 218: 487–505
With the rapid advancement of remote sensing Earth observation technology, an abundance of Time Series multispectral remote sensing Images (TSIs) from platforms like Landsat and Sentinel-2 is now accessible, offering essential data support for Time Series remote sensing image Change Detection (TSCD). However, TSCD faces misalignment challenges due to variations in radiation incidence angles, satellite orbit deviations, and other factors when capturing TSIs at the same geographic location but different times. Furthermore, another important issue that needs immediate attention is the precise determination of the change moments of change areas within TSIs. To tackle these challenges, this paper proposes Multi-RLD-Net, a multi-task network that efficiently utilizes difference features to explore change areas and their corresponding change moments in TSIs. To the best of our knowledge, this is the first time deep learning has been used to identify change moments in TSIs. Multi-RLD-Net integrates optical flow with Long Short-Term Memory (LSTM) to derive differences between TSIs. Initially, a lightweight encoder is introduced to extract multi-scale spatial features, which maximally preserves original features through a siamese structure. Subsequently, shallow spatial features extracted by the encoder are input into the novel Recursive Optical Flow Difference (ROD) module to align input features and detect differences between them, while deep spatial features extracted by the encoder are input into the LSTM to capture long-term temporal dependencies and differences between hidden states. Both branches output differences among TSIs, enhancing the expressive capacity of the model. Finally, the decoder identifies change areas and their corresponding change moments using multi-task branches. Experiments on the UTRNet dataset and the DynamicEarthNet dataset demonstrate that the proposed RLD-Net and Multi-RLD-Net outperform representative approaches, achieving F1 improvements of 1.29% and 10.42% compared to the state-of-the-art method MC2ABNet. The source code will be available soon at https://github.com/lijialu144/Multi-RLD-Net.
Mangrove mapping in China using Gaussian mixture model with a novel mangrove index (SSMI) derived from optical and SAR imagery
Zhaojun Chen, Huaiqing Zhang, Meng Zhang, Yehong Wu, Yang Liu
Pub Date: 2024-09-28 | DOI: 10.1016/j.isprsjprs.2024.09.026 | ISPRS Journal of Photogrammetry and Remote Sensing 218: 466–486
As an important shoreline vegetation type and a highly productive ecosystem, mangroves play an essential role in the protection of coastlines and ecological diversity. Accurate mapping of the spatial distribution of mangroves is crucial for the protection and restoration of mangrove ecosystems. Supervised classification methods rely on large sample sets and complex classifiers, while traditional thresholding methods require empirical thresholds; these problems limit the feasibility and stability of existing mangrove identification and mapping methods at large scales. Thus, this paper develops a novel mangrove index (the spectral and SAR mangrove index, SSMI) and a Gaussian mixture model (GMM) mangrove mapping method, which does not require training samples and can automatically and accurately map mangrove boundaries using only single-scene Sentinel-1 and single-scene Sentinel-2 images from the same time period. The SSMI capitalizes on the fact that mangroves are differentiated from other land cover types in terms of optical characteristics (greenness and moisture) and SAR backscattering coefficients, and it ultimately highlights mangrove forest information through the product of three expressions (f(S) = red edge/SWIR1, f(B) = 1/(1 + exp(−VH)), f(W) = (NIR − SWIR1)/(NIR + SWIR1)). The proposed SSMI was tested in six typical mangrove distribution areas in China, where climatic conditions and mangrove species vary widely. The results indicated that the SSMI was more capable of mapping mangrove forests than the other mangrove indices (CMRI, NDMI, MVI, and MI), with overall accuracies (OA) higher than 0.90 and F1 scores as high as 0.93 for the five areas other than the Maowei Gulf (S5). Moreover, the mangrove maps generated by the SSMI were highly consistent with the reference maps (HGMF_2020, LASAC_2018, and IMMA). In addition, the SSMI achieves stable performance, as shown by the mapping results of the other two classification methods (K-means and Otsu's algorithm). Mangrove mapping in six typical mangrove distribution areas in China for five consecutive years (2019–2023) and experiments in three Southeast Asian countries with major mangrove distributions (Thailand, Vietnam, and Indonesia) demonstrated that the SSMI constructed in this paper is highly stable across time and space. The SSMI proposed in this paper does not require reference samples or predefined parameters; thus, it has great flexibility and applicability for mapping mangroves at a large scale, especially in cloudy areas.
{"title":"Mangrove mapping in China using Gaussian mixture model with a novel mangrove index (SSMI) derived from optical and SAR imagery","authors":"Zhaojun Chen , Huaiqing Zhang , Meng Zhang , Yehong Wu , Yang Liu","doi":"10.1016/j.isprsjprs.2024.09.026","DOIUrl":"10.1016/j.isprsjprs.2024.09.026","url":null,"abstract":"<div><div>As an important shoreline vegetation and highly productive ecosystem, mangroves play an essential role in the protection of coastlines and ecological diversity. Accurate mapping of the spatial distribution of mangroves is crucial for the protection and restoration of mangrove ecosystems. Supervised classification methods rely on large sample sets and complex classifiers and traditional thresholding methods that require empirical thresholds, given the problems that limit the feasibility and stability of existing mangrove identification and mapping methods on large scales. Thus, this paper develops a novel mangrove index (spectral and SAR mangrove index, SSMI) and Gaussian mixture model (GMM) mangrove mapping method, which does not require training samples and can automatically and accurately map mangrove boundaries by utilizing only single-scene Sentinel-1 and single-scene Sentinel-2 images from the same time period. The SSMI capitalizes on the fact that mangroves are differentiated from other land cover types in terms of optical characteristics (greenness and moisture) and backscattering coefficients of SAR images and ultimately highlights mangrove forest information through the product of three expressions (<em>f</em>(<em>S</em>) = red egde/SWIR1, <em>f</em>(<em>B</em>) = 1/(1 + e<sup>-VH</sup>), <em>f</em>(<em>W</em>)=(NIR-SWIR1)/(NIR+SWIR1)). The proposed SSMI was tested in six typical mangrove distribution areas in China where climatic conditions and mangrove species vary widely. The results indicated that the SSMI was more capable of mapping mangrove forests than the other mangrove indices (CMRI, NDMI, MVI, and MI), with overall accuracys (OA) higher than 0.90 and F1 scores as high as 0.93 for the other five areas except for the Maowei Gulf (S5). Moreover, the mangrove maps generated by the SSMI were highly consistent with the reference maps (HGMF_2020、LASAC_2018 and IMMA). In addition, the SSMI achieves stable performance, as shown by the mapping results of the other two classification methods (K-means and Otsu’s algorithm). Mangrove mapping in six typical mangrove distribution areas in China for five consecutive years (2019–2023) and experiments in three Southeast Asian countries with major mangrove distributions (Thailand, Vietnam, and Indonesia) demonstrated that the SSMIs constructed in this paper are highly stable across time and space. 
The SSMI proposed in this paper does not require reference samples or predefined parameters; thus, it has great flexibility and applicability in mapping mangroves on a large scale, especially in cloudy areas.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 466-486"},"PeriodicalIF":10.6,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142357925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
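The index itself is fully specified in the abstract, so it can be written down directly; the two-component GMM step below assumes mangroves fall in the higher-SSMI component (plausible given the index design, but an assumption) and that VH is the Sentinel-1 backscatter fed to the logistic term.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def ssmi(red_edge, swir1, nir, vh):
    """SSMI = f(S) * f(B) * f(W), the three terms given in the abstract."""
    f_s = red_edge / swir1                 # spectral (greenness) term
    f_b = 1.0 / (1.0 + np.exp(-vh))        # SAR backscatter term (VH)
    f_w = (nir - swir1) / (nir + swir1)    # moisture (NDMI-like) term
    return f_s * f_b * f_w

def gmm_mangrove_mask(index_img):
    """Two-component GMM splits the SSMI histogram into mangrove/background
    without an empirical threshold."""
    x = index_img.reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
    labels = gmm.predict(x).reshape(index_img.shape)
    mangrove_comp = int(np.argmax(gmm.means_.ravel()))  # assumed higher-SSMI class
    return labels == mangrove_comp

# Toy demo with a synthetic bimodal index image.
idx = np.concatenate([np.random.normal(0.02, 0.01, 5000),
                      np.random.normal(0.15, 0.03, 1000)]).reshape(100, 60)
print("mangrove fraction:", gmm_mangrove_mask(idx).mean())
```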