While wetlands have been extensively studied using optical and radar satellite imagery, thermal imagery has been used less often due to its low spatial and temporal resolutions and the challenges of emissivity estimation. Since 2018, spaceborne thermal imagery has gained interest due to the availability of ECOSTRESS data, which are acquired at 70 m spatial resolution with a 3–5 day revisit time. This study aimed to compare the contribution of ECOSTRESS time-series to wetland mapping with that of other thermal time-series (i.e., Landsat-TIRS, ASTER-TIR), Sentinel-1 SAR and Sentinel-2 optical satellite time-series, and topographical variables derived from satellite data. The study was applied to a 209 km² heathland site in north-western France that includes riverine, slope, and flat wetlands. The method consisted of four steps: (i) four-year time-series (2019–2022) were aggregated into dense annual time-series; (ii) the temporal dimension was reduced using functional principal component analysis (FPCA); (iii) the most discriminating FPCA components were selected based on recursive feature elimination; and (iv) the contribution of each sensor time-series to wetland mapping was assessed based on the accuracy of a random forest model trained and tested with reference field data. The results indicated that an ECOSTRESS time-series combining day and night acquisitions was more accurate (overall F1-score: 0.71) than the Landsat-TIRS and ASTER-TIR time-series (overall F1-scores: 0.40–0.62). A combination of ECOSTRESS thermal images, Sentinel-2 optical images, Sentinel-1 SAR images, and topographical variables outperformed the sensor-specific accuracies (overall F1-score: 0.87), highlighting the synergy of thermal, optical, and topographical data for wetland mapping.
{"title":"Contribution of ECOSTRESS thermal imagery to wetland mapping: Application to heathland ecosystems","authors":"Liam Loizeau-Woollgar , Sébastien Rapinel , Julien Pellen , Bernard Clément , Laurence Hubert-Moy","doi":"10.1016/j.isprsjprs.2025.01.014","DOIUrl":"10.1016/j.isprsjprs.2025.01.014","url":null,"abstract":"<div><div>While wetlands have been extensively studied using optical and radar satellite imagery, thermal imagery has been used less often due its low spatial – temporal resolutions and challenges for emissivity estimation. Since 2018, spaceborne thermal imagery has gained interest due to the availability of ECOSTRESS data, which are acquired at 70 m spatial resolution and a 3–5 revisit time. This study aimed at comparing the contribution of ECOSTRESS time-series to wetland mapping to that of other thermal time-series (i.e., Landsat-TIRS, ASTER-TIR), Sentinel-1 SAR and Sentinel-2 optical satellite time-series, and topographical variables derived from satellite data. The study was applied to a 209 km<sup>2</sup> heathland site in north-western France that includes riverine, slope, and flat wetlands. The method used consisted of four steps: (i) four-year time-series (2019–2022) were aggregated into dense annual time-series; (ii) the temporal dimension was reduced using functional principal component analysis (FPCA); (iii) the most discriminating components of the FPCA were selected based on recursive feature elimination; and (iv) the contribution of each sensor time-series to wetland mapping was assessed based on the accuracy of a random forest model trained and tested using reference field data. The results indicated that an ECOSTRESS time-series that combined day and night acquisitions was more accurate (overall F1-score: 0.71) than Landsat-TIRS and ASTER-TIR time-series (overall F1-score: 0.40–0.62). A combination of ECOSTRESS thermal images, Sentinel-2 optical images, Sentinel-1 SAR images, and topographical variables outperformed the sensor-specific accuracies (overall F1-score: 0.87), highlighting the synergy of thermal, optical, and topographical data for wetland mapping.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 649-660"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143035305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning transferable land cover semantics for open vocabulary interactions with remote sensing images
Valérie Zermatten, Javiera Castillo-Navarro, Diego Marcos, Devis Tuia
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2025.01.006
Why should we confine land cover classes to rigid and arbitrary definitions? Land cover mapping is a central task in remote sensing image processing, but rigid class definitions can sometimes restrict the transferability of annotations between datasets. Open vocabulary recognition, i.e., using natural language to define a specific object or pattern in an image, breaks free from predefined nomenclature and offers flexible recognition of diverse categories with a more general image understanding across datasets and labels. The open vocabulary framework opens doors to searching for concepts of interest beyond individual class boundaries. In this work, we propose to use Text As supervision for COntrastive Semantic Segmentation (TACOSS), and we design an open vocabulary semantic segmentation model that extends its capacities beyond those of a traditional model for land cover mapping: in addition to visual pattern recognition, TACOSS leverages the common-sense knowledge captured by language models and is capable of interpreting the image at the pixel level, attributing semantics to each pixel and removing the constraints of a fixed set of land cover labels. By learning to match visual representations with text embeddings, TACOSS can transition smoothly from one set of labels to another and enables interaction with remote sensing images in natural language. Our approach combines a pretrained text encoder with a visual encoder and adopts supervised contrastive learning to align the visual and textual modalities. We explore several text encoders and label representation methods and compare their abilities to encode transferable land cover semantics. The model's capacity to predict a set of different land cover labels on an unseen dataset is also explored to illustrate the generalization capacities of our approach across domains. Overall, TACOSS is a general method that permits adapting between different sets of land cover labels with minimal computational overhead. Code is publicly available online.
{"title":"Learning transferable land cover semantics for open vocabulary interactions with remote sensing images","authors":"Valérie Zermatten , Javiera Castillo-Navarro , Diego Marcos , Devis Tuia","doi":"10.1016/j.isprsjprs.2025.01.006","DOIUrl":"10.1016/j.isprsjprs.2025.01.006","url":null,"abstract":"<div><div>Why should we confine land cover classes to rigid and arbitrary definitions? Land cover mapping is a central task in remote sensing image processing, but the rigorous class definitions can sometimes restrict the transferability of annotations between datasets. Open vocabulary recognition, i.e. using natural language to define a specific object or pattern in an image, breaks free from predefined nomenclature and offers flexible recognition of diverse categories with a more general image understanding across datasets and labels. The open vocabulary framework opens doors to search for concepts of interest, beyond individual class boundaries. In this work, we propose to use Text As supervision for COntrastive Semantic Segmentation (TACOSS), and we design an open vocabulary semantic segmentation model that extends its capacities beyond that of a traditional model for land cover mapping: In addition to visual pattern recognition, TACOSS leverages the common sense knowledge captured by language models and is capable of interpreting the image at the pixel level, attributing semantics to each pixel and removing the constraints of a fixed set of land cover labels. By learning to match visual representations with text embeddings, TACOSS can transition smoothly from one set of labels to another and enables the interaction with remote sensing images in natural language. Our approach combines a pretrained text encoder with a visual encoder and adopts supervised contrastive learning to align the visual and textual modalities. We explore several text encoders and label representation methods and compare their abilities to encode transferable land cover semantics. The model’s capacity to predict a set of different land cover labels on an unseen dataset is also explored to illustrate the generalization capacities across domains of our approach. Overall, TACOSS is a general method and permits adapting between different sets of land cover labels with minimal computational overhead. Code is publicly available online<span><span><sup>1</sup></span></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 621-636"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143035307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A coupled optical–radiometric modeling approach to removing reflection noise in TLS data of urban areas
Li Fang, Tianyu Li, Yanghong Lin, Shudong Zhou, Wei Yao
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.005
Point clouds, a fundamental type of 3D data, play an essential role in applications such as 3D reconstruction, autonomous driving, and robotics. However, point clouds generated by terrestrial laser scanning (TLS), which measures the time-of-flight of emitted and backscattered laser pulses, frequently include false points caused by mirror-like reflective surfaces, degrading data quality and fidelity. This study introduces an algorithm to eliminate reflection noise from TLS scan data. Our algorithm detects reflection planes by using both geometric and physical characteristics to recognize reflection points according to optical reflection theory. Radiometric correction is applied to the raw laser intensity, after which reflective planes are extracted using a threshold. In the virtual-point identification phase, these points are detected along the light propagation path, grounded on the specular reflection principle. Moreover, an improved feature descriptor, RE-LFSH, is employed to assess the similarity between two points in terms of reflection symmetry. We adapted the LFSH feature descriptor to retain reflection features, mitigating interference from symmetrical architectural structures. Incorporating the Hausdorff feature distance into the algorithm fortifies its resistance to ghosting and deformations, thereby boosting the accuracy of virtual point detection. Additionally, to overcome the shortage of annotated datasets, we introduce 3DRN, a novel benchmark dataset specifically designed for this task. Extensive experiments on the 3DRN benchmark, featuring diverse urban environments with virtual TLS reflection noise, show that our algorithm improves precision and recall rates for 3D points in reflective areas by 57.03% and 31.80%, respectively. Our approach improves outlier detection by 9.17% and enhances accuracy by 5.65% compared to leading methods. The 3DRN dataset is available at https://github.com/Tsuiky/3DRN.
{"title":"A coupled optical–radiometric modeling approach to removing reflection noise in TLS data of urban areas","authors":"Li Fang , Tianyu Li , Yanghong Lin , Shudong Zhou , Wei Yao","doi":"10.1016/j.isprsjprs.2024.12.005","DOIUrl":"10.1016/j.isprsjprs.2024.12.005","url":null,"abstract":"<div><div>Point clouds, which are a fundamental type of 3D data, play an essential role in various applications like 3D reconstruction, autonomous driving, and robotics. However, point clouds generated via measuring the time-of-flight of emitted and backscattered laser pulses of TLS, frequently include false points caused by mirror-like reflective surfaces, resulting in degradation of data quality and fidelity. This study introduces an algorithm to eliminate reflection noise from TLS scan data. Our novel algorithm detects reflection planes by utilizing both geometric and physical characteristics to recognize reflection points according to optical reflection theory. Radiometric correction is applied to the raw laser intensity, after which reflective planes are extracted using a threshold. In the virtual points identification phase, these points are detected along the light propagation path, grounded on the specular reflection principle. Moreover, an improved feature descriptor, known as RE-LFSH, is employed to assess the similarity between two points in terms of reflection symmetry. We have adapted the LFSH feature descriptor to retain reflection features, mitigating interference from symmetrical architectural structures. Incorporating the Hausdorff feature distance into the algorithm fortifies its resistance to ghosting and deformations, thereby boosting the accuracy of virtual point detection. Additionally, to overcome the shortage of annotated datasets, a novel benchmark dataset named 3DRN, specifically designed for this task, is introduced. Extensive experiments on the 3DRN benchmark dataset, featuring diverse urban environments with virtual TLS reflection noise, show our algorithm improves precision and recall rates for 3D points in reflective areas by 57.03% and 31.80%, respectively. Our approach improves outlier detection by 9.17% and enhances accuracy by 5.65% compared to leading methods. You can access the 3DRN dataset at <span><span>https://github.com/Tsuiky/3DRN</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 217-231"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142874575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data fidelity-oriented spatial-spectral fusion of CRISM and CTX images
Qunming Wang, Wenjing Ma, Sicong Liu, Xiaohua Tong, Peter M. Atkinson
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.004
The Compact Reconnaissance Imaging Spectrometer for Mars (CRISM) is a Mars-dedicated imaging spectrometer that captures remote sensing data with very fine spectral resolution. However, the spatial resolution of CRISM data is relatively coarse (18 m), limiting its application at regional scales. The Context Camera (CTX) is a digital camera equipped with a wide-angle lens, providing a finer spatial resolution (6 m) and a larger field of view, but only a single panchromatic band. To produce CRISM hyperspectral data with finer spatial resolution (e.g., the 6 m of CTX images), this research investigated spatial-spectral fusion of 18 m CRISM images with 6 m CTX panchromatic images. To address the long-standing issue in existing spatial-spectral fusion methods of incomplete data fidelity to the original hyperspectral data, a new paradigm called Data Fidelity-oriented Spatial-Spectral Fusion (DF-SSF) was proposed. The effectiveness of DF-SSF was validated through experiments on data from six areas on Mars. The results indicate that the fusion of CRISM and CTX can effectively increase the spatial resolution of CRISM hyperspectral data. Moreover, DF-SSF increases the fusion accuracy noticeably while maintaining perfect data fidelity to the original hyperspectral data. In addition, DF-SSF is theoretically applicable to any existing spatial-spectral fusion method. The 6 m CRISM hyperspectral data inherit the spectral-resolution advantages of the original 18 m data and provide richer spatial texture information on the Martian surface, with broad application potential.
{"title":"Data fidelity-oriented spatial-spectral fusion of CRISM and CTX images","authors":"Qunming Wang , Wenjing Ma , Sicong Liu , Xiaohua Tong , Peter M. Atkinson","doi":"10.1016/j.isprsjprs.2024.12.004","DOIUrl":"10.1016/j.isprsjprs.2024.12.004","url":null,"abstract":"<div><div>The Compact Reconnaissance Imaging Spectrometer for Mars (CRISM) is a Mars-dedicated compact reconnaissance imaging spectrometer that captures remote sensing data with very fine spectral resolution. However, the spatial resolution of CRISM data is relatively coarse (18 m), limiting its application to regional scales. The Context Camera (CTX) is a digital camera equipped with a wide-angle lens, providing a finer spatial resolution (6 m) and larger field-of-view, but CTX provides only a single panchromatic band. To produce CRISM hyperspectral data with finer spatial resolution (e.g., 6 m of CTX images), this research investigated spatial-spectral fusion of 18 m CRISM images with 6 m CTX panchromatic images. In spatial-spectral fusion, to address the long-standing issue of incomplete data fidelity to the original hyperspectral data in existing methods, a new paradigm called Data Fidelity-oriented Spatial-Spectral Fusion (DF-SSF) was proposed. The effectiveness of DF-SSF was validated through experiments on data from six areas on Mars. The results indicate that the fusion of CRISM and CTX can increase the spatial resolution of CRISM hyperspectral data effectively. Moreover, DF-SSF can increase the fusion accuracy noticeably while maintaining perfect data fidelity to the original hyperspectral data. In addition, DF-SSF is theoretically applicable to any existing spatial-spectral fusion methods. The 6 m CRISM hyperspectral data inherit the advantages of the original 18 m data in spectral resolution, and provide richer spatial texture information on the Martian surface, with broad application potential.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 172-191"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142874576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Remote sensing (RS) scene graphs represent RS scenes as graphs with objects as nodes and their spatial relationships as edges, playing a crucial role in understanding and interpreting RS scenes at a higher level. However, existing RS scene graph generation methods, which rely on deep learning models, face limitations due to their dependence on extensive relationship labels, restricted generation accuracy, and limited generalizability. To address these challenges, we propose a spatial relationship computing model based on prior geographic information knowledge for RS scene graph generation; we refer to the resulting RS scene graph as SG-SSR for short. Furthermore, we investigated the application of SG-SSR to RS scene retrieval, demonstrating improved retrieval accuracy for spatial relationships between entities. The experiments show that our scene graph generation method does not rely on relationship labels and has higher generation accuracy and greater universality. Moreover, the retrieval method based on SG-SSR outperformed other retrieval methods based on image feature vectors, with a retrieval accuracy index 0.098 higher than the alternatives (RemoteCLIP (mask)). The dataset and code are available at https://gitee.com/tangjiayitangjiayi/sg-ssr.
{"title":"Remote sensing scene graph generation for improved retrieval based on spatial relationships","authors":"Jiayi Tang, Xiaochong Tong, Chunping Qiu, Yuekun Sun, Haoshuai Song, Yaxian Lei, Yi Lei, Congzhou Guo","doi":"10.1016/j.isprsjprs.2025.01.012","DOIUrl":"10.1016/j.isprsjprs.2025.01.012","url":null,"abstract":"<div><div>RS scene graphs represent RS scenes as graphs with objects as nodes and their spatial relationships as edges, playing a crucial role in understanding and interpreting RS scenes at a higher level. However, existing RS scene graph generation methods, relying on deep learning models, face limitations due to their dependence on extensive relationship labels, restricted generation accuracy, and limited generalizability. To address these challenges, we proposed a spatial relationship computing model based on prior geographic information knowledge for RS scene graph generation. We refer to the RS scene graph generated using our method as SG-SSR for short. Furthermore, we investigated the application of SG-SSR in RS scene retrieval, demonstrating improved retrieval accuracy for spatial relationships between entities. The experiments show that our scene graph generation method does not rely on relationship labels, and has higher generation accuracy and greater universality. Moreover, the retrieval method based on SG-SSR outperformed other retrieval methods based on image feature vectors, with a retrieval accuracy index 0.098 higher than the alternatives(RemoteCLIP(mask)). The dataset and code are available at <span><span>https://gitee.com/tangjiayitangjiayi/sg-ssr</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 741-752"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143072525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multiscale adaptive PolSAR image superpixel generation based on local iterative clustering and polarimetric scattering features
Nengcai Li, Deliang Xiang, Xiaokun Sun, Canbin Hu, Yi Su
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.011
Superpixel generation is an essential preprocessing step for intelligent interpretation of object-level Polarimetric Synthetic Aperture Radar (PolSAR) images. The Simple Linear Iterative Clustering (SLIC) algorithm has become one of the primary methods for superpixel generation in PolSAR images due to its advantages of minimal human intervention and ease of implementation. However, existing SLIC-based superpixel generation methods for PolSAR images often use distance measures based on the complex Wishart distribution as the similarity metric. These methods are not ideal for segmenting heterogeneous regions, and a single superpixel generation result cannot simultaneously extract coarse and fine levels of detail in the image. To address this, this paper proposes a multiscale adaptive superpixel generation method for PolSAR images based on SLIC. To tackle the inaccuracy of the complex Wishart distribution in modeling urban heterogeneous regions, this paper employs polarimetric target decomposition: it extracts the polarimetric scattering features of the land cover and then constructs a similarity measure for these features using a Riemannian metric. To achieve multiscale superpixel segmentation in a single segmentation process, this paper introduces a new method for initializing cluster centers based on a polarimetric homogeneity measure. This initialization assigns denser cluster centers in heterogeneous areas and automatically adjusts the size of the search regions according to the polarimetric homogeneity measure. Finally, a novel clustering distance metric is defined, integrating multiple types of information, including polarimetric scattering feature similarity, power feature similarity, and spatial similarity. This metric uses the polarimetric homogeneity measure to adaptively balance the relative weights between the various similarities. Comparative experiments were conducted on three real PolSAR datasets against state-of-the-art SLIC-based methods (Qin-RW and Yin-HLT). The results demonstrate that the proposed method provides richer multiscale detail information and significantly improves segmentation outcomes. For example, with the AIRSAR dataset and a step size of 42, the proposed method achieves improvements of 16.56% in boundary recall (BR) and 12.01% in achievable segmentation accuracy (ASA) compared to the Qin-RW method. Source code of the proposed method is available at https://github.com/linengcai/PolSAR_MS_ASLIC.git.
{"title":"Multiscale adaptive PolSAR image superpixel generation based on local iterative clustering and polarimetric scattering features","authors":"Nengcai Li , Deliang Xiang , Xiaokun Sun , Canbin Hu , Yi Su","doi":"10.1016/j.isprsjprs.2024.12.011","DOIUrl":"10.1016/j.isprsjprs.2024.12.011","url":null,"abstract":"<div><div>Superpixel generation is an essential preprocessing step for intelligent interpretation of object-level Polarimetric Synthetic Aperture Radar (PolSAR) images. The Simple Linear Iterative Clustering (SLIC) algorithm has become one of the primary methods for superpixel generation in PolSAR images due to its advantages of minimal human intervention and ease of implementation. However, existing SLIC-based superpixel generation methods for PolSAR images often use distance measures based on the complex Wishart distribution as the similarity metric. These methods are not ideal for segmenting heterogeneous regions, and a single superpixel generation result cannot simultaneously extract coarse and fine levels of detail in the image. To address this, this paper proposes a multiscale adaptive superpixel generation method for PolSAR images based on SLIC. To tackle the issue of the complex Wishart distribution’s inaccuracy in modeling urban heterogeneous regions, this paper employs the polarimetric target decomposition method. It extracts the polarimetric scattering features of the land cover, then constructs a similarity measure for these features using Riemannian metric. To achieve multiscale superpixel segmentation in a single superpixel segmentation process, this paper introduces a new method for initializing cluster centers based on polarimetric homogeneity measure. This initialization method assigns denser cluster centers in heterogeneous areas and automatically adjusts the size of the search regions according to the polarimetric homogeneity measure. Finally, a novel clustering distance metric is defined, integrating multiple types of information, including polarimetric scattering feature similarity, power feature similarity, and spatial similarity. This metric uses the polarimetric homogeneity measure to adaptively balance the relative weights between the various similarities. Comparative experiments were conducted using three real PolSAR datasets with state-of-the-art SLIC-based methods (Qin-RW and Yin-HLT). The results demonstrate that the proposed method provides richer multiscale detail information and significantly improves segmentation outcomes. For example, with the AIRSAR dataset and the step size of 42, the proposed method achieves improvements of 16.56<span><math><mtext>%</mtext></math></span> in BR and 12.01<span><math><mtext>%</mtext></math></span> in ASA compared to the Qin-RW method. Source code of the proposed method is made available at <span><span>https://github.com/linengcai/PolSAR_MS_ASLIC.git</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 307-322"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142889391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accurate and complete neural implicit surface reconstruction in street scenes using images and LiDAR point clouds
Chenhui Shi, Fulin Tang, Yihong Wu, Hongtu Ji, Hongjie Duan
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.012
Surface reconstruction in street scenes is a critical task in computer vision and photogrammetry, with images and LiDAR point clouds being commonly used data sources. However, image-only reconstruction faces challenges such as lighting variations, weak textures, and sparse viewpoints, while LiDAR-only methods suffer from issues like sparse and noisy LiDAR point clouds. Effectively integrating these two modalities to leverage their complementary strengths remains an open problem. Inspired by recent advances in neural implicit representations, we propose a novel street-level neural implicit surface reconstruction approach that incorporates images and LiDAR point clouds into a unified framework for joint optimization. Three key components make our approach achieve state-of-the-art (SOTA) reconstruction performance with high accuracy and completeness in street scenes. First, we introduce an adaptive photometric constraint weighting method to mitigate the impacts of lighting variations and weak textures on reconstruction. Second, a new B-spline-based hierarchical hash encoder is proposed to ensure the continuity of gradient-derived normals and further to reduce the noise from images and LiDAR point clouds. Third, we implement effective signed distance field (SDF) constraints in a spatial hash grid allocated in near-surface space to fully exploit the geometric information provided by LiDAR point clouds. Additionally, we present two street-level datasets—one virtual and one real-world—offering a comprehensive set of resources that existing public datasets lack. Experimental results demonstrate the superior performance of our method. Compared to the SOTA image-LiDAR combined neural implicit method, namely StreetSurf, ours significantly improves the F-score by approximately 7 percentage points. Our code and data are available at https://github.com/SCH1001/StreetRecon.
{"title":"Accurate and complete neural implicit surface reconstruction in street scenes using images and LiDAR point clouds","authors":"Chenhui Shi , Fulin Tang , Yihong Wu , Hongtu Ji , Hongjie Duan","doi":"10.1016/j.isprsjprs.2024.12.012","DOIUrl":"10.1016/j.isprsjprs.2024.12.012","url":null,"abstract":"<div><div>Surface reconstruction in street scenes is a critical task in computer vision and photogrammetry, with images and LiDAR point clouds being commonly used data sources. However, image-only reconstruction faces challenges such as lighting variations, weak textures, and sparse viewpoints, while LiDAR-only methods suffer from issues like sparse and noisy LiDAR point clouds. Effectively integrating these two modalities to leverage their complementary strengths remains an open problem. Inspired by recent advances in neural implicit representations, we propose a novel street-level neural implicit surface reconstruction approach that incorporates images and LiDAR point clouds into a unified framework for joint optimization. Three key components make our approach achieve state-of-the-art (SOTA) reconstruction performance with high accuracy and completeness in street scenes. First, we introduce an adaptive photometric constraint weighting method to mitigate the impacts of lighting variations and weak textures on reconstruction. Second, a new B-spline-based hierarchical hash encoder is proposed to ensure the continuity of gradient-derived normals and further to reduce the noise from images and LiDAR point clouds. Third, we implement effective signed distance field (SDF) constraints in a spatial hash grid allocated in near-surface space to fully exploit the geometric information provided by LiDAR point clouds. Additionally, we present two street-level datasets—one virtual and one real-world—offering a comprehensive set of resources that existing public datasets lack. Experimental results demonstrate the superior performance of our method. Compared to the SOTA image-LiDAR combined neural implicit method, namely StreetSurf, ours significantly improves the F-score by approximately 7 percentage points. Our code and data are available at <span><span>https://github.com/SCH1001/StreetRecon</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 295-306"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142889392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Synthesis of complex-valued InSAR data with a multi-task convolutional neural network
Philipp Sibler, Francescopaolo Sica, Michael Schmitt
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.007
Simulated remote sensing images bear great potential for many applications in the field of Earth observation. They can be used as a controlled testbed for the development of signal and image processing algorithms or can provide a means to assess the potential of new sensor concepts. With the rise of deep learning, the synthesis of artificial remote sensing images by means of deep neural networks has become a hot research topic. While the generation of optical data is relatively straightforward, as it can rely on established models from the computer vision community, the generation of synthetic aperture radar (SAR) data is still largely restricted to intensity images, since the processing of complex-valued numbers by conventional neural networks poses significant challenges. With this work, we propose to circumvent these challenges by decomposing SAR interferograms into real-valued components. These components are then simultaneously synthesized by different branches of a multi-branch encoder–decoder network architecture. In the end, the real-valued components can be recombined into the final, complex-valued interferogram. Moreover, the effect of speckle and interferometric phase noise is replicated and applied to the synthesized interferometric data. Experimental results on both medium-resolution C-band repeat-pass SAR data and high-resolution X-band single-pass SAR data demonstrate the general feasibility of the approach.
{"title":"Synthesis of complex-valued InSAR data with a multi-task convolutional neural network","authors":"Philipp Sibler , Francescopaolo Sica , Michael Schmitt","doi":"10.1016/j.isprsjprs.2024.12.007","DOIUrl":"10.1016/j.isprsjprs.2024.12.007","url":null,"abstract":"<div><div>Simulated remote sensing images bear great potential for many applications in the field of Earth observation. They can be used as controlled testbed for the development of signal and image processing algorithms or can provide a means to get an impression of the potential of new sensor concepts. With the rise of deep learning, the synthesis of artificial remote sensing images by means of deep neural networks has become a hot research topic. While the generation of optical data is relatively straightforward, as it can rely on the use of established models from the computer vision community, the generation of synthetic aperture radar (SAR) data until now is still largely restricted to intensity images since the processing of complex-valued numbers by conventional neural networks poses significant challenges. With this work, we propose to circumvent these challenges by decomposing SAR interferograms into real-valued components. These components are then simultaneously synthesized by different branches of a multi-branch encoder–decoder network architecture. In the end, these real-valued components can be combined again into the final, complex-valued interferogram. Moreover, the effect of speckle and interferometric phase noise is replicated and applied to the synthesized interferometric data. Experimental results on both medium-resolution C-band repeat-pass SAR data and high-resolution X-band single-pass SAR data, demonstrate the general feasibility of the approach.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 192-206"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142874573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large-scale rice mapping under spatiotemporal heterogeneity using multi-temporal SAR images and explainable deep learning
Ji Ge, Hong Zhang, Lijun Zuo, Lu Xu, Jingling Jiang, Mingyang Song, Yinhaibin Ding, Yazhe Xie, Fan Wu, Chao Wang, Wenjiang Huang
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.021
Timely and accurate mapping of rice cultivation distribution is crucial for ensuring global food security and achieving SDG 2. From a global perspective, rice areas display high heterogeneity in spatial pattern and SAR time-series characteristics, posing substantial challenges to the performance, efficiency, and transferability of deep learning (DL) models. Moreover, due to their "black box" nature, DL models often lack interpretability and credibility. To address these challenges, this paper constructs the first SAR rice dataset with spatiotemporal heterogeneity and proposes an explainable, lightweight model for rice area extraction, the eXplainable Mamba UNet (XM-UNet). The dataset is based on multi-temporal Sentinel-1 data from 2023, covering diverse rice samples from the United States, Kenya, and Vietnam. A Temporal Feature Importance Explainer (TFI-Explainer) based on the Selective State Space Model is designed to enhance adaptability to the temporal heterogeneity of rice and the model's interpretability. This explainer, coupled with the DL model, provides interpretations of the importance of SAR temporal features and facilitates crucial time phase screening. To overcome the spatial heterogeneity of rice, an Attention Sandglass Layer (ASL) combining CNN and self-attention mechanisms is designed to enhance local spatial feature extraction. Additionally, the Parallel Visual State Space Layer (PVSSL) uses 2D-Selective-Scan (SS2D) cross-scanning to capture the global spatial features of rice multi-directionally, significantly reducing computational complexity through parallelization. Experimental results demonstrate that XM-UNet adapts well to the spatiotemporal heterogeneity of rice globally, with an overall accuracy (OA) of 94.26% and an F1-score of 90.73%. The model is extremely lightweight, with only 0.190 M parameters and 0.279 GFLOPs. Mamba's selective scanning facilitates feature screening, and its integration with CNN effectively balances the local and global spatial characteristics of rice. The interpretability experiments show that the explanations of temporal feature importance provided by the model are crucial for guiding rice distribution mapping, filling a gap in the related field. The code is available at https://github.com/SAR-RICE/XM-UNet.
{"title":"Large-scale rice mapping under spatiotemporal heterogeneity using multi-temporal SAR images and explainable deep learning","authors":"Ji Ge , Hong Zhang , Lijun Zuo , Lu Xu , Jingling Jiang , Mingyang Song , Yinhaibin Ding , Yazhe Xie , Fan Wu , Chao Wang , Wenjiang Huang","doi":"10.1016/j.isprsjprs.2024.12.021","DOIUrl":"10.1016/j.isprsjprs.2024.12.021","url":null,"abstract":"<div><div>Timely and accurate mapping of rice cultivation distribution is crucial for ensuring global food security and achieving SDG2. From a global perspective, rice areas display high heterogeneity in spatial pattern and SAR time-series characteristics, posing substantial challenges to deep learning (DL) models’ performance, efficiency, and transferability. Moreover, due to their “black box” nature, DL often lack interpretability and credibility. To address these challenges, this paper constructs the first SAR rice dataset with spatiotemporal heterogeneity and proposes an explainable, lightweight model for rice area extraction, the eXplainable Mamba UNet (XM-UNet). The dataset is based on the 2023 multi-temporal Sentinel-1 data, covering diverse rice samples from the United States, Kenya, and Vietnam. A Temporal Feature Importance Explainer (TFI-Explainer) based on the Selective State Space Model is designed to enhance adaptability to the temporal heterogeneity of rice and the model’s interpretability. This explainer, coupled with the DL model, provides interpretations of the importance of SAR temporal features and facilitates crucial time phase screening. To overcome the spatial heterogeneity of rice, an Attention Sandglass Layer (ASL) combining CNN and self-attention mechanisms is designed to enhance the local spatial feature extraction capabilities. Additionally, the Parallel Visual State Space Layer (PVSSL) utilizes 2D-Selective-Scan (SS2D) cross-scanning to capture the global spatial features of rice multi-directionally, significantly reducing computational complexity through parallelization. Experimental results demonstrate that the XM-UNet adapts well to the spatiotemporal heterogeneity of rice globally, with OA and F1-score of 94.26 % and 90.73 %, respectively. The model is extremely lightweight, with only 0.190 M parameters and 0.279 GFLOPs. Mamba’s selective scanning facilitates feature screening, and its integration with CNN effectively balances rice’s local and global spatial characteristics. The interpretability experiments prove that the explanations of the importance of the temporal features provided by the model are crucial for guiding rice distribution mapping and filling a gap in the related field. The code is available in <span><span>https://github.com/SAR-RICE/XM-UNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 395-412"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142925270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reconstructing urban areas in 3D from satellite raster images has been a long-standing problem for both academic and industrial research. While automatic methods achieving this objective at Level Of Detail (LOD) 1 are mostly efficient today, producing LOD2 models is still a scientific challenge. In particular, the quality and resolution of satellite data are too low to accurately infer planar roof sections in 3D using traditional plane detection algorithms. Existing methods rely upon strong urban priors, which reduce their applicability to a variety of environments, and upon multi-modal data, including derived 3D products such as Digital Surface Models. In this work, we address this issue with KIBS (Keypoints Inference By Segmentation), a method that detects planar roof sections in 3D from a single-view satellite image. By exploiting, with efficient neural architectures, large-scale LOD2 databases produced by human operators, we manage both to segment roof sections in images and to extract keypoints enclosing these sections in 3D, forming 3D polygons with low complexity. The output set of 3D polygons can be used to reconstruct LOD2 models of buildings when combined with a plane assembly method. While conceptually simple, our method captures roof sections as 3D polygons with good accuracy from a single satellite image, only by learning the indirect 3D information contained in the image, in particular the view inclination, the distortion of facades, the building shadows, and the roof peak and ridge perspective. We demonstrate the potential of KIBS by reconstructing large urban areas in a few minutes, with a Jaccard index for the 2D segmentation of individual roof sections of approximately 80%, and an altimetric error of the reconstructed LOD2 model of less than 2 meters.
{"title":"KIBS: 3D detection of planar roof sections from a single satellite image","authors":"Johann Lussange , Mulin Yu , Yuliya Tarabalka , Florent Lafarge","doi":"10.1016/j.isprsjprs.2024.11.014","DOIUrl":"10.1016/j.isprsjprs.2024.11.014","url":null,"abstract":"<div><div>Reconstructing urban areas in 3D from satellite raster images has been a long-standing problem for both academical and industrial research. While automatic methods achieving this objective at a Level Of Detail (LOD) 1 are mostly efficient today, producing LOD2 models is still a scientific challenge. In particular, the quality and resolution of satellite data is too low to infer accurately the planar roof sections in 3D by using traditional plane detection algorithms. Existing methods rely upon the exploitation of both strong urban priors that reduce their applicability to a variety of environments and multi-modal data, including some derived 3D products such as Digital Surface Models. In this work, we address this issue with KIBS (<em>Keypoints Inference By Segmentation</em>), a method that detects planar roof sections in 3D from a single-view satellite image. By exploiting large-scale LOD2 databases produced by human operators with efficient neural architectures, we manage to both segment roof sections in images and extract keypoints enclosing these sections in 3D to form 3D-polygons with a low-complexity. The output set of 3D-polygons can be used to reconstruct LOD2 models of buildings when combined with a plane assembly method. While conceptually simple, our method manages to capture roof sections as 3D-polygons with a good accuracy, from a single satellite image only by learning indirect 3D information contained in the image, in particular from the view inclination, the distortion of facades, the building shadows, roof peak and ridge perspective. We demonstrate the potential of KIBS by reconstructing large urban areas in a few minutes, with a Jaccard index for the 2D segmentation of individual roof sections of approximately 80%, and an altimetric error of the reconstructed LOD2 model of less than to 2 meters.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 207-216"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142874579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}