While wetlands have been extensively studied using optical and radar satellite imagery, thermal imagery has been used less often due to its low spatial and temporal resolutions and the challenges of emissivity estimation. Since 2018, spaceborne thermal imagery has gained interest due to the availability of ECOSTRESS data, which are acquired at 70 m spatial resolution with a 3–5 day revisit time. This study aimed to compare the contribution of ECOSTRESS time-series to wetland mapping with that of other thermal time-series (i.e., Landsat-TIRS, ASTER-TIR), Sentinel-1 SAR and Sentinel-2 optical satellite time-series, and topographical variables derived from satellite data. The study was applied to a 209 km² heathland site in north-western France that includes riverine, slope, and flat wetlands. The method consisted of four steps: (i) four-year time-series (2019–2022) were aggregated into dense annual time-series; (ii) the temporal dimension was reduced using functional principal component analysis (FPCA); (iii) the most discriminating FPCA components were selected based on recursive feature elimination; and (iv) the contribution of each sensor time-series to wetland mapping was assessed based on the accuracy of a random forest model trained and tested with reference field data. The results indicated that an ECOSTRESS time-series combining day and night acquisitions was more accurate (overall F1-score: 0.71) than the Landsat-TIRS and ASTER-TIR time-series (overall F1-scores: 0.40–0.62). A combination of ECOSTRESS thermal images, Sentinel-2 optical images, Sentinel-1 SAR images, and topographical variables outperformed the sensor-specific accuracies (overall F1-score: 0.87), highlighting the synergy of thermal, optical, and topographical data for wetland mapping.
{"title":"Contribution of ECOSTRESS thermal imagery to wetland mapping: Application to heathland ecosystems","authors":"Liam Loizeau-Woollgar , Sébastien Rapinel , Julien Pellen , Bernard Clément , Laurence Hubert-Moy","doi":"10.1016/j.isprsjprs.2025.01.014","DOIUrl":"10.1016/j.isprsjprs.2025.01.014","url":null,"abstract":"<div><div>While wetlands have been extensively studied using optical and radar satellite imagery, thermal imagery has been used less often due its low spatial – temporal resolutions and challenges for emissivity estimation. Since 2018, spaceborne thermal imagery has gained interest due to the availability of ECOSTRESS data, which are acquired at 70 m spatial resolution and a 3–5 revisit time. This study aimed at comparing the contribution of ECOSTRESS time-series to wetland mapping to that of other thermal time-series (i.e., Landsat-TIRS, ASTER-TIR), Sentinel-1 SAR and Sentinel-2 optical satellite time-series, and topographical variables derived from satellite data. The study was applied to a 209 km<sup>2</sup> heathland site in north-western France that includes riverine, slope, and flat wetlands. The method used consisted of four steps: (i) four-year time-series (2019–2022) were aggregated into dense annual time-series; (ii) the temporal dimension was reduced using functional principal component analysis (FPCA); (iii) the most discriminating components of the FPCA were selected based on recursive feature elimination; and (iv) the contribution of each sensor time-series to wetland mapping was assessed based on the accuracy of a random forest model trained and tested using reference field data. The results indicated that an ECOSTRESS time-series that combined day and night acquisitions was more accurate (overall F1-score: 0.71) than Landsat-TIRS and ASTER-TIR time-series (overall F1-score: 0.40–0.62). A combination of ECOSTRESS thermal images, Sentinel-2 optical images, Sentinel-1 SAR images, and topographical variables outperformed the sensor-specific accuracies (overall F1-score: 0.87), highlighting the synergy of thermal, optical, and topographical data for wetland mapping.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 649-660"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143035305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning transferable land cover semantics for open vocabulary interactions with remote sensing images
Valérie Zermatten, Javiera Castillo-Navarro, Diego Marcos, Devis Tuia
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2025.01.006
Why should we confine land cover classes to rigid and arbitrary definitions? Land cover mapping is a central task in remote sensing image processing, but rigid class definitions can sometimes restrict the transferability of annotations between datasets. Open vocabulary recognition, i.e., using natural language to define a specific object or pattern in an image, breaks free from predefined nomenclature and offers flexible recognition of diverse categories with a more general image understanding across datasets and labels. The open vocabulary framework opens doors to searching for concepts of interest beyond individual class boundaries. In this work, we propose to use Text As supervision for COntrastive Semantic Segmentation (TACOSS), and we design an open vocabulary semantic segmentation model that extends its capacities beyond those of a traditional model for land cover mapping: in addition to visual pattern recognition, TACOSS leverages the common-sense knowledge captured by language models and is capable of interpreting the image at the pixel level, attributing semantics to each pixel and removing the constraints of a fixed set of land cover labels. By learning to match visual representations with text embeddings, TACOSS can transition smoothly from one set of labels to another and enables interaction with remote sensing images in natural language. Our approach combines a pretrained text encoder with a visual encoder and adopts supervised contrastive learning to align the visual and textual modalities. We explore several text encoders and label representation methods and compare their abilities to encode transferable land cover semantics. The model's capacity to predict a set of different land cover labels on an unseen dataset is also explored to illustrate the generalization capacities of our approach across domains. Overall, TACOSS is a general method that permits adapting between different sets of land cover labels with minimal computational overhead. Code is publicly available online.
{"title":"Learning transferable land cover semantics for open vocabulary interactions with remote sensing images","authors":"Valérie Zermatten , Javiera Castillo-Navarro , Diego Marcos , Devis Tuia","doi":"10.1016/j.isprsjprs.2025.01.006","DOIUrl":"10.1016/j.isprsjprs.2025.01.006","url":null,"abstract":"<div><div>Why should we confine land cover classes to rigid and arbitrary definitions? Land cover mapping is a central task in remote sensing image processing, but the rigorous class definitions can sometimes restrict the transferability of annotations between datasets. Open vocabulary recognition, i.e. using natural language to define a specific object or pattern in an image, breaks free from predefined nomenclature and offers flexible recognition of diverse categories with a more general image understanding across datasets and labels. The open vocabulary framework opens doors to search for concepts of interest, beyond individual class boundaries. In this work, we propose to use Text As supervision for COntrastive Semantic Segmentation (TACOSS), and we design an open vocabulary semantic segmentation model that extends its capacities beyond that of a traditional model for land cover mapping: In addition to visual pattern recognition, TACOSS leverages the common sense knowledge captured by language models and is capable of interpreting the image at the pixel level, attributing semantics to each pixel and removing the constraints of a fixed set of land cover labels. By learning to match visual representations with text embeddings, TACOSS can transition smoothly from one set of labels to another and enables the interaction with remote sensing images in natural language. Our approach combines a pretrained text encoder with a visual encoder and adopts supervised contrastive learning to align the visual and textual modalities. We explore several text encoders and label representation methods and compare their abilities to encode transferable land cover semantics. The model’s capacity to predict a set of different land cover labels on an unseen dataset is also explored to illustrate the generalization capacities across domains of our approach. Overall, TACOSS is a general method and permits adapting between different sets of land cover labels with minimal computational overhead. Code is publicly available online<span><span><sup>1</sup></span></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 621-636"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143035307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A coupled optical–radiometric modeling approach to removing reflection noise in TLS data of urban areas
Li Fang, Tianyu Li, Yanghong Lin, Shudong Zhou, Wei Yao
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.005
Point clouds, a fundamental type of 3D data, play an essential role in applications such as 3D reconstruction, autonomous driving, and robotics. However, point clouds generated by terrestrial laser scanning (TLS), which measures the time-of-flight of emitted and backscattered laser pulses, frequently include false points caused by mirror-like reflective surfaces, degrading data quality and fidelity. This study introduces an algorithm to eliminate reflection noise from TLS scan data. Our algorithm detects reflection planes by using both geometric and physical characteristics to recognize reflection points according to optical reflection theory. Radiometric correction is applied to the raw laser intensity, after which reflective planes are extracted using a threshold. In the virtual-point identification phase, these points are detected along the light propagation path, grounded on the specular reflection principle. Moreover, an improved feature descriptor, RE-LFSH, is employed to assess the similarity between two points in terms of reflection symmetry. We adapted the LFSH feature descriptor to retain reflection features, mitigating interference from symmetrical architectural structures. Incorporating the Hausdorff feature distance into the algorithm fortifies its resistance to ghosting and deformations, thereby boosting the accuracy of virtual point detection. Additionally, to overcome the shortage of annotated datasets, we introduce 3DRN, a novel benchmark dataset specifically designed for this task. Extensive experiments on the 3DRN benchmark, featuring diverse urban environments with virtual TLS reflection noise, show that our algorithm improves precision and recall rates for 3D points in reflective areas by 57.03% and 31.80%, respectively. Our approach improves outlier detection by 9.17% and enhances accuracy by 5.65% compared to leading methods. The 3DRN dataset is available at https://github.com/Tsuiky/3DRN.
{"title":"A coupled optical–radiometric modeling approach to removing reflection noise in TLS data of urban areas","authors":"Li Fang , Tianyu Li , Yanghong Lin , Shudong Zhou , Wei Yao","doi":"10.1016/j.isprsjprs.2024.12.005","DOIUrl":"10.1016/j.isprsjprs.2024.12.005","url":null,"abstract":"<div><div>Point clouds, which are a fundamental type of 3D data, play an essential role in various applications like 3D reconstruction, autonomous driving, and robotics. However, point clouds generated via measuring the time-of-flight of emitted and backscattered laser pulses of TLS, frequently include false points caused by mirror-like reflective surfaces, resulting in degradation of data quality and fidelity. This study introduces an algorithm to eliminate reflection noise from TLS scan data. Our novel algorithm detects reflection planes by utilizing both geometric and physical characteristics to recognize reflection points according to optical reflection theory. Radiometric correction is applied to the raw laser intensity, after which reflective planes are extracted using a threshold. In the virtual points identification phase, these points are detected along the light propagation path, grounded on the specular reflection principle. Moreover, an improved feature descriptor, known as RE-LFSH, is employed to assess the similarity between two points in terms of reflection symmetry. We have adapted the LFSH feature descriptor to retain reflection features, mitigating interference from symmetrical architectural structures. Incorporating the Hausdorff feature distance into the algorithm fortifies its resistance to ghosting and deformations, thereby boosting the accuracy of virtual point detection. Additionally, to overcome the shortage of annotated datasets, a novel benchmark dataset named 3DRN, specifically designed for this task, is introduced. Extensive experiments on the 3DRN benchmark dataset, featuring diverse urban environments with virtual TLS reflection noise, show our algorithm improves precision and recall rates for 3D points in reflective areas by 57.03% and 31.80%, respectively. Our approach improves outlier detection by 9.17% and enhances accuracy by 5.65% compared to leading methods. You can access the 3DRN dataset at <span><span>https://github.com/Tsuiky/3DRN</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 217-231"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142874575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data fidelity-oriented spatial-spectral fusion of CRISM and CTX images
Qunming Wang, Wenjing Ma, Sicong Liu, Xiaohua Tong, Peter M. Atkinson
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.004
The Compact Reconnaissance Imaging Spectrometer for Mars (CRISM) is a Mars-dedicated imaging spectrometer that captures remote sensing data with very fine spectral resolution. However, the spatial resolution of CRISM data is relatively coarse (18 m), limiting its application at regional scales. The Context Camera (CTX) is a digital camera equipped with a wide-angle lens, providing a finer spatial resolution (6 m) and a larger field of view, but only a single panchromatic band. To produce CRISM hyperspectral data with finer spatial resolution (e.g., the 6 m of CTX images), this research investigated spatial-spectral fusion of 18 m CRISM images with 6 m CTX panchromatic images. To address the long-standing issue in existing spatial-spectral fusion methods of incomplete data fidelity to the original hyperspectral data, a new paradigm called Data Fidelity-oriented Spatial-Spectral Fusion (DF-SSF) was proposed. The effectiveness of DF-SSF was validated through experiments on data from six areas on Mars. The results indicate that the fusion of CRISM and CTX can effectively increase the spatial resolution of CRISM hyperspectral data. Moreover, DF-SSF increases the fusion accuracy noticeably while maintaining perfect data fidelity to the original hyperspectral data. In addition, DF-SSF is theoretically applicable to any existing spatial-spectral fusion method. The 6 m CRISM hyperspectral data inherit the spectral-resolution advantages of the original 18 m data and provide richer spatial texture information on the Martian surface, with broad application potential.
{"title":"Data fidelity-oriented spatial-spectral fusion of CRISM and CTX images","authors":"Qunming Wang , Wenjing Ma , Sicong Liu , Xiaohua Tong , Peter M. Atkinson","doi":"10.1016/j.isprsjprs.2024.12.004","DOIUrl":"10.1016/j.isprsjprs.2024.12.004","url":null,"abstract":"<div><div>The Compact Reconnaissance Imaging Spectrometer for Mars (CRISM) is a Mars-dedicated compact reconnaissance imaging spectrometer that captures remote sensing data with very fine spectral resolution. However, the spatial resolution of CRISM data is relatively coarse (18 m), limiting its application to regional scales. The Context Camera (CTX) is a digital camera equipped with a wide-angle lens, providing a finer spatial resolution (6 m) and larger field-of-view, but CTX provides only a single panchromatic band. To produce CRISM hyperspectral data with finer spatial resolution (e.g., 6 m of CTX images), this research investigated spatial-spectral fusion of 18 m CRISM images with 6 m CTX panchromatic images. In spatial-spectral fusion, to address the long-standing issue of incomplete data fidelity to the original hyperspectral data in existing methods, a new paradigm called Data Fidelity-oriented Spatial-Spectral Fusion (DF-SSF) was proposed. The effectiveness of DF-SSF was validated through experiments on data from six areas on Mars. The results indicate that the fusion of CRISM and CTX can increase the spatial resolution of CRISM hyperspectral data effectively. Moreover, DF-SSF can increase the fusion accuracy noticeably while maintaining perfect data fidelity to the original hyperspectral data. In addition, DF-SSF is theoretically applicable to any existing spatial-spectral fusion methods. The 6 m CRISM hyperspectral data inherit the advantages of the original 18 m data in spectral resolution, and provide richer spatial texture information on the Martian surface, with broad application potential.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 172-191"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142874576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Remote sensing (RS) scene graphs represent RS scenes as graphs with objects as nodes and their spatial relationships as edges, playing a crucial role in understanding and interpreting RS scenes at a higher level. However, existing RS scene graph generation methods, which rely on deep learning models, face limitations due to their dependence on extensive relationship labels, restricted generation accuracy, and limited generalizability. To address these challenges, we propose a spatial relationship computing model based on prior geographic information knowledge for RS scene graph generation; we refer to the resulting RS scene graph as SG-SSR for short. Furthermore, we investigated the application of SG-SSR to RS scene retrieval, demonstrating improved retrieval accuracy for spatial relationships between entities. The experiments show that our scene graph generation method does not rely on relationship labels and has higher generation accuracy and greater universality. Moreover, the retrieval method based on SG-SSR outperformed other retrieval methods based on image feature vectors, with a retrieval accuracy index 0.098 higher than the alternatives (RemoteCLIP (mask)). The dataset and code are available at https://gitee.com/tangjiayitangjiayi/sg-ssr.
{"title":"Remote sensing scene graph generation for improved retrieval based on spatial relationships","authors":"Jiayi Tang, Xiaochong Tong, Chunping Qiu, Yuekun Sun, Haoshuai Song, Yaxian Lei, Yi Lei, Congzhou Guo","doi":"10.1016/j.isprsjprs.2025.01.012","DOIUrl":"10.1016/j.isprsjprs.2025.01.012","url":null,"abstract":"<div><div>RS scene graphs represent RS scenes as graphs with objects as nodes and their spatial relationships as edges, playing a crucial role in understanding and interpreting RS scenes at a higher level. However, existing RS scene graph generation methods, relying on deep learning models, face limitations due to their dependence on extensive relationship labels, restricted generation accuracy, and limited generalizability. To address these challenges, we proposed a spatial relationship computing model based on prior geographic information knowledge for RS scene graph generation. We refer to the RS scene graph generated using our method as SG-SSR for short. Furthermore, we investigated the application of SG-SSR in RS scene retrieval, demonstrating improved retrieval accuracy for spatial relationships between entities. The experiments show that our scene graph generation method does not rely on relationship labels, and has higher generation accuracy and greater universality. Moreover, the retrieval method based on SG-SSR outperformed other retrieval methods based on image feature vectors, with a retrieval accuracy index 0.098 higher than the alternatives(RemoteCLIP(mask)). The dataset and code are available at <span><span>https://gitee.com/tangjiayitangjiayi/sg-ssr</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 741-752"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143072525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multiscale adaptive PolSAR image superpixel generation based on local iterative clustering and polarimetric scattering features
Nengcai Li, Deliang Xiang, Xiaokun Sun, Canbin Hu, Yi Su
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.011
Superpixel generation is an essential preprocessing step for intelligent interpretation of object-level Polarimetric Synthetic Aperture Radar (PolSAR) images. The Simple Linear Iterative Clustering (SLIC) algorithm has become one of the primary methods for superpixel generation in PolSAR images due to its advantages of minimal human intervention and ease of implementation. However, existing SLIC-based superpixel generation methods for PolSAR images often use distance measures based on the complex Wishart distribution as the similarity metric. These methods are not ideal for segmenting heterogeneous regions, and a single superpixel generation result cannot simultaneously extract coarse and fine levels of detail in the image. To address this, this paper proposes a multiscale adaptive superpixel generation method for PolSAR images based on SLIC. To tackle the inaccuracy of the complex Wishart distribution in modeling urban heterogeneous regions, this paper employs polarimetric target decomposition: it extracts the polarimetric scattering features of the land cover and then constructs a similarity measure for these features using a Riemannian metric. To achieve multiscale superpixel segmentation in a single segmentation process, this paper introduces a new method for initializing cluster centers based on a polarimetric homogeneity measure. This initialization assigns denser cluster centers in heterogeneous areas and automatically adjusts the size of the search regions according to the polarimetric homogeneity measure. Finally, a novel clustering distance metric is defined, integrating multiple types of information, including polarimetric scattering feature similarity, power feature similarity, and spatial similarity. This metric uses the polarimetric homogeneity measure to adaptively balance the relative weights between the various similarities. Comparative experiments were conducted on three real PolSAR datasets against state-of-the-art SLIC-based methods (Qin-RW and Yin-HLT). The results demonstrate that the proposed method provides richer multiscale detail information and significantly improves segmentation outcomes. For example, with the AIRSAR dataset and a step size of 42, the proposed method achieves improvements of 16.56% in boundary recall (BR) and 12.01% in achievable segmentation accuracy (ASA) compared to the Qin-RW method. Source code of the proposed method is available at https://github.com/linengcai/PolSAR_MS_ASLIC.git.
{"title":"Multiscale adaptive PolSAR image superpixel generation based on local iterative clustering and polarimetric scattering features","authors":"Nengcai Li , Deliang Xiang , Xiaokun Sun , Canbin Hu , Yi Su","doi":"10.1016/j.isprsjprs.2024.12.011","DOIUrl":"10.1016/j.isprsjprs.2024.12.011","url":null,"abstract":"<div><div>Superpixel generation is an essential preprocessing step for intelligent interpretation of object-level Polarimetric Synthetic Aperture Radar (PolSAR) images. The Simple Linear Iterative Clustering (SLIC) algorithm has become one of the primary methods for superpixel generation in PolSAR images due to its advantages of minimal human intervention and ease of implementation. However, existing SLIC-based superpixel generation methods for PolSAR images often use distance measures based on the complex Wishart distribution as the similarity metric. These methods are not ideal for segmenting heterogeneous regions, and a single superpixel generation result cannot simultaneously extract coarse and fine levels of detail in the image. To address this, this paper proposes a multiscale adaptive superpixel generation method for PolSAR images based on SLIC. To tackle the issue of the complex Wishart distribution’s inaccuracy in modeling urban heterogeneous regions, this paper employs the polarimetric target decomposition method. It extracts the polarimetric scattering features of the land cover, then constructs a similarity measure for these features using Riemannian metric. To achieve multiscale superpixel segmentation in a single superpixel segmentation process, this paper introduces a new method for initializing cluster centers based on polarimetric homogeneity measure. This initialization method assigns denser cluster centers in heterogeneous areas and automatically adjusts the size of the search regions according to the polarimetric homogeneity measure. Finally, a novel clustering distance metric is defined, integrating multiple types of information, including polarimetric scattering feature similarity, power feature similarity, and spatial similarity. This metric uses the polarimetric homogeneity measure to adaptively balance the relative weights between the various similarities. Comparative experiments were conducted using three real PolSAR datasets with state-of-the-art SLIC-based methods (Qin-RW and Yin-HLT). The results demonstrate that the proposed method provides richer multiscale detail information and significantly improves segmentation outcomes. For example, with the AIRSAR dataset and the step size of 42, the proposed method achieves improvements of 16.56<span><math><mtext>%</mtext></math></span> in BR and 12.01<span><math><mtext>%</mtext></math></span> in ASA compared to the Qin-RW method. Source code of the proposed method is made available at <span><span>https://github.com/linengcai/PolSAR_MS_ASLIC.git</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 307-322"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142889391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accurate and complete neural implicit surface reconstruction in street scenes using images and LiDAR point clouds
Chenhui Shi, Fulin Tang, Yihong Wu, Hongtu Ji, Hongjie Duan
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.012
Surface reconstruction in street scenes is a critical task in computer vision and photogrammetry, with images and LiDAR point clouds being commonly used data sources. However, image-only reconstruction faces challenges such as lighting variations, weak textures, and sparse viewpoints, while LiDAR-only methods suffer from issues like sparse and noisy LiDAR point clouds. Effectively integrating these two modalities to leverage their complementary strengths remains an open problem. Inspired by recent advances in neural implicit representations, we propose a novel street-level neural implicit surface reconstruction approach that incorporates images and LiDAR point clouds into a unified framework for joint optimization. Three key components make our approach achieve state-of-the-art (SOTA) reconstruction performance with high accuracy and completeness in street scenes. First, we introduce an adaptive photometric constraint weighting method to mitigate the impacts of lighting variations and weak textures on reconstruction. Second, a new B-spline-based hierarchical hash encoder is proposed to ensure the continuity of gradient-derived normals and further to reduce the noise from images and LiDAR point clouds. Third, we implement effective signed distance field (SDF) constraints in a spatial hash grid allocated in near-surface space to fully exploit the geometric information provided by LiDAR point clouds. Additionally, we present two street-level datasets—one virtual and one real-world—offering a comprehensive set of resources that existing public datasets lack. Experimental results demonstrate the superior performance of our method. Compared to the SOTA image-LiDAR combined neural implicit method, namely StreetSurf, ours significantly improves the F-score by approximately 7 percentage points. Our code and data are available at https://github.com/SCH1001/StreetRecon.
{"title":"Accurate and complete neural implicit surface reconstruction in street scenes using images and LiDAR point clouds","authors":"Chenhui Shi , Fulin Tang , Yihong Wu , Hongtu Ji , Hongjie Duan","doi":"10.1016/j.isprsjprs.2024.12.012","DOIUrl":"10.1016/j.isprsjprs.2024.12.012","url":null,"abstract":"<div><div>Surface reconstruction in street scenes is a critical task in computer vision and photogrammetry, with images and LiDAR point clouds being commonly used data sources. However, image-only reconstruction faces challenges such as lighting variations, weak textures, and sparse viewpoints, while LiDAR-only methods suffer from issues like sparse and noisy LiDAR point clouds. Effectively integrating these two modalities to leverage their complementary strengths remains an open problem. Inspired by recent advances in neural implicit representations, we propose a novel street-level neural implicit surface reconstruction approach that incorporates images and LiDAR point clouds into a unified framework for joint optimization. Three key components make our approach achieve state-of-the-art (SOTA) reconstruction performance with high accuracy and completeness in street scenes. First, we introduce an adaptive photometric constraint weighting method to mitigate the impacts of lighting variations and weak textures on reconstruction. Second, a new B-spline-based hierarchical hash encoder is proposed to ensure the continuity of gradient-derived normals and further to reduce the noise from images and LiDAR point clouds. Third, we implement effective signed distance field (SDF) constraints in a spatial hash grid allocated in near-surface space to fully exploit the geometric information provided by LiDAR point clouds. Additionally, we present two street-level datasets—one virtual and one real-world—offering a comprehensive set of resources that existing public datasets lack. Experimental results demonstrate the superior performance of our method. Compared to the SOTA image-LiDAR combined neural implicit method, namely StreetSurf, ours significantly improves the F-score by approximately 7 percentage points. Our code and data are available at <span><span>https://github.com/SCH1001/StreetRecon</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 295-306"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142889392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Synthesis of complex-valued InSAR data with a multi-task convolutional neural network
Philipp Sibler, Francescopaolo Sica, Michael Schmitt
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.007
Simulated remote sensing images bear great potential for many applications in the field of Earth observation. They can be used as a controlled testbed for the development of signal and image processing algorithms or can provide a means to assess the potential of new sensor concepts. With the rise of deep learning, the synthesis of artificial remote sensing images by means of deep neural networks has become a hot research topic. While the generation of optical data is relatively straightforward, as it can rely on established models from the computer vision community, the generation of synthetic aperture radar (SAR) data is still largely restricted to intensity images, since the processing of complex-valued numbers by conventional neural networks poses significant challenges. With this work, we propose to circumvent these challenges by decomposing SAR interferograms into real-valued components. These components are then simultaneously synthesized by different branches of a multi-branch encoder–decoder network architecture. In the end, the real-valued components can be recombined into the final, complex-valued interferogram. Moreover, the effect of speckle and interferometric phase noise is replicated and applied to the synthesized interferometric data. Experimental results on both medium-resolution C-band repeat-pass SAR data and high-resolution X-band single-pass SAR data demonstrate the general feasibility of the approach.
{"title":"Synthesis of complex-valued InSAR data with a multi-task convolutional neural network","authors":"Philipp Sibler , Francescopaolo Sica , Michael Schmitt","doi":"10.1016/j.isprsjprs.2024.12.007","DOIUrl":"10.1016/j.isprsjprs.2024.12.007","url":null,"abstract":"<div><div>Simulated remote sensing images bear great potential for many applications in the field of Earth observation. They can be used as controlled testbed for the development of signal and image processing algorithms or can provide a means to get an impression of the potential of new sensor concepts. With the rise of deep learning, the synthesis of artificial remote sensing images by means of deep neural networks has become a hot research topic. While the generation of optical data is relatively straightforward, as it can rely on the use of established models from the computer vision community, the generation of synthetic aperture radar (SAR) data until now is still largely restricted to intensity images since the processing of complex-valued numbers by conventional neural networks poses significant challenges. With this work, we propose to circumvent these challenges by decomposing SAR interferograms into real-valued components. These components are then simultaneously synthesized by different branches of a multi-branch encoder–decoder network architecture. In the end, these real-valued components can be combined again into the final, complex-valued interferogram. Moreover, the effect of speckle and interferometric phase noise is replicated and applied to the synthesized interferometric data. Experimental results on both medium-resolution C-band repeat-pass SAR data and high-resolution X-band single-pass SAR data, demonstrate the general feasibility of the approach.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 192-206"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142874573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large-scale rice mapping under spatiotemporal heterogeneity using multi-temporal SAR images and explainable deep learning
Ji Ge, Hong Zhang, Lijun Zuo, Lu Xu, Jingling Jiang, Mingyang Song, Yinhaibin Ding, Yazhe Xie, Fan Wu, Chao Wang, Wenjiang Huang
Pub Date: 2025-02-01 | DOI: 10.1016/j.isprsjprs.2024.12.021
Timely and accurate mapping of rice cultivation distribution is crucial for ensuring global food security and achieving SDG 2. From a global perspective, rice areas display high heterogeneity in spatial pattern and SAR time-series characteristics, posing substantial challenges to the performance, efficiency, and transferability of deep learning (DL) models. Moreover, due to their "black box" nature, DL models often lack interpretability and credibility. To address these challenges, this paper constructs the first SAR rice dataset with spatiotemporal heterogeneity and proposes an explainable, lightweight model for rice area extraction, the eXplainable Mamba UNet (XM-UNet). The dataset is based on multi-temporal Sentinel-1 data from 2023, covering diverse rice samples from the United States, Kenya, and Vietnam. A Temporal Feature Importance Explainer (TFI-Explainer) based on the Selective State Space Model is designed to enhance adaptability to the temporal heterogeneity of rice and the model's interpretability. This explainer, coupled with the DL model, provides interpretations of the importance of SAR temporal features and facilitates crucial time phase screening. To overcome the spatial heterogeneity of rice, an Attention Sandglass Layer (ASL) combining CNN and self-attention mechanisms is designed to enhance local spatial feature extraction. Additionally, the Parallel Visual State Space Layer (PVSSL) uses 2D-Selective-Scan (SS2D) cross-scanning to capture the global spatial features of rice multi-directionally, significantly reducing computational complexity through parallelization. Experimental results demonstrate that XM-UNet adapts well to the spatiotemporal heterogeneity of rice globally, with an overall accuracy (OA) of 94.26% and an F1-score of 90.73%. The model is extremely lightweight, with only 0.190 M parameters and 0.279 GFLOPs. Mamba's selective scanning facilitates feature screening, and its integration with CNN effectively balances the local and global spatial characteristics of rice. The interpretability experiments show that the explanations of temporal feature importance provided by the model are crucial for guiding rice distribution mapping, filling a gap in the related field. The code is available at https://github.com/SAR-RICE/XM-UNet.
{"title":"Large-scale rice mapping under spatiotemporal heterogeneity using multi-temporal SAR images and explainable deep learning","authors":"Ji Ge , Hong Zhang , Lijun Zuo , Lu Xu , Jingling Jiang , Mingyang Song , Yinhaibin Ding , Yazhe Xie , Fan Wu , Chao Wang , Wenjiang Huang","doi":"10.1016/j.isprsjprs.2024.12.021","DOIUrl":"10.1016/j.isprsjprs.2024.12.021","url":null,"abstract":"<div><div>Timely and accurate mapping of rice cultivation distribution is crucial for ensuring global food security and achieving SDG2. From a global perspective, rice areas display high heterogeneity in spatial pattern and SAR time-series characteristics, posing substantial challenges to deep learning (DL) models’ performance, efficiency, and transferability. Moreover, due to their “black box” nature, DL often lack interpretability and credibility. To address these challenges, this paper constructs the first SAR rice dataset with spatiotemporal heterogeneity and proposes an explainable, lightweight model for rice area extraction, the eXplainable Mamba UNet (XM-UNet). The dataset is based on the 2023 multi-temporal Sentinel-1 data, covering diverse rice samples from the United States, Kenya, and Vietnam. A Temporal Feature Importance Explainer (TFI-Explainer) based on the Selective State Space Model is designed to enhance adaptability to the temporal heterogeneity of rice and the model’s interpretability. This explainer, coupled with the DL model, provides interpretations of the importance of SAR temporal features and facilitates crucial time phase screening. To overcome the spatial heterogeneity of rice, an Attention Sandglass Layer (ASL) combining CNN and self-attention mechanisms is designed to enhance the local spatial feature extraction capabilities. Additionally, the Parallel Visual State Space Layer (PVSSL) utilizes 2D-Selective-Scan (SS2D) cross-scanning to capture the global spatial features of rice multi-directionally, significantly reducing computational complexity through parallelization. Experimental results demonstrate that the XM-UNet adapts well to the spatiotemporal heterogeneity of rice globally, with OA and F1-score of 94.26 % and 90.73 %, respectively. The model is extremely lightweight, with only 0.190 M parameters and 0.279 GFLOPs. Mamba’s selective scanning facilitates feature screening, and its integration with CNN effectively balances rice’s local and global spatial characteristics. The interpretability experiments prove that the explanations of the importance of the temporal features provided by the model are crucial for guiding rice distribution mapping and filling a gap in the related field. The code is available in <span><span>https://github.com/SAR-RICE/XM-UNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 395-412"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142925270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reconstructing urban areas in 3D from satellite raster images has been a long-standing problem for both academic and industrial research. While automatic methods achieving this objective at Level Of Detail (LOD) 1 are mostly efficient today, producing LOD2 models is still a scientific challenge. In particular, the quality and resolution of satellite data are too low to accurately infer planar roof sections in 3D using traditional plane detection algorithms. Existing methods rely upon strong urban priors, which reduce their applicability to a variety of environments, and upon multi-modal data, including derived 3D products such as Digital Surface Models. In this work, we address this issue with KIBS (Keypoints Inference By Segmentation), a method that detects planar roof sections in 3D from a single-view satellite image. By exploiting, with efficient neural architectures, large-scale LOD2 databases produced by human operators, we manage both to segment roof sections in images and to extract keypoints enclosing these sections in 3D, forming 3D polygons with low complexity. The output set of 3D polygons can be used to reconstruct LOD2 models of buildings when combined with a plane assembly method. While conceptually simple, our method captures roof sections as 3D polygons with good accuracy from a single satellite image, only by learning the indirect 3D information contained in the image, in particular the view inclination, the distortion of facades, the building shadows, and the roof peak and ridge perspective. We demonstrate the potential of KIBS by reconstructing large urban areas in a few minutes, with a Jaccard index for the 2D segmentation of individual roof sections of approximately 80%, and an altimetric error of the reconstructed LOD2 model of less than 2 meters.
{"title":"KIBS: 3D detection of planar roof sections from a single satellite image","authors":"Johann Lussange , Mulin Yu , Yuliya Tarabalka , Florent Lafarge","doi":"10.1016/j.isprsjprs.2024.11.014","DOIUrl":"10.1016/j.isprsjprs.2024.11.014","url":null,"abstract":"<div><div>Reconstructing urban areas in 3D from satellite raster images has been a long-standing problem for both academical and industrial research. While automatic methods achieving this objective at a Level Of Detail (LOD) 1 are mostly efficient today, producing LOD2 models is still a scientific challenge. In particular, the quality and resolution of satellite data is too low to infer accurately the planar roof sections in 3D by using traditional plane detection algorithms. Existing methods rely upon the exploitation of both strong urban priors that reduce their applicability to a variety of environments and multi-modal data, including some derived 3D products such as Digital Surface Models. In this work, we address this issue with KIBS (<em>Keypoints Inference By Segmentation</em>), a method that detects planar roof sections in 3D from a single-view satellite image. By exploiting large-scale LOD2 databases produced by human operators with efficient neural architectures, we manage to both segment roof sections in images and extract keypoints enclosing these sections in 3D to form 3D-polygons with a low-complexity. The output set of 3D-polygons can be used to reconstruct LOD2 models of buildings when combined with a plane assembly method. While conceptually simple, our method manages to capture roof sections as 3D-polygons with a good accuracy, from a single satellite image only by learning indirect 3D information contained in the image, in particular from the view inclination, the distortion of facades, the building shadows, roof peak and ridge perspective. We demonstrate the potential of KIBS by reconstructing large urban areas in a few minutes, with a Jaccard index for the 2D segmentation of individual roof sections of approximately 80%, and an altimetric error of the reconstructed LOD2 model of less than to 2 meters.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"220 ","pages":"Pages 207-216"},"PeriodicalIF":10.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142874579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}