Pub Date : 2026-01-20 | DOI: 10.1109/JSTARS.2026.3655583
Yujia Fu;Mingyang Wang;Danfeng Hong;Gemine Vivone
Self-supervised learning (SSL) has emerged as a promising paradigm for remote sensing semantic segmentation, enabling the exploitation of large-scale unlabeled data to learn meaningful representations. However, most existing methods focus solely on the spatial domain, overlooking rich frequency information that is particularly critical in remote sensing images, where fine-grained textures and repetitive structural patterns are prevalent. To address this limitation, we propose a novel dual-domain masked representation (DDMR) learning framework. Specifically, the spatial masking branch simulates partial occlusions and encourages spatial context reasoning by randomly masking regions in the spatial domain. Meanwhile, randomized frequency masking increases input diversity during training and improves generalization. In addition, feature representations are further decoupled into amplitude and phase components in the frequency branch, and an amplitude-phase loss is introduced to encourage fine-grained, frequency-aware learning. By jointly leveraging spatial and frequency masked representation learning, DDMR enhances the robustness and discriminative power of learned features. Extensive experiments on two remote sensing datasets demonstrate that our method consistently outperforms state-of-the-art self-supervised approaches, validating its effectiveness for self-supervised semantic segmentation in complex remote sensing scenarios.
{"title":"Dual-Domain Masked Representation Learning for Semantic Segmentation of Remote Sensing Images","authors":"Yujia Fu;Mingyang Wang;Danfeng Hong;Gemine Vivone","doi":"10.1109/JSTARS.2026.3655583","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3655583","url":null,"abstract":"Self-supervisedlearning (SSL) has emerged as a promising paradigm for remote sensing semantic segmentation, enabling the exploitation of large-scale unlabeled data to learn meaningful representations. However, most existing methods focus solely on the spatial domain, overlooking rich frequency information that is particularly critical in remote sensing images, where fine-grained textures and repetitive structural patterns are prevalent. To address this limitation, we propose a novel dual-domain masked representation (DDMR) learning framework. Specifically, the spatial masking branch simulates partial occlusions and encourages spatial context reasoning by randomly masking regions in the spatial domain. Meanwhile, randomized frequency masking increases input diversity during training and improves generalization. In addition, feature representations are further decoupled into amplitude and phase components in the frequency branch, and an amplitude-phase loss is introduced to encourage fine-grained, frequency-aware learning. By jointly leveraging spatial and frequency masked representation learning, DDMR enhances the robustness and discriminative power of learned features. Extensive experiments on two remote sensing datasets demonstrate that our method consistently outperforms state-of-the-art self-supervised approaches, validating its effectiveness for self-supervised semantic segmentation in complex remote sensing scenarios.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"5507-5519"},"PeriodicalIF":5.3,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11359001","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-20 | DOI: 10.1109/JSTARS.2026.3655691
Yuan Yuan;Junhan Zhou;Lei Lin;Ying Yu;Qingshan Liu
Optical satellite time series data play a crucial role in monitoring vegetation dynamics and land surface changes. However, persistent cloud cover often leads to missing data, particularly during critical phenological stages, which significantly diminishes data quality and hinders downstream applications. To address this issue, we present conditional optical-SAR multitemporal diffusion (CosmDiff), a novel framework for reconstructing optical satellite time series by integrating multimodal, multitemporal optical and synthetic aperture radar (SAR) data using conditional diffusion models. In CosmDiff, the reconstruction task is formulated as a multivariate time series imputation problem, where missing values are modeled as conditionally dependent on both cloud-free optical observations and synergistic SAR time series. The framework incorporates a Transformer-based network within the diffusion process, introducing a novel dimensional decomposition attention mechanism that fuses optical-SAR time series across both temporal and feature dimensions. This mechanism enables the dynamic extraction of essential and complementary features from both modalities. In addition, linearly interpolated optical time series are used as auxiliary inputs to further guide the imputation process. Experimental results on Sentinel-1/-2 datasets demonstrate that CosmDiff consistently outperforms both traditional interpolation methods and advanced deep learning approaches, achieving a 3.8% reduction in mean absolute error and a 6.8% improvement in spectral angle mapper compared to competing methods. Furthermore, CosmDiff provides comprehensive uncertainty estimates for its predictions, which are particularly valuable for decision-making applications.
{"title":"CosmDiff: Integrating Multitemporal Optical-SAR Data With Conditional Diffusion Models for Optical Satellite Time Series Reconstruction","authors":"Yuan Yuan;Junhan Zhou;Lei Lin;Ying Yu;Qingshan Liu","doi":"10.1109/JSTARS.2026.3655691","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3655691","url":null,"abstract":"Optical satellite time series data play a crucial role in monitoring vegetation dynamics and land surface changes. However, persistent cloud cover often leads to missing data, particularly during critical phenological stages, which significantly diminishes data quality and hinders downstream applications. To address this issue, we present conditional optical-SAR multitemporal diffusion (CosmDiff), a novel framework for reconstructing optical satellite time series by integrating multimodal, multitemporal optical and synthetic aperture radar (SAR) data using conditional diffusion models. In CosmDiff, the reconstruction task is formulated as a multivariate time series imputation problem, where missing values are modeled as conditionally dependent on both cloudfree optical observations and synergic SAR time series. The framework incorporates a Transformer-based network within the diffusion process, introducing a novel dimensional decomposition attention mechanism that fuses optical-SAR time series across both temporal and feature dimensions. This mechanism enables the dynamic extraction of essential and complementary features from both modalities. In addition, linearly interpolated optical time series are used as auxiliary inputs to further guide the imputation process. Experimental results on Sentinel-1/-2 datasets demonstrate that CosmDiff consistently outperforms both traditional interpolation methods and advanced deep learning approaches, achieving a 3.8% reduction in mean absolute error and a 6.8% improvement in spectral angle mapper compared to competing methods. Furthermore, CosmDiff provides comprehensive uncertainty estimates for its predictions, which are particularly valuable for decision-making applications.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"5722-5740"},"PeriodicalIF":5.3,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11359003","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-19 | DOI: 10.1109/JSTARS.2026.3655512
Yu Zhang;Jiageng Huang;Yefei Huang;Wei Gao;Jie Chen
Transformer-based architectures have shown strong potential in hyperspectral unmixing due to their powerful modeling capabilities. However, most existing transformer-based methods still struggle to effectively capture and fuse spatial–spectral features, and their predominant reliance on reconstruction error further constrains overall unmixing performance. Moreover, they rarely account for the nonlinear correlations that inherently exist between the spatial and spectral domains. To address these challenges, we propose a sampling-based spatial–spectral transformer and generative adversarial network (SSST-GAN). The proposed model employs a dual-branch, sampling-based transformer encoder to independently extract spatial and spectral representations. Specifically, the spatial branch adopts a full-sampling multihead attention mechanism to capture rich contextual dependencies among spatial pixels, while the spectral branch utilizes a sparse sampling strategy to efficiently distill key information from high-dimensional spectral data. A feature enhancement module is introduced to integrate and strengthen the complementary characteristics of spatial and spectral features. To further improve the modeling of complex nonlinear mixing patterns, we incorporate a generalized nonlinear fluctuation model at the decoding stage. In addition, SSST-GAN leverages a generative adversarial learning framework, in which a discriminator evaluates the authenticity of reconstructed pixels, thereby enhancing the fidelity of the unmixing results. Extensive experiments on both synthetic and real-world datasets demonstrate that SSST-GAN consistently outperforms several state-of-the-art methods in terms of unmixing accuracy.
Title: SSST-GAN: A Sampling-Based Spatial-Spectral Transformer and Generative Adversarial Network for Hyperspectral Unmixing. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 19, pp. 5741–5757. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11358397
Pub Date : 2026-01-19 | DOI: 10.1109/JSTARS.2026.3655485
Qingwang Wang;Yuhang Wu;Pengcheng Jin;Yan Lin;Zhen Zhang;Tao Shen
Infrared imaging plays a crucial role in applications such as search-and-rescue operations and fire monitoring, due to its robustness under complex environmental conditions. Nevertheless, the inherent low spatial resolution of infrared cameras and the complicated imaging degradation process still constrain the quality of captured images, thereby posing challenges for downstream tasks. Existing infrared image super-resolution methods (e.g., diffusion-based methods) often neglect the unique modality characteristics of infrared images and fail to effectively introduce additional fine-grained information. To address these limitations, we propose a novel framework named visible-light-guided infrared image super resolution with dual amplitude-phase optimization (vap-SR). By leveraging the powerful generative capability of conditional diffusion and fully exploiting the rich structural priors embedded in visible images, vap-SR effectively compensates for the deficiencies of infrared images in terms of details, thereby overcoming the inherent limitations in texture fidelity. Phase and amplitude losses are designed to preserve the physical characteristics of the infrared modality while effectively leveraging the structural information from visible-light images. Extensive experiments demonstrate that vap-SR consistently outperforms state-of-the-art methods in both reconstruction quality and a downstream object detection task, validating its effectiveness for infrared super resolution.
Title: Visible-Light-Guided Infrared Image Super Resolution With Dual Amplitude-Phase Optimization. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 19, pp. 5774–5784. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11358958
Pub Date : 2026-01-19 | DOI: 10.1109/JSTARS.2026.3655359
Jianing Shao;Yanlei Du;Xiaofeng Yang;Longxiang Linghu;Jinsong Chong;Jian Yang
This study numerically investigates the spatial ergodicity of Doppler characteristics in polarimetric ocean radar scattering. The full Apel wave spectrum is employed to generate 2-D time-varying sea surfaces that involve all dominant large-scale gravity waves and small-scale capillary waves. By solving the radar scattering from time-varying ocean surfaces with various illumination sizes using the second-order small-slope approximation (SSA-2) model, the Doppler spectra, along with the Doppler shift and width, are thus computed and analyzed. The numerical simulations are conducted at L-band for three typical fully developed sea states. A Doppler shift error threshold is defined based on the accuracy requirements of sea surface current retrieval, and the spatial ergodicity of Doppler shift is evaluated quantitatively. Simulation results indicate that under co-polarization, the Doppler shift manifests spatial ergodicity when the sea surface size illuminated by radar is no less than one-quarter of the largest gravity wave wavelength at the corresponding sea state. For cross-polarization, the spatial ergodicity of the Doppler shift is significantly reduced and is observed only when the illumination size exceeds about one-half of the largest gravity wave wavelength. The results also indicate that wind direction has a limited effect on the spatial ergodicity of the Doppler shift.
{"title":"Spatial Ergodicity of Doppler Characteristics in Polarimetric Ocean Radar Scattering: A Numerical Study","authors":"Jianing Shao;Yanlei Du;Xiaofeng Yang;Longxiang Linghu;Jinsong Chong;Jian Yang","doi":"10.1109/JSTARS.2026.3655359","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3655359","url":null,"abstract":"This study numerically investigates the spatial ergodicity of Doppler characteristics in polarimetric ocean radar scattering. The full Apel wave spectrum is employed to generate 2-D time-varying sea surfaces that involve all dominant large-scale gravity waves and small-scale capillary waves. By solving the radar scattering from time-varying ocean surfaces with various illumination sizes using the second-order small-slope approximation (SSA-2) model, the Doppler spectra, along with the Doppler shift and width, are thus computed and analyzed. The numerical simulations are conducted at L-band for three typical fully developed sea states. A Doppler shift error threshold is defined based on the accuracy requirements of sea surface current retrieval, and the spatial ergodicity of Doppler shift is evaluated quantitatively. Simulation results indicate that under co-polarization, the Doppler shift manifests spatial ergodicity when the sea surface size illuminated by radar is no less than one-quarter of the largest gravity wave wavelength at the corresponding sea state. For cross-polarization, the spatial ergodicity of the Doppler shift is significantly reduced and is observed only when the illumination size exceeds about one-half of the largest gravity wave wavelength. The results also indicate that wind direction has a limited effect on the spatial ergodicity of the Doppler shift.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"5493-5506"},"PeriodicalIF":5.3,"publicationDate":"2026-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11358708","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-19 | DOI: 10.1109/JSTARS.2026.3655350
Wenjing Li;Libin Du;Xinglei Zhao
Accurate water–land classification is fundamental for topographic mapping and coastal zone monitoring based on airborne LiDAR bathymetry (ALB). However, due to the limited information content and feature ambiguity of one-dimensional (1-D) waveform signals, accurate classification from single-wavelength ALB data remains challenging. To address this issue, a dual-branch multimodal fusion network (CRMF-Net) is proposed to improve both classification accuracy and robustness. The proposed network consists of a convolutional neural network (CNN) branch and a residual neural network branch optimized with a convolutional block attention module, which are designed to capture complementary temporal and spatial features, respectively. The 1-D green waveform is converted into a 2-D time-frequency representation through the continuous wavelet transform, thereby increasing the dimensionality and quantity of waveform features. By jointly exploiting complementary information from waveform signals and their corresponding time–frequency representations, the proposed method enables more effective feature representation without relying on extensive handcrafted analysis. Experiments conducted on CZMIL datasets from Qinshan Island demonstrate that CRMF-Net achieves an overall accuracy of 97.33% with a kappa coefficient of 0.9168, outperforming traditional methods such as fuzzy C-means, support vector machines, and one-dimensional convolutional neural networks. These results indicate that the proposed method provides a promising solution for fully automated processing of single-wavelength ALB data.
{"title":"CRMF-Net: A Multimodal Fusion Network for Water–Land Classification From Single-Wavelength Bathymetric LiDAR","authors":"Wenjing Li;Libin Du;Xinglei Zhao","doi":"10.1109/JSTARS.2026.3655350","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3655350","url":null,"abstract":"Accurate water-land classification is fundamental for topographic mapping and coastal zone monitoring based on airborne LiDAR bathymetry (ALB). However, due to the limited information content and feature ambiguity of one-dimensional (1-D) waveform signals, accurate classification from single-wavelength ALB data remains challenging. To address this issue, a dual-branch multimodal fusion network (CRMF-Net) is proposed to improve both classification accuracy and robustness. The proposed network consists of a convolutional neural network (CNN) branch and a convolutional block attention module optimized residual neural network branch, which are designed to capture complementary temporal and spatial features, respectively. The 1-D green waveform is converted into a 2-D time-frequency representation through the continuous wavelet transform, thereby increasing the dimensions and quantity of waveform features. By jointly exploiting complementary information from waveform signals and their corresponding time–frequency representations, the proposed method enables more effective feature representation without relying on extensive handcrafted analysis. Experiments conducted on CZMIL datasets from Qinshan Island demonstrate that CRMF-Net achieves an overall accuracy of 97.33% with a kappa coefficient of 0.9168, outperforming traditional methods, such as fuzzy C-means, support vector machine, and the one-dimensional convolutional neural network approach. These results indicate that the proposed method provides a promising solution for fully automated processing of single-wavelength ALB data.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"5804-5813"},"PeriodicalIF":5.3,"publicationDate":"2026-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11358398","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-19 | DOI: 10.1109/JSTARS.2026.3655144
Marta Alonso Tubía;Miguel Baena Botana;An Vo Quang;Ana Burgin;Oliva Garcia Cantú-Ros
Dynamic population mapping has become crucial for capturing real-time human movement and behavior, beyond traditional population mapping that relies on census data. Differentiating indoor and outdoor activity enhances accuracy for smart city planning, emergency response, public health, and emerging technologies such as Innovative Air Mobility, where pedestrian data inform safer, less disruptive flight planning. Data passively collected from mobile networks have proven to be highly effective in accurately capturing population presence and mobility patterns. By enhancing this rich data source with GPS data for spatial accuracy and validating the results against satellite imagery of detected pedestrians, we provide a procedure for indoor and outdoor population detection. The results show agreement between the two methodologies. Despite some limitations related to GPS data biases and pedestrian detection issues caused by urban furniture and shadows, the procedure demonstrates strong potential to capture people’s movements, which could ultimately enable near real-time monitoring of population presence on the streets.
{"title":"Toward Outdoor Population Presence Monitoring With Mobile Network Data and Satellite Imagery","authors":"Marta Alonso Tubía;Miguel Baena Botana;An Vo Quang;Ana Burgin;Oliva Garcia Cantú-Ros","doi":"10.1109/JSTARS.2026.3655144","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3655144","url":null,"abstract":"Dynamic population mapping has become crucial for capturing real-time human movement and behavior, beyond traditional population mapping relying on census data. Differentiating indoor and outdoor activity enhances accuracy for smart city planning, emergency response, public health, or emerging technologies like Innovative Air Mobility, where pedestrian data informs safer, less disruptive flight planning. Data passively collected from mobile networks have proven to be highly effective in accurately capturing population presence and mobility patterns. By enhancing this rich data source with GPS data for spatial accuracy and validating the results with satellite imagery of detected pedestrians, we provide a procedure for indoor and outdoor population detection. The results show agreement between both methodologies. Despite some limitations related to GPS data biases and pedestrian detection issues caused by urban furniture and shadows, the procedure demonstrates strong potential to capture people’s movements, which could ultimately enable near real-time monitoring of population presence on the streets.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"5834-5852"},"PeriodicalIF":5.3,"publicationDate":"2026-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11358662","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-16 | DOI: 10.1109/JSTARS.2026.3654602
Ming Tong;Shenghua Fan;Jiu Jiang;Hezhi Sun;Jisan Yang;Chu He
Recently, deep-learning-based detectors have advanced the state of the art in ship detection from synthetic aperture radar (SAR) images. However, constructing discriminative features against background scattering and precisely delineating ship contours remain challenging owing to the inherent scattering mechanism of SAR. In this article, a dual-branch detection framework with perception of scattering characteristics and geometric contours is introduced to address this problem. First, a scattering characteristic perception branch is proposed to fit the scattering distribution of SAR ships through a conditional diffusion model, which introduces learnable scattering features. Second, a convex contour perception branch is designed as a two-stage coarse-to-fine pipeline that delimits the irregular boundary of a ship by learning scattering key points. Finally, a cross-token integration module following a Bayesian framework is introduced to adaptively couple scattering and texture features and learn the construction of discriminative representations. Comprehensive experiments on three authoritative SAR datasets for oriented ship detection demonstrate the effectiveness of the proposed method.
{"title":"Dual-Perception Detector for Ship Detection in SAR Images","authors":"Ming Tong;Shenghua Fan;Jiu Jiang;Hezhi Sun;Jisan Yang;Chu He","doi":"10.1109/JSTARS.2026.3654602","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3654602","url":null,"abstract":"Recently, detectors based on deep learning have boosted the state-of-the-art of application on ship detection in synthetic aperture radar (SAR) images. However, constructing discriminative feature from scattering of background and distinguishing contour of ship precisely still present challenging subject to the inherent scattering mechanism of SAR. In this article, a dual-branch detection framework with perception of scattering characteristic and geometric contour is introduced to deal with the problem. First, a scattering characteristic perception branch is proposed to fit the scattering distribution of SAR ship through conditional diffusion model, which introduces learnable scattering feature. Second, a convex contour perception branch is designed as two-stage coarse-to-fine pipeline to delimit the irregular boundary of ship by learning scattering key points. Finally, a cross-token integration module following Bayesian framework is introduced to couple features of scattering and texture adaptively to learn construction of discriminative feature. Furthermore, comprehensive experiments on three authoritative SAR datasets for oriented ship detection demonstrate the effectiveness of proposed method.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"4790-4808"},"PeriodicalIF":5.3,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11355870","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146082030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-14 | DOI: 10.1109/JSTARS.2026.3654241
Yuan Li;Tianzhu Zhang;Ziyi Xiong;Junying Lv;Yinning Pang
Detecting three-dimensional (3-D) windows is vital for creating semantic building models with a high level of detail, supporting smart city and digital twin programs. Existing studies on window extraction using street imagery or laser scanning data often rely on limited types of features, resulting in compromised accuracy and completeness due to shadows and geometric decorations caused by curtains, balconies, plants, and other objects. To enhance the effectiveness and robustness of building window extraction in 3-D, this article proposes an automatic method that leverages synergistic information from multiview-stereo (MVS) point clouds through an adaptive divide-and-combine pipeline. Color information inherited from the imagery serves as a main clue to acquire the point clouds of individual building façades that may be coplanar and connected. The geometric information associated with normal vectors is then combined with color to adaptively divide each building façade into an irregular grid that conforms to the window edges. Subsequently, HSV color and depth distances within each grid cell are computed, and the grid cells are encoded to quantify the global arrangement features of windows. Finally, the multitype features are fused in an integer programming model; solving this model yields the optimal combination of grid cells corresponding to windows. Benefiting from the informative MVS point clouds and the fusion of multitype features, our method is able to directly produce 3-D models with high regularity for buildings with different appearances. Experimental results demonstrate that the proposed method is effective in 3-D window extraction while overcoming variations in façade appearances caused by foreign objects and missing data, with a high point-wise precision of 92.7%, recall of 77.09%, IoU of 71.95%, and F1-score of 83.42%. The results also exhibit a high level of integrity, with the accuracy of correctly extracted windows reaching 89.81%. In the future, we will focus on the development of a more universal façade-dividing method to deal with even more complicated windows.
{"title":"Automated Extraction of 3-D Windows From MVS Point Clouds by Comprehensive Fusion of Multitype Features","authors":"Yuan Li;Tianzhu Zhang;Ziyi Xiong;Junying Lv;Yinning Pang","doi":"10.1109/JSTARS.2026.3654241","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3654241","url":null,"abstract":"Detecting three-dimensional (3-D) windows is vital for creating semantic building models with high level of detail, furnishing smart city and digital twin programs. Existing studies on window extraction using street imagery or laser scanning data often rely on limited types of features, resulting in compromised accuracy and completeness due to shadows and geometric decorations caused by curtains, balconies, plants, and other objects. To enhance the effectiveness and robustness of building window extraction in 3-D, this article proposes an automatic method that leverages synergistic information from multiview-stereo (MVS) point clouds, through an adaptive divide-and-combine pipeline. Color information inherited from the imagery serves as a main clue to acquire the point clouds of individual building façades that may be coplanar and connected. The geometric information associated with normal vectors is then combined with color, to adaptively divide individual building façade into an irregular grid that conforms to the window edges. Subsequently, HSV color and depth distances within each grid cell are computed, and the grid cells are encoded to quantify the global arrangement features of windows. Finally, the multitype features are fused in an integer programming model, by solving which the optimal combination of grid cells corresponding to windows is obtained. Benefitting from the informative MVS point clouds and the fusion of multitype features, our method is able to directly produce 3-D models with high regularity for buildings with different appearances. Experimental results demonstrate that the proposed method is effective in 3-D window extraction while overcoming variations in façade appearances caused by foreign objects and missing data, with a high point-wise precision of 92.7%, recall of 77.09%, IoU of 71.95%, and F1-score of 83.42%. The results also exhibit a high level of integrity, with the accuracy of correctly extracted windows reaching 89.81%. In the future, we will focus on the development of a more universal façade dividing method to deal with even more complicated windows.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"4918-4934"},"PeriodicalIF":5.3,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11353237","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146082003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-14 | DOI: 10.1109/JSTARS.2026.3654195
Daniel Carcereri;Luca Dell’Amore;Stefano Tebaldini;Paola Rizzoli
The increasing use of artificial intelligence (AI) models in Earth Observation (EO) applications, such as forest height estimation, has led to a growing need for explainable AI (XAI) methods. Despite their high accuracy, AI models are often criticized for their “black-box” nature, making it difficult to understand their inner decision-making process. In this study, we propose a multifaceted approach to XAI for a convolutional neural network (CNN)-based model that estimates forest height from TanDEM-X single-pass InSAR data. By combining domain knowledge, saliency maps, and feature importance analysis through exhaustive model permutations, we provide a comprehensive investigation of the network’s working principles. Our results suggest that the proposed model is implicitly capable of recognizing and compensating for SAR acquisition geometry-related distortions. We find that the mean phase center height and its local variability represent the most informative predictors. We also find evidence that the interferometric coherence and the backscatter maps capture complementary but equally relevant views of the vegetation. This work contributes to advancing the understanding of the model’s inner workings and targets the development of more transparent and trustworthy AI for EO applications, ultimately leading to improved accuracy and reliability in the estimation of forest parameters.
{"title":"Insights on the Working Principles of a CNN for Forest Height Regression From Single-Pass InSAR Data","authors":"Daniel Carcereri;Luca Dell’Amore;Stefano Tebaldini;Paola Rizzoli","doi":"10.1109/JSTARS.2026.3654195","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3654195","url":null,"abstract":"The increasing use of artificial intelligence (AI) models in Earth Observation (EO) applications, such as forest height estimation, has led to a growing need for explainable AI (XAI) methods. Despite their high accuracy, AI models are often criticized for their “black-box” nature, making it difficult to understand the inner decision-making process. In this study, we propose a multifaceted approach to XAI for a convolutional neural network (CNN)-based model that estimates forest height from TanDEM-X single-pass InSAR data. By combining domain knowledge, saliency maps, and feature importance analysis through exhaustive model permutations, we provide a comprehensive investigation of the network working principles. Our results suggests that the proposed model is implicitly capable of recognizing and compensating for the SAR acquisition geometry-related distortions. We find that the mean phase center height and its local variability represents the most informative predictor. We also find evidence that the interferometric coherence and the backscatter maps capture complementary but equally relevant views of the vegetation. This work contributes to advance the understanding of the model’s inner workings, and targets the development of more transparent and trustworthy AI for EO applications, ultimately leading to improved accuracy and reliability in the estimation of forest parameters.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"4809-4824"},"PeriodicalIF":5.3,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11352840","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146082019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}