Pub Date: 2026-01-23 | DOI: 10.1109/JSTARS.2026.3657496
Yan Mou;Zheng Chen;Jinjiang Li
Hyperspectral and multispectral image fusion aims to integrate the complementary characteristics of both modalities to reconstruct high spatial-resolution hyperspectral images (HR-HSI). In recent years, joint modeling in the spatial and frequency domains has become an effective strategy for enhancing fusion performance. However, existing methods still exhibit limitations in extracting spatial–frequency features and in completely and efficiently integrating complementary information, which often leads to fused images that fail to maintain spatial–spectral consistency. To overcome these challenges, this article proposes a spatial-guided frequency compensation polarized attention fusion network (PASG-Net), which achieves HR-HSI reconstruction by integrating spatial-domain and frequency-domain features. Specifically, the grouped spatial feature extraction module employs grouped dense residual learning to capture local features in the spatial domain. The spatial-guided frequency compensation module is designed around the observation of “phase similarity and magnitude complementarity”: it uses spatial priors to generate dynamic weights, performing magnitude fusion and phase fine-tuning to capture comprehensive global frequency-domain features. The symmetric polarized cross-attention module introduces polarized linear cross-attention, adding explicit positive–negative polarity modeling to linear attention; this effectively integrates complementary information from both domains while maintaining low computational complexity. Extensive experiments demonstrate that the proposed PASG-Net outperforms current state-of-the-art methods.
Title: PASG-Net: Spatial-Guided Frequency Compensation Polarized Attention Fusion Network for Hyperspectral and Multispectral Image Fusion
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 19, pp. 5695-5709.
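The module's central operation, blending the Fourier magnitudes of the two modalities under a spatially derived weight while retaining the phase (which the authors observe to be similar across modalities), can be sketched for a single band in Python; the scalar blending weight below stands in for the paper's learned dynamic weights and is purely an assumption:

```python
import numpy as np

def frequency_compensate(hsi_band, msi_band, w):
    """Fuse two single-band images in the frequency domain: blend the
    magnitude spectra with weight w and keep the HSI phase.
    Illustrative simplification, not the paper's learned operator."""
    F_h = np.fft.fft2(hsi_band)
    F_m = np.fft.fft2(msi_band)
    mag = w * np.abs(F_m) + (1 - w) * np.abs(F_h)  # magnitude fusion
    phase = np.angle(F_h)                          # phase kept ("phase similarity")
    return np.real(np.fft.ifft2(mag * np.exp(1j * phase)))
```

With w = 0 the HSI band is returned unchanged; increasing w injects more of the MSI magnitude content while the phase structure stays fixed.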
Given the vast expanses of bare soil and sparse grassland in the central and western Tibetan Plateau (CWTP), the duration of precipitation serves as a critical control on key hydrological and geomorphological responses, such as runoff generation, soil erosion, and sediment transport. This study analyzes the spatial distribution of precipitation events of different durations (short, medium, and long) across the Tibetan Plateau (TP). To explore the differences in precipitation duration between the central-western and eastern TP, we utilize hourly precipitation data from a newly established cross-sectional rainfall observation network (53 gauges) on the CWTP and from China Meteorological Administration observations (145 gauges) on the eastern TP for comparison. The main conclusions are as follows: First, short-duration (1–3 h) precipitation events contribute more than 50% of the total rainfall in the CWTP. In contrast, short-duration events contribute less than 30% in the eastern TP, where medium- and long-duration precipitation events dominate. Second, the ECMWF Reanalysis 5 (ERA5) dataset and the high-resolution atmospheric simulation High Asia Refined analysis version 2 (HAR v2) tend to systematically underestimate (overestimate) the contribution of short-duration (long-duration) precipitation events. Specifically, they exhibit mean biases of 23% and 19% for short-duration precipitation, and 34% and 20% for long-duration precipitation, respectively. Third, the Integrated Multi-satellitE Retrievals for GPM (IMERG) satellite precipitation data perform well in estimating the contribution of precipitation events, with biases mostly within 30%. Comparing the original and gauge-calibrated satellite datasets shows that calibration at coarse temporal resolutions (daily or monthly) does not necessarily improve the identification of short-duration precipitation events.
Our results not only enhance the understanding of precipitation characteristics and processes over the TP, but also provide valuable guidance for hydrological modeling and the evaluation and improvement of satellite-based precipitation data.
Title: Characterizing Short-Duration Rainfall Events on the Tibetan Plateau Based on Ground Observation, Satellite Remote Sensing, and Reanalysis Data
Authors: Xiaoyan Ling;Yingying Chen;Kun Yang;Run Han;Lazhu;Xu Zhou;Xin Li;Changhui Zhan;Yaozhi Jiang;Jiaxin Tian;Yan Wang;Ming Chen
Pub Date: 2026-01-23 | DOI: 10.1109/JSTARS.2026.3656905
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 19, pp. 5680-5694.
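The duration-based event statistics above can be reproduced with a simple run-length pass over an hourly rain series; the wet-hour threshold and the medium/long bin edges below are assumptions (the paper only specifies 1–3 h as short):

```python
def duration_contributions(hourly_mm, wet_thresh=0.1):
    """Split an hourly rainfall series into events (runs of wet hours),
    bin them as short (1-3 h), medium (4-6 h, assumed), or long (>=7 h,
    assumed), and return each class's share of total rainfall."""
    totals = {"short": 0.0, "medium": 0.0, "long": 0.0}
    run = []
    for v in list(hourly_mm) + [0.0]:  # trailing zero closes a final event
        if v >= wet_thresh:
            run.append(v)
        elif run:
            key = "short" if len(run) <= 3 else "medium" if len(run) <= 6 else "long"
            totals[key] += sum(run)
            run = []
    grand = sum(totals.values())
    return {k: (t / grand if grand else 0.0) for k, t in totals.items()}
```

For example, a 2-hour event of 2 mm followed by a 4-hour event of 8 mm yields short and medium contributions of 20% and 80%.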
Geo-foundational models (GFMs) enable fast and reliable extraction of spatiotemporal information from satellite imagery, improving flood inundation mapping by leveraging location and time embeddings. Despite their potential, it remains unclear whether GFMs outperform traditional models such as U-Net, and a systematic comparison across sensors and data-availability scenarios, essential to guide end-users in model selection, is still lacking. To address this, we evaluate three GFMs—Prithvi 2.0, Clay V1.5, and dynamic one-for-all (DOFA)—and UViT (a Prithvi variant), against TransNorm, U-Net, DeepLabv3+, and Attention U-Net using PlanetScope (PS), Sentinel-1, and Sentinel-2. We observe competitive performance among all GFMs, with 2%–5% variation between the best and worst models across sensors. Clay outperforms the others on PS (0.79 mIoU) and Sentinel-2 (0.72), while Prithvi leads on Sentinel-1 (0.57). In leave-one-region-out cross-validation across five regions, Clay shows slightly better performance across all sensors, with mIoU scores of 0.72, 0.66, and 0.51 for PlanetScope, Sentinel-2, and Sentinel-1, respectively, compared to Prithvi (0.70, 0.64, 0.49) and DOFA (0.67, 0.64, 0.49). Across all 19 sites, cross-validation reveals a 4% improvement by Clay over U-Net. Visual inspection highlights Clay's superior ability to retain fine details. Few-shot experiments show Clay achieves 0.64 mIoU on PS with just five training images, outperforming Prithvi (0.24) and DOFA (0.35). In terms of computational time, Clay is the better choice due to its smaller model size (26M parameters), making it approximately 3× faster than Prithvi (650M) and 2× faster than DOFA (410M). Our results suggest GFMs offer small to moderate improvements in flood-mapping accuracy at lower computational cost and labeling effort.
Title: Assessing Geo-Foundational Models for Flood Inundation Mapping: Benchmarking Models for Sentinel-1, Sentinel-2, and Planetscope
Authors: Saurabh Kaushik;Lalit Maurya;Beth Tellman;ZhiJie Zhang
Pub Date: 2026-01-23 | DOI: 10.1109/JSTARS.2026.3656855
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 19, pp. 5649-5665.
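The mIoU scores used throughout this comparison follow the standard per-class intersection-over-union averaged over classes, which can be sketched as:

```python
import numpy as np

def miou(pred, gt, n_classes=2):
    """Mean intersection-over-union between two integer label maps;
    classes absent from both maps are skipped."""
    ious = []
    for c in range(n_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union:
            ious.append(inter / union)
    return float(np.mean(ious))
```

For binary flood maps, class 0 is background and class 1 is water, so a perfect prediction scores 1.0.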
Pub Date: 2026-01-22 | DOI: 10.1109/JSTARS.2026.3656654
Lina Zhang;Xin Gao;Hongyu Liang;Lei Zhang;Xinyou Song;Jie Chen;Zheng Zhang;Yong Yan;Jianwan Ji
As a major city in eastern China, Suzhou faces significant challenges in metro construction due to its soft soil conditions, as excavations can induce differential settlement in nearby buildings. To assess such impacts, this article employs decade-scale (2009–2019) multitemporal InSAR using high-resolution TerraSAR-X imagery. A customized processing chain—integrating Persistent Scatterer and Small Baseline approaches with optimized phase unwrapping and thermal dilation correction—was applied to extract time-series deformation along Suzhou's metro network. The results show overall stability in central zones, with negligible subsidence within the historic moat area, yet reveal several localized subsidence bowls exceeding 15 mm/year in the suburbs. Time-series analysis correlates settlement of high-rise buildings (> 14 mm/year) with surface loading and soil consolidation. By integrating InSAR-derived displacements with building safety standards and metro engineering thresholds, a risk evaluation framework was developed. Its application to 49 279 structures identified 4 severely at-risk buildings (0.008%), 101 moderately at-risk buildings (0.205%), and 172 mildly at-risk buildings (0.349%). This article establishes a practical risk-assessment protocol and provides the first large-scale visualization of subsidence hazards along Suzhou's metro lines, offering valuable guidance for metro planning, operational safety, and infrastructure preservation.
Title: Assessment of Building Subsidence Along Suzhou Metro Lines Using Decade-Scale Multitemporal InSAR
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 19, pp. 5539-5551.
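A threshold-based risk bucketing of this kind can be sketched as below; the velocity cutoffs are illustrative placeholders, since the paper derives its thresholds from building safety standards and metro engineering limits:

```python
def risk_class(vel_mm_per_year, thresholds=(5.0, 10.0, 15.0)):
    """Bucket a building's InSAR-derived subsidence velocity magnitude
    into risk levels. Threshold values are illustrative, not the paper's."""
    v = abs(vel_mm_per_year)
    mild, moderate, severe = thresholds
    if v >= severe:
        return "severe"
    if v >= moderate:
        return "moderate"
    if v >= mild:
        return "mild"
    return "stable"
```

Applied over all 49 279 structures, such a rule yields the per-level counts reported in the abstract.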
Pub Date: 2026-01-22 | DOI: 10.1109/JSTARS.2026.3656940
Changle Li;Yuewei Wang;Feifei Zhang;Hongwei Zhang
Pixel-based landslide susceptibility assessment (LSA) is prone to boundary blurring, salt-and-pepper artifacts, and unstable generalization in topographically heterogeneous mountains. To address these issues, we propose an edge-aware superpixel graph framework with a dual-graph graph convolutional network. Terrain-enhanced simple linear iterative clustering with watershed initialization is employed to generate edge-aware superpixels that better follow geomorphic and engineered boundaries, thereby reducing label noise and boundary leakage. A fused graph is then constructed by combining a spatial adjacency graph with a feature similarity graph based on $k$-nearest neighbors, enabling information propagation among both neighboring and environmentally similar superpixels. Learning under severe class imbalance is stabilized through focal loss with class weighting, spatially blocked data splits, probability calibration, and threshold scanning. Experiments conducted over a remote sensing strip study area, defined by the footprint of a Landsat 9 scene in Shanxi, China, and integrating multisource conditioning factors, show that the proposed method exceeds the mean performance of the compared methods by 11.6% in area under the curve (AUC) and 18.1% in $F1$ under identical data and spatial splits. Gains are most pronounced along narrow valley flanks, ridge–gully transition zones, and engineered cut slopes, whereas changes are modest on broad, low-relief interfluves. The resulting susceptibility maps exhibit clearer slope breaks, reduced speckle, and more contiguous high-susceptibility belts, providing interpretable, object-level outputs to support planning and hazard mitigation in rugged mountainous terrain.
Title: Edge-Aware Superpixel Dual-Graph GCN for Topographically Heterogeneous Landslide Susceptibility Assessment
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 19, pp. 5634-5648.
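The fused graph, a spatial adjacency graph combined with a $k$-nearest-neighbor feature-similarity graph, can be sketched as a weighted sum of two adjacency matrices; the mixing weight alpha and the Euclidean feature distance are assumptions:

```python
import numpy as np

def fused_graph(features, spatial_adj, k=2, alpha=0.5):
    """Combine a spatial adjacency matrix with a symmetrized k-NN
    feature-similarity graph as a weighted sum. alpha is an assumed
    mixing weight, not a value from the paper."""
    n = features.shape[0]
    d = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # no self-loops in the k-NN graph
    knn = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d[i])[:k]:   # k nearest superpixels in feature space
            knn[i, j] = knn[j, i] = 1.0
    return alpha * spatial_adj + (1 - alpha) * knn
```

The resulting matrix lets a GCN propagate information both to spatial neighbors and to environmentally similar but distant superpixels.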
Pub Date: 2026-01-21 | DOI: 10.1109/JSTARS.2026.3656403
Luo Zuo;Jiayi Sun;Jie Li;Feixiang Liu;Jinglei Li;Guanchong Niu
Change detection (CD) in remote sensing is essential for tracking transformations in land use, infrastructure, and landslide mapping over time. Although traditional deep learning (DL) models have shown effectiveness in standard scenarios, they often struggle with robustness and generalization, resulting in unreliable predictions in novel or complex environments. While large language models (LLMs) have been proposed to improve generalization across various tasks, their performance is limited by hallucination. To overcome these limitations, we propose a change vector analysis (CVA)-based hallucination-resistant change detection multimodal large language model framework, named CVAHR-CD-LLM, designed to suppress hallucinations while maintaining strong generalization capabilities. Specifically, we expand existing CD datasets, including those for landslide mapping, using a semisupervised approach and fine-tune CVAHR-CD-LLM to enhance its generalization ability. In addition, to address the hallucination issues commonly associated with LLMs, we introduce a CVA-based iterative coordinate calibration loop. Furthermore, we incorporate a self-consistency mechanism, which aggregates multiple reasoning paths to ensure robust predictions and reduce hallucinations during the CD process. This mechanism applies iterative corrections and multistep inference to refine detected change coordinates, leading to enhanced accuracy and reliability. Our method demonstrates substantial improvements in prediction precision, advancing the potential for autonomous and accurate land management applications.
Title: Hallucination-Resistant Change Detection in Multimodal Large Models for Autonomous Land Management Agents
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 19, pp. 6313-6327.
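A generic self-consistency aggregation over repeatedly sampled change coordinates might look like the following; the median-then-filter rule and the tolerance are assumptions, not the paper's exact calibration procedure:

```python
import numpy as np

def self_consistent_coords(samples, tol=2.0):
    """Aggregate multiple sampled (x, y) change coordinates: take the
    coordinate-wise median, drop samples farther than tol from it
    (treated as hallucinated outliers), and re-take the median."""
    pts = np.asarray(samples, dtype=float)
    med = np.median(pts, axis=0)
    keep = pts[np.linalg.norm(pts - med, axis=1) <= tol]
    return tuple(np.median(keep, axis=0))
```

Because a single hallucinated coordinate far from the consensus is discarded before the final estimate, the aggregate stays stable across reasoning paths.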
Pub Date: 2026-01-21 | DOI: 10.1109/JSTARS.2026.3656700
Krishnan Batri;Lakshmi S;Mahesh T R;Surbhi Bhatia Khan;Asma Alshuhail;Ahlam Almusharraf
Hyperspectral anomaly detection faces fundamental challenges in balancing spatial context, statistical rigor, and interpretability without ground-truth supervision. This article presents the spatially informed theoretical model (STIM), a novel unsupervised framework that addresses these challenges through a principled two-stage reference computation architecture. STIM systematically aggregates local spectral statistics into globally informed spatial references, enabling the derivation of three complementary features: energy (photometric deviation), entropy (local spectral coherence), and divergence (global statistical rarity). We establish theoretical foundations including noise robustness, Lipschitz continuity, and information-theoretic optimality with convergence guarantees. Comprehensive validation on five Airborne Visible/Infrared Imaging Spectrometer—Next Generation (AVIRIS-NG) benchmark datasets demonstrates STIM's substantial superiority over traditional statistical and deep learning methods, achieving 14.6× to 585× improvements in mean anomaly scores with a reliability index of 0.933. Feature dynamics analysis confirms multimodal orthogonality and consistent interpretability across diverse hyperspectral environments. STIM enables robust, interpretable, and generalizable anomaly detection for operational hyperspectral imaging without requiring labeled supervision or scene-specific calibration, advancing the state of the art in unsupervised hyperspectral analysis.
Title: STIM: A Unified Spatially Informed Model for Robust Hyperspectral Anomaly Detection
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 19, pp. 7204-7234.
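Two of the three features can be sketched from global cube statistics; the diagonal-covariance simplification and the omission of the local entropy term are assumptions made for brevity:

```python
import numpy as np

def anomaly_features(cube):
    """Per-pixel features from an (H, W, B) hyperspectral cube:
    energy = Euclidean distance to the global mean spectrum,
    divergence = Mahalanobis-like rarity under a diagonal covariance.
    Simplified sketch of STIM's feature idea (entropy omitted)."""
    H, W, B = cube.shape
    X = cube.reshape(-1, B)
    mu, var = X.mean(0), X.var(0) + 1e-8  # epsilon guards flat bands
    energy = np.linalg.norm(X - mu, axis=1)
    divergence = np.sqrt((((X - mu) ** 2) / var).sum(1))
    return energy.reshape(H, W), divergence.reshape(H, W)
```

A pixel whose spectrum deviates strongly from the global mean scores high on both maps, flagging it as a candidate anomaly.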
Pub Date: 2026-01-20 | DOI: 10.1109/JSTARS.2026.3655583
Yujia Fu;Mingyang Wang;Danfeng Hong;Gemine Vivone
Self-supervised learning (SSL) has emerged as a promising paradigm for remote sensing semantic segmentation, enabling the exploitation of large-scale unlabeled data to learn meaningful representations. However, most existing methods focus solely on the spatial domain, overlooking the rich frequency information that is particularly critical in remote sensing images, where fine-grained textures and repetitive structural patterns are prevalent. To address this limitation, we propose a novel dual-domain masked representation (DDMR) learning framework. Specifically, the spatial masking branch simulates partial occlusions and encourages spatial context reasoning by randomly masking regions in the spatial domain. Meanwhile, randomized frequency masking increases input diversity during training and improves generalization. In addition, feature representations are further decoupled into amplitude and phase components in the frequency branch, and an amplitude–phase loss is introduced to encourage fine-grained, frequency-aware learning. By jointly leveraging spatial and frequency masked representation learning, DDMR enhances the robustness and discriminative power of learned features. Extensive experiments on two remote sensing datasets demonstrate that our method consistently outperforms state-of-the-art self-supervised approaches, validating its effectiveness for self-supervised semantic segmentation in complex remote sensing scenarios.
{"title":"Dual-Domain Masked Representation Learning for Semantic Segmentation of Remote Sensing Images","authors":"Yujia Fu;Mingyang Wang;Danfeng Hong;Gemine Vivone","doi":"10.1109/JSTARS.2026.3655583","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3655583","url":null,"abstract":"Self-supervisedlearning (SSL) has emerged as a promising paradigm for remote sensing semantic segmentation, enabling the exploitation of large-scale unlabeled data to learn meaningful representations. However, most existing methods focus solely on the spatial domain, overlooking rich frequency information that is particularly critical in remote sensing images, where fine-grained textures and repetitive structural patterns are prevalent. To address this limitation, we propose a novel dual-domain masked representation (DDMR) learning framework. Specifically, the spatial masking branch simulates partial occlusions and encourages spatial context reasoning by randomly masking regions in the spatial domain. Meanwhile, randomized frequency masking increases input diversity during training and improves generalization. In addition, feature representations are further decoupled into amplitude and phase components in the frequency branch, and an amplitude-phase loss is introduced to encourage fine-grained, frequency-aware learning. By jointly leveraging spatial and frequency masked representation learning, DDMR enhances the robustness and discriminative power of learned features. 
Extensive experiments on two remote sensing datasets demonstrate that our method consistently outperforms state-of-the-art self-supervised approaches, validating its effectiveness for self-supervised semantic segmentation in complex remote sensing scenarios.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"5507-5519"},"PeriodicalIF":5.3,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11359001","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-20DOI: 10.1109/JSTARS.2026.3656191
Yang Yang;Wei Ao;Shunyi Zheng;Zhao Liu;Yunni Wu
Accurate semantic segmentation of high-resolution remote sensing imagery is essential for applications ranging from urban planning to environmental monitoring. However, this task remains fundamentally challenging due to the complex spatial patterns, extreme scale variations, and fine-grained details inherent in geographical scenes. While attention mechanisms, particularly global and sparse attention, have shown promise in capturing long-range dependencies, existing approaches often suffer from three interconnected limitations: prohibitive computational complexity, misalignment when integrating multiscale representations, and loss of semantic information during the decoder’s upsampling stages. This article introduces AMPUNet, a novel framework designed to overcome these limitations through the construction of a hierarchical, coarse-to-fine attention map pyramid. Our core innovation lies in explicitly propagating and refining attention maps across network layers rather than operating solely on feature maps. Specifically, we design: first, a hybrid sparse attention framework combining a block attention module and a column attention module to model global context efficiently; second, a dimension correspondence module to achieve tensor-level granularity alignment for multiscale attention maps; and third, an attention map merging module with a CAW strategy, which directly transfers high-level semantic information from deep to shallow layers, mitigating information degradation. Extensive experiments on the ISPRS Vaihingen, Potsdam, and LoveDA benchmarks demonstrate that AMPUNet achieves superior performance, with mean intersection over union scores of 75.43% on Vaihingen, 78.03% on Potsdam, and 50.94% on LoveDA, while maintaining competitive inference efficiency. 
Our findings confirm that structuring attention into a learnable pyramid is a highly effective paradigm for remote sensing semantic segmentation, successfully balancing precise detail preservation with robust global understanding.
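The mIoU scores reported above follow the standard mean intersection-over-union computation for segmentation; a generic sketch over flat label arrays (this is the conventional metric definition, not the authors' evaluation code):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection over union (mIoU) for semantic segmentation,
    computed per class over flat integer label arrays and averaged
    across classes present in either prediction or ground truth."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:                       # skip classes absent from both
            ious.append(inter / union)
    return float(np.mean(ious))
```

For example, with `pred = [0, 1, 1, 2]` and `target = [0, 1, 2, 2]`, the per-class IoUs are 1.0, 0.5, and 0.5, giving an mIoU of about 0.667.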
{"title":"AMPUNet: Hierarchical Attention Map Pyramid for Semantic Segmentation of Remote Sensing Images","authors":"Yang Yang;Wei Ao;Shunyi Zheng;Zhao Liu;Yunni Wu","doi":"10.1109/JSTARS.2026.3656191","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3656191","url":null,"abstract":"Accurate semantic segmentation of high-resolution remote sensing imagery is essential for applications ranging from urban planning to environmental monitoring. However, this task remains fundamentally challenging due to the complex spatial patterns, extreme scale variations, and fine-grained details inherent in geographical scenes. While attention mechanisms, particularly global and sparse attention, have shown promise in capturing long-range dependencies, existing approaches often suffer from three interconnected limitations: prohibitive computational complexity, misalignment when integrating multiscale representations, and loss of semantic information during the decoder’s upsampling stages. This article introduces AMPUNet, a novel framework designed to overcome these limitations through the construction of a hierarchical, coarse-to-fine attention map pyramid. Our core innovation lies in explicitly propagating and refining attention maps across network layers rather than operating solely on feature maps. Specifically, we design: first, a hybrid sparse attention framework combining a block attention module and a column attention module to model global context efficiently, second, a dimension correspondence module to achieve tensor-level granularity alignment for multiscale attention maps, and third, an attention map merging module with a CAW strategy, which directly transfers high-level semantic information from deep to shallow layers, mitigating information degradation. 
Extensive experiments on the ISPRS Vaihingen, Potsdam, and LoveDA benchmarks demonstrate that AMPUNet achieves superior performance, with mean intersection over union scores of 75.43% on Vaihingen, 78.03% on Potsdam, and 50.94% on LoveDA, while maintaining competitive inference efficiency. Our findings confirm that structuring attention into a learnable pyramid is a highly effective paradigm for remote sensing semantic segmentation, successfully balancing precise detail preservation with robust global understanding.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"6328-6340"},"PeriodicalIF":5.3,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11359465","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146223625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-20DOI: 10.1109/JSTARS.2026.3655691
Yuan Yuan;Junhan Zhou;Lei Lin;Ying Yu;Qingshan Liu
Optical satellite time series data play a crucial role in monitoring vegetation dynamics and land surface changes. However, persistent cloud cover often leads to missing data, particularly during critical phenological stages, which significantly diminishes data quality and hinders downstream applications. To address this issue, we present conditional optical-SAR multitemporal diffusion (CosmDiff), a novel framework for reconstructing optical satellite time series by integrating multimodal, multitemporal optical and synthetic aperture radar (SAR) data using conditional diffusion models. In CosmDiff, the reconstruction task is formulated as a multivariate time series imputation problem, where missing values are modeled as conditionally dependent on both cloud-free optical observations and synergistic SAR time series. The framework incorporates a Transformer-based network within the diffusion process, introducing a novel dimensional decomposition attention mechanism that fuses optical-SAR time series across both temporal and feature dimensions. This mechanism enables the dynamic extraction of essential and complementary features from both modalities. In addition, linearly interpolated optical time series are used as auxiliary inputs to further guide the imputation process. Experimental results on Sentinel-1/-2 datasets demonstrate that CosmDiff consistently outperforms both traditional interpolation methods and advanced deep learning approaches, achieving a 3.8% reduction in mean absolute error and a 6.8% improvement in spectral angle mapper compared to competing methods. Furthermore, CosmDiff provides comprehensive uncertainty estimates for its predictions, which are particularly valuable for decision-making applications.
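The spectral angle mapper cited above measures the angle between predicted and reference spectra, so lower values mean better spectral fidelity. A minimal generic sketch of the metric (the standard definition, not CosmDiff's evaluation code; the pixel-averaging convention is an assumption):

```python
import numpy as np

def spectral_angle_mapper(x, y, eps=1e-12):
    """Spectral angle mapper (SAM) between two sets of spectra,
    averaged over pixels; x and y have shape (n_pixels, n_bands).
    Returns the mean angle in radians (lower is better)."""
    dot = np.sum(x * y, axis=1)                               # per-pixel dot product
    norms = np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1)
    cosine = np.clip(dot / (norms + eps), -1.0, 1.0)          # guard arccos domain
    return float(np.mean(np.arccos(cosine)))
```

Identical spectra give an angle near zero, while orthogonal spectra give pi/2; the `eps` term guards against division by zero for all-zero pixels.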
{"title":"CosmDiff: Integrating Multitemporal Optical-SAR Data With Conditional Diffusion Models for Optical Satellite Time Series Reconstruction","authors":"Yuan Yuan;Junhan Zhou;Lei Lin;Ying Yu;Qingshan Liu","doi":"10.1109/JSTARS.2026.3655691","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3655691","url":null,"abstract":"Optical satellite time series data play a crucial role in monitoring vegetation dynamics and land surface changes. However, persistent cloud cover often leads to missing data, particularly during critical phenological stages, which significantly diminishes data quality and hinders downstream applications. To address this issue, we present conditional optical-SAR multitemporal diffusion (CosmDiff), a novel framework for reconstructing optical satellite time series by integrating multimodal, multitemporal optical and synthetic aperture radar (SAR) data using conditional diffusion models. In CosmDiff, the reconstruction task is formulated as a multivariate time series imputation problem, where missing values are modeled as conditionally dependent on both cloudfree optical observations and synergic SAR time series. The framework incorporates a Transformer-based network within the diffusion process, introducing a novel dimensional decomposition attention mechanism that fuses optical-SAR time series across both temporal and feature dimensions. This mechanism enables the dynamic extraction of essential and complementary features from both modalities. In addition, linearly interpolated optical time series are used as auxiliary inputs to further guide the imputation process. Experimental results on Sentinel-1/-2 datasets demonstrate that CosmDiff consistently outperforms both traditional interpolation methods and advanced deep learning approaches, achieving a 3.8% reduction in mean absolute error and a 6.8% improvement in spectral angle mapper compared to competing methods. 
Furthermore, CosmDiff provides comprehensive uncertainty estimates for its predictions, which are particularly valuable for decision-making applications.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"5722-5740"},"PeriodicalIF":5.3,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11359003","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}