Pub Date: 2025-12-17 | DOI: 10.1109/JSTARS.2025.3645623
Zheng Li;Hao Feng;Dongdong Xu;Tianqi Zhao;Boxiao Wang;Yongcheng Wang
Small weak object detection (SWOD) is a significant but neglected task in remote sensing image interpretation. Due to limitations in imaging resolution and inherent characteristics of the objects, detection networks struggle to effectively extract semantic features, which are crucial for object identification and recognition. In recent years, graph convolutional networks (GCNs) have been developed to handle non-Euclidean data. Through GCNs, node data are enriched via aggregation and propagation across the graph. In this article, we explore the feasibility of GCNs in semantic clue extraction to address the lack of key semantics in small weak objects. First, we propose a multihead graph reasoning learning model (MGRL) that projects initial feature representations into graph space and utilizes a two-layer multihead graph network to extract essential semantic information. Second, we introduce a foreground-background binary masking technique that roughly segments the foreground region of the image. The mask is converted into a prior prompt, which is then incorporated into the adjacency matrix, emphasizing object reasoning in MGRL. Next, we present a cross learning-based feature alignment learning module to resolve feature misalignment issues caused by spatial projection. Finally, we adopt a cross-layer semantic interaction module to facilitate cross-layer communication and aggregation of features. Extensive experiments are conducted on five remote sensing datasets: DIOR, AI-TOD, NWPU VHR-10, DOTA-v1.0, and STAR. The experimental results demonstrate the superior performance and advantages of our method.
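The mask-as-prior idea described above can be sketched generically: fold a binary foreground mask into a similarity-based adjacency matrix, then run one graph-convolution step over the resulting graph. This is only an illustrative reading, not the paper's MGRL; the similarity measure, the `gamma` weighting, and all shapes are assumptions.

```python
# Illustrative sketch: injecting a binary foreground mask as a prior
# into a GCN adjacency matrix. Not the authors' implementation.
import numpy as np

def masked_adjacency(features, fg_mask, gamma=0.5):
    """Build a feature-similarity adjacency, then boost edges between
    node pairs that the binary mask both flags as foreground."""
    sim = features @ features.T                       # node-to-node similarity
    prior = np.outer(fg_mask, fg_mask).astype(float)  # 1 where both nodes are foreground
    adj = sim + gamma * prior                         # inject the mask prior
    # softmax row-normalization so each node aggregates a convex combination
    adj = np.exp(adj - adj.max(axis=1, keepdims=True))
    return adj / adj.sum(axis=1, keepdims=True)

def gcn_layer(features, adj, weight):
    """One graph-convolution step: aggregate neighbors, project, ReLU."""
    return np.maximum(adj @ features @ weight, 0.0)

rng = np.random.default_rng(0)
nodes = rng.normal(size=(6, 4))       # 6 graph nodes, 4-dim features
mask = np.array([1, 1, 0, 0, 1, 0])   # rough foreground/background split
adj = masked_adjacency(nodes, mask)
out = gcn_layer(nodes, adj, rng.normal(size=(4, 4)))
print(out.shape)  # (6, 4)
```

A second such layer applied to `out` would give the two-layer graph network the abstract mentions, with the mask prior baked into every aggregation step.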
SRNet: A Semantic Reasoning Network for Small Weak Object Detection in Remote Sensing Images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 19, pp. 2680-2695.
Pub Date: 2025-12-17 | DOI: 10.1109/JSTARS.2025.3645208
Xin He;Yaqin Zhao;Yushi Chen;Limin Zou
Multimodal remote sensing data typically comprise hyperspectral imagery, light detection and ranging (LiDAR) data, and synthetic aperture radar (SAR). Different modalities supply complementary information that improves accuracy in multimodal remote sensing classification. Although deep learning-based methods have become mainstream, they also incur high computational and energy costs. Unlike existing models, the spiking neural network (SNN) is intrinsically energy-efficient, reducing computation cost and energy expenditure by activating only a small subset of neurons. However, SNNs extract spatial and channel features without considering the redundancy among different remote sensing modalities. To reduce cross-modal redundancy while retaining important features, this paper proposes an efficient spatial-channel SNN for multimodal remote sensing data classification. First, in the modality fusion step, considering that the quality of different modalities varies (e.g., hyperspectral imagery can be impaired by cloud), an energy-guided multimodal remote sensing fusion strategy is proposed, which allocates a high weight to the informative modality and suppresses less-informative ones by optimizing the generation bound. Second, we leverage information from the high-quality modality: in the spatial feature learning step, an efficient spatial SNN is proposed. It transforms spatial features to the frequency domain and shares parameters across different time steps in the spatial dimension to reduce spatial feature redundancy. Finally, to further reduce redundancy, an efficient channel SNN is explored, which focuses on learning important spike representations in the channel dimension via learnable parameters. Experimental results on three multimodal remote sensing datasets indicate that the proposed methods are competitive with state-of-the-art models.
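One plausible reading of the "energy-guided" weighting step is sketched below: each modality receives a softmax weight from a scalar informativeness score (here, feature variance as a stand-in for energy). The scoring function and `temperature` are assumptions, not the paper's criterion.

```python
# Hedged sketch of energy-guided modality weighting: higher-variance
# (more informative) modalities get larger fusion weights via softmax.
import numpy as np

def energy_weights(modalities, temperature=1.0):
    """Score each modality by feature variance, normalize with softmax."""
    energies = np.array([m.var() for m in modalities])
    z = energies / temperature
    z = np.exp(z - z.max())
    return z / z.sum()

def fuse(modalities):
    """Weighted sum of modalities using their energy-based weights."""
    w = energy_weights(modalities)
    return sum(wi * mi for wi, mi in zip(w, modalities)), w

rng = np.random.default_rng(1)
hsi   = rng.normal(scale=2.0, size=(8, 8))  # informative modality
lidar = rng.normal(scale=0.5, size=(8, 8))  # weaker modality
fused, w = fuse([hsi, lidar])
print(w)  # first weight dominates
```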
Efficient Spatial-Channel Spiking Neural Network for Multimodal Remote Sensing Data Classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 19, pp. 2879-2890.
Pub Date: 2025-12-17 | DOI: 10.1109/JSTARS.2025.3645399
Pan Duan;Tianjie Zhao;Haishen Lü;Shuyan Lang;Jingyao Zheng;Yu Bai;Zhiqing Peng;Wolfgang Wagner;Peng Guo;Hongtao Shi;Congrong Sun;Li Jia;Di Zhu;Xiaolong Dong;Jiancheng Shi
The monitoring of global soil moisture is crucial for understanding the hydrological cycle and managing terrestrial water resources. The China–France Oceanography Satellite (CFOSAT), equipped with the first sector-beam rotary scanning microwave scatterometer (CSCAT), provides a novel opportunity for global soil moisture mapping. However, the capability of CFOSAT’s Ku-band for soil moisture retrieval remains underexplored and lacks systematic evaluation. In this study, an Adaptive Backscatter Change Tracking (ABCT) algorithm is designed to retrieve absolute soil moisture from CFOSAT’s CSCAT measurements. The ABCT algorithm assumes stable surface roughness, so that changes in backscattering are attributed primarily to soil moisture variation via a logarithmic relationship. It incorporates a vegetation influence coefficient, which quantifies how vegetation affects the backscatter signal. This coefficient adaptively scales with changes in the Normalized Difference Vegetation Index to adjust the backscattering for the effect of vegetation growth or decay. This allows the algorithm to isolate changes in the backscatter signal that are due to soil moisture while minimizing false readings from vegetation growth or wilting. The CFOSAT ABCT algorithm’s performance was evaluated against extensive in-situ soil moisture data, demonstrating a robust correlation, with the Vertical-Vertical Polarization Ascending Orbit (VV Asc) result showing the highest accuracy, indicated by a Pearson’s correlation coefficient (R) of 0.68 and an unbiased root mean squared error (ubRMSE) of 0.057 m³/m³. Comparative analysis with the Advanced Scatterometer (ASCAT) data revealed that, while the ABCT algorithm’s correlation was slightly lower than that of the official EUMETSAT H SAF product, it notably improved the bias and ubRMSE metrics.
This study underscores that the CFOSAT ABCT soil moisture retrieval algorithm and product are a valuable addition to global soil moisture mapping, complementing existing satellite missions or sensors such as SMAP, SMOS, ASCAT, AMSR2, and FY-3/MWRI.
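The change-tracking idea above can be illustrated with a toy calculation: under stable roughness, a backscatter change, after removing an NDVI-scaled vegetation term, maps to a soil-moisture change. The linear form, the coefficients, and the function names below are hypothetical illustrations, not the paper's ABCT formulation.

```python
# Purely illustrative sketch of backscatter-change soil moisture retrieval.
# All coefficients (vegetation coeff, sensitivity) are made-up placeholders.

def vegetation_correction(ndvi, coeff=2.0):
    """Hypothetical vegetation influence term that grows with NDVI."""
    return coeff * ndvi

def retrieve_sm_change(sigma_db, sigma_ref_db, ndvi, ndvi_ref, sensitivity=10.0):
    """Convert a vegetation-corrected backscatter change (dB) into a
    soil-moisture change (m³/m³); sensitivity in dB per m³/m³."""
    veg = vegetation_correction(ndvi) - vegetation_correction(ndvi_ref)
    return (sigma_db - sigma_ref_db - veg) / sensitivity

# Backscatter rose 1.5 dB while NDVI rose 0.25: part of the rise is
# attributed to vegetation, the remainder to wetter soil.
dsm = retrieve_sm_change(sigma_db=-9.5, sigma_ref_db=-11.0,
                         ndvi=0.55, ndvi_ref=0.30)
print(round(dsm, 3))  # 0.1
```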
A New Addition to Global Soil Moisture Mapping: CFOSAT Scatterometer Algorithm Development and Validation. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 19, pp. 2439-2460.
Forest tree species classification using remote sensing data often faces the challenge of limited labeled samples, which hampers the performance of deep learning models. While few-shot learning techniques, such as prototypical networks (PNet), show promise, overfitting remains a significant issue. Given the relatively low cost of acquiring unlabeled data, semisupervised learning offers a potential solution. However, due to class imbalance, pseudolabels based on fixed confidence thresholds tend to favor majority classes, leading to lower classification accuracy for minority classes. To address this, we propose a novel semisupervised few-shot classification model, classwise pseudolabeling squeeze-and-excitation PNet (CWPL-SEPNet). The model incorporates a channel attention module into the PNet backbone and employs a classwise adaptive pseudolabeling mechanism based on quantile thresholds. This approach balances the pseudolabeled samples and reduces bias toward majority classes. Experiments conducted using Sentinel-2 imagery in Pu’er City, China, show that incorporating unlabeled data increases the overall classification accuracy to 95.14%, with per-class accuracies of 91.82% for Tea Farm, 90.19% for Oak, 94.70% for Eucalyptus, and 94.64% for Pinus kesiya. The CWPL strategy significantly outperforms traditional fixed-threshold methods, particularly in handling class imbalance and improving classification accuracy for minority classes. Compared to baseline methods such as TPN-semi, PNet, random forest, and support vector machine, CWPL-SEPNet excels in overall accuracy, average accuracy, and Kappa value. Furthermore, the model was validated on three publicly available remote sensing datasets. CWPL-SEPNet provides a robust and efficient classification solution under few-shot conditions, offering an effective approach for tree species classification using remote sensing data.
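The classwise quantile-threshold mechanism described above can be sketched generically: instead of one fixed confidence cutoff, each predicted class keeps only its own top fraction of unlabeled samples, so majority classes cannot dominate the pseudolabel pool. The quantile value and tie handling below are assumptions, not CWPL-SEPNet's exact rule.

```python
# Sketch of classwise adaptive pseudolabeling with per-class quantile
# thresholds (illustrative; not the paper's implementation).
import numpy as np

def classwise_pseudolabels(probs, q=0.8):
    """probs: (n_samples, n_classes) softmax outputs.
    Returns indices and labels of samples whose confidence passes
    their own predicted class's q-quantile threshold."""
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    keep = np.zeros(len(preds), dtype=bool)
    for c in np.unique(preds):
        in_c = preds == c
        thresh = np.quantile(conf[in_c], q)  # per-class adaptive cutoff
        keep |= in_c & (conf >= thresh)
    return np.where(keep)[0], preds[keep]

probs = np.array([
    [0.9, 0.1], [0.6, 0.4], [0.55, 0.45],  # three class-0 predictions
    [0.2, 0.8],                            # lone class-1 prediction
])
idx, labels = classwise_pseudolabels(probs, q=0.8)
print(idx, labels)  # the minority class-1 sample survives alongside the best class-0 one
```

A single fixed threshold of, say, 0.85 would have discarded the only class-1 sample here; the per-class quantile keeps it, which is the imbalance-handling behavior the abstract argues for.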
A Semisupervised Prototypical Network With Dynamic Threshold Pseudolabeling for Forest Classification. Yifan Xie;Long Chen;Jiahao Wang;Nuermaimaitijiang Aierken;Geng Wang;Xiaoli Zhang. DOI: 10.1109/JSTARS.2025.3645613. Pub Date: 2025-12-17. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 19, pp. 2405-2422.
Tobacco is a phenology-sensitive and economically significant crop that requires accurate and timely spatial mapping to support agricultural planning and public health regulation. However, single-date spectral similarity among crops and regional differences in planting practices limit the generalizability of existing approaches, particularly deep learning (DL) models. To address these challenges, we propose a novel phenology-guided DL framework that leverages satellite image time series (SITS) to capture crop-specific growth dynamics. Specifically, we introduce the tobacco spectral-phenological variable (TSP), which captures change rates in Red Edge-2 during peak growth and serves as crop-specific prior knowledge for model guidance. Based on this, we develop TSP-Former, a transformer architecture that incorporates two novel modules: a central prior attention module (CPAM), which adaptively fuses spectral information with phenological priors, and an NDVI-enhanced temporal decoder (NDTD), which reinforces temporal learning by emphasizing phenologically critical stages using NDVI-weighted sequences. Extensive experiments across four major tobacco regions using Sentinel-2 imagery demonstrate the method’s superior cross-regional robustness. TSP-Former achieves an average weighted F1-score of 87.1% and an overall accuracy of 85.9%, significantly outperforming random forest and competing DL approaches. Notably, in challenging regions characterized by substantial phenological shifts, the proposed method surpasses the emerging remote sensing foundation model AlphaEarth (with a fine-tuned lightweight multilayer perceptron) by over 15% in accuracy. These findings highlight the effectiveness of integrating phenological priors into temporal deep models, enabling robust and transferable crop mapping across heterogeneous and data-constrained regions, with clear implications for scalable agricultural monitoring and policy development.
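The two spectral signals the abstract leans on reduce to standard math: NDVI from Sentinel-2 bands B8 (NIR) and B4 (red), and a change rate over a Red Edge-2 (B6) time series. The TSP variable itself is defined in the paper; the sketch below shows only the generic formulas, with first differences as one plausible "change rate".

```python
# Minimal sketch: NDVI and a per-step change rate from band time series.
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized Difference Vegetation Index; eps avoids divide-by-zero."""
    return (nir - red) / (nir + red + eps)

def change_rate(series):
    """First difference of a time series as a simple change-rate proxy."""
    return np.diff(series)

b8 = np.array([0.30, 0.45, 0.60])  # NIR reflectance over three dates
b4 = np.array([0.10, 0.10, 0.10])  # red reflectance
b6 = np.array([0.20, 0.35, 0.40])  # Red Edge-2 reflectance

print(np.round(ndvi(b8, b4), 3))     # rises as the canopy greens up
print(np.round(change_rate(b6), 2))  # [0.15 0.05]
```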
TSP-Former: A Phenology-Guided Transformer for Tobacco Mapping Using Satellite Image Time Series. Huaming Gao;Yongqing Bai;Qing Sun;Haoran Wang;Xiangyu Tian;Hui Ma;Yixiang Li;Xianghong Che;Zhengchao Chen. DOI: 10.1109/JSTARS.2025.3645265. Pub Date: 2025-12-17. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 19, pp. 2423-2438.
Pub Date: 2025-12-17 | DOI: 10.1109/JSTARS.2025.3645116
Xi Kan;Xu Liu;Yonghong Zhang;Linglong Zhu;Jing Wang;Zhou Zhou;Lei Gong;Xianwu Wang
The maximum spatial resolution varies significantly across bands in the reflectance data of FengYun-4A (FY-4A) remote sensing images. Super-resolution (SR) reconstruction of FY-4A imagery not only enhances the spatial accuracy of low-resolution bands and achieves cross-band scale consistency, but also improves feature recognition and monitoring capabilities, providing clearer and more reliable data support for quantitative remote sensing and applications in meteorology, ecology, agriculture, and other fields. Therefore, a differential enhancement super-resolution network (DESR) is proposed based on scale invariance. Its dual paths consist of a CNN branch and a Swin Transformer branch. The CNN branch employs rep-residual blocks (RRB) to capture spatial structures, where each RRB integrates a spatial feature attention that uses large convolutional kernels with directional strides along the height and width to model long-range dependencies and spatial correlations. The Swin Transformer branch adopts residual Swin Transformer blocks to obtain a global receptive field. In addition, a differential feature enhancement module is introduced to fuse features, highlight branch-specific deficiencies through subtraction, and achieve complementary enhancement. Experimental results show that DESR achieves a more uniform error distribution and superior reconstruction quality compared with representative methods. On 2× and 4× SR tasks, DESR reaches PSNR values of 54.1395 and 46.0942 and SSIM values of 0.9899 and 0.9749, improvements of at least 1.87% and 0.59%, respectively, while also attaining the best spectral angle mapping results.
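Two of the metrics quoted above have compact textbook definitions worth recalling: PSNR over a reconstruction error, and the spectral angle between a reconstructed and a reference pixel spectrum. The sketch below uses the standard formulas, not the paper's evaluation code.

```python
# Hedged sketch of PSNR and spectral angle (SAM) for SR evaluation.
import numpy as np

def psnr(ref, rec, max_val=1.0):
    """Peak signal-to-noise ratio in dB for signals in [0, max_val]."""
    mse = np.mean((ref - rec) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def spectral_angle(ref, rec, eps=1e-12):
    """Angle (radians) between two spectra; 0 means identical direction."""
    cos = np.dot(ref, rec) / (np.linalg.norm(ref) * np.linalg.norm(rec) + eps)
    return np.arccos(np.clip(cos, -1.0, 1.0))

ref = np.array([0.2, 0.4, 0.6, 0.8])
rec = ref + 0.01                      # small uniform reconstruction error
print(round(psnr(ref, rec), 1))       # 40.0
print(spectral_angle(ref, ref) < 1e-5)  # True: identical spectra, near-zero angle
```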
DESR: Super-Resolution Reconstruction of FengYun-4 Multispectral Images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 19, pp. 2605-2620.
Accurate detection of forest stress from satellite data depends heavily on selecting informative spectral features. Traditional approaches rely on a limited set of predefined vegetation indices, which may not generalize across environmental conditions. In this study, we introduce enhanced maximum informativeness maximum independence (E-MIMI), an efficient and interpretable feature selection strategy that identifies optimal combinations of spectral features from generalized vegetation index classes rather than fixed indices. The method combines a genetic algorithm with a caching mechanism and informativeness-based scoring to reduce computation time while maintaining high accuracy. Applied to Sentinel-2 imagery from two ecologically distinct regions, E-MIMI consistently selected index combinations involving red-edge and shortwave infrared bands—spectral domains known to reflect canopy water content and chlorophyll degradation. E-MIMI demonstrates exceptional computational efficiency, completing feature selection up to 80 times faster and using over 1000 times less memory than traditional methods on large feature spaces. Despite this efficiency, E-MIMI achieves comparable segmentation performance, with a test intersection over union (IoU) of 0.61–0.63, while other methods reach an IoU of 0.60–0.64. The resulting models show a substantial improvement over previous studies in the same region (0.515–0.549 IoU). The model also generalized well to an independent dataset from Chornobyl, confirming its robustness. By integrating computer vision techniques with biophysically grounded features, our approach supports scalable, ecologically meaningful forest stress monitoring and offers a practical foundation for broader environmental applications requiring interpretable and computationally efficient feature selection.
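The search strategy described above, a genetic algorithm whose fitness evaluations are memoized so repeated candidates are never re-scored, can be sketched as a toy. The fitness function here is a synthetic stand-in, not E-MIMI's informativeness score, and all constants are illustrative.

```python
# Toy sketch: genetic algorithm over binary feature masks with a cached
# (memoized) fitness function. Synthetic fitness, not E-MIMI's scoring.
import random

N_FEATURES = 8
GOOD = {1, 3, 6}   # pretend these are the informative features
cache = {}

def fitness(mask):
    """Reward selecting GOOD features, mildly penalize extras; memoized."""
    key = tuple(mask)
    if key not in cache:  # each candidate is evaluated at most once
        chosen = {i for i, bit in enumerate(mask) if bit}
        cache[key] = len(chosen & GOOD) - 0.1 * len(chosen - GOOD)
    return cache[key]

def evolve(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, N_FEATURES)
            child = a[:cut] + b[cut:]             # one-point crossover
            if rng.random() < 0.2:                # bit-flip mutation
                j = rng.randrange(N_FEATURES)
                child[j] ^= 1
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print(best)
```

Because survivors are carried over unchanged, the best fitness never decreases across generations, and the cache means the cost is bounded by the number of distinct masks visited rather than total evaluations.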
{"title":"Forest Stress Detection Using Feature Engineering and Selection Approach Optimized for Satellite Imagery","authors":"Yevhenii Salii;Volodymyr Kuzin;Nataliia Kussul;Alla Lavreniuk","doi":"10.1109/JSTARS.2025.3644488","DOIUrl":"https://doi.org/10.1109/JSTARS.2025.3644488","url":null,"abstract":"Accurate detection of forest stress from satellite data depends heavily on selecting informative spectral features. Traditional approaches rely on a limited set of predefined vegetation indices, which may not generalize across environmental conditions. In this study, we introduce enhanced maximum informativeness maximum independence (E-MIMI), an efficient and interpretable feature selection strategy that identifies optimal combinations of spectral features from generalized vegetation index classes rather than fixed indices. The method combines a genetic algorithm with a caching mechanism and informativeness-based scoring to reduce computation time while maintaining high accuracy. Applied to Sentinel-2 imagery from two ecologically distinct regions, E-MIMI consistently selected index combinations involving red-edge and shortwave infrared bands—spectral domains known to reflect canopy water content and chlorophyll degradation. E-MIMI demonstrates exceptional computational efficiency, completing feature selection up to 80 times faster and using over 1000 times less memory than other traditional methods on large feature spaces. Despite this, E-MIMI achieves comparable levels of segmentation performance with a test intersection over union (IoU) of 0.61–0.63, while other methods reach an IoU of 0.60–0.64. Obtained models show a substantial improvement over previous studies in the same region (0.515–0.549 IoU). The model also generalized well to an independent dataset from Chornobyl, confirming its robustness. 
By integrating computer vision techniques with biophysically grounded features, our approach supports scalable, ecologically meaningful forest stress monitoring and offers a practical foundation for broader environmental applications requiring interpretable and computationally efficient feature selection.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"2461-2473"},"PeriodicalIF":5.3,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11300936","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The development of GNSS-R ocean altimetry technology has significantly advanced global sea surface height (SSH) monitoring. However, single-satellite systems face inherent limitations: their spatial coverage is constrained by orbital patterns and revisit cycles, making them insufficient for high-precision, high-spatiotemporal-resolution global SSH monitoring. This article proposes a novel deep-learning model for multisource spaceborne GNSS-R fusion-based SSH retrieval. The model corrects low-accuracy single-source inversion results and integrates inversion outputs from four GNSS-R satellite systems: FY-3E, FY-3G, CYGNSS, and Tianmu-1. This approach reduces errors caused by single sources and improves the coverage area of SSH measurements. Experiments were conducted globally between 60°N and 60°S, where the higher-precision FY-3E SSH data were used to correct the retrieval results of FY-3G, CYGNSS, and Tianmu-1. Using relatively lenient data quality control criteria to construct high-coverage global SSH gridded products, test results spanning three distinct months demonstrated significant performance improvements across all constellations. Following the error correction model optimization, the FY-3G constellation achieved a corrected mean absolute error (MAE) ranging from 1.433 to 2.158 m, representing a reduction of 40%–70% compared with precorrection values. Similarly, the corrected MAE for the CYGNSS constellation ranged from 2.178 to 4.192 m, also reflecting a 40%–70% reduction. For the Tianmu-1 constellation, the corrected MAE was refined to 1.311–1.505 m, with an MAE reduction exceeding 80%. The fusion of the four satellite systems achieved a sea surface coverage of 75.75% within an 8-h window, with high consistency against validation datasets, such as the DTU18 model and the ATL12 mean SSH. 
The findings of this study significantly enhance the SSH retrieval accuracy of commercial constellations not originally designed for altimetry purposes (e.g., Tianmu-1) and provide a novel approach for multisource GNSS-R fusion-based SSH monitoring. This work holds important theoretical significance and practical value, particularly offering broad application prospects in global ocean monitoring and climate change research.
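The headline numbers above are mean absolute errors and their relative reductions after correction. As a reference for how those two quantities relate, here is a minimal sketch; the function names and the sample values are illustrative, not taken from the paper's data.

```python
def mae(pred, truth):
    """Mean absolute error between retrieved and reference SSH values (metres)."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)

def reduction_pct(mae_before, mae_after):
    """Relative MAE reduction (%) achieved by an error-correction model."""
    return 100.0 * (mae_before - mae_after) / mae_before

# Illustrative numbers only: a pre-correction MAE of 7.0 m brought down to 1.4 m
# corresponds to the >80% reduction scale reported for Tianmu-1.
print(reduction_pct(7.0, 1.4))  # -> 80.0
```

Reading the abstract's figures this way: a corrected MAE of 1.433–2.158 m with a "40%–70% reduction" implies pre-correction MAEs roughly in the 2.4–7.2 m range for FY-3G.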
{"title":"Research on a Deep-Learning Model for Multisource Spaceborne GNSS-R Fusion in Sea Surface Height Retrieval","authors":"Yun Zhang;Tianyue Wen;Shuhu Yang;Qingjing Shi;Qifeng Qian;Chunyi Xiang;Jiaying Li;Binbin Li;Bo Peng;Yanling Han;Zhonghua Hong","doi":"10.1109/JSTARS.2025.3643863","DOIUrl":"https://doi.org/10.1109/JSTARS.2025.3643863","url":null,"abstract":"The development of GNSS-R ocean altimetry technology has significantly advanced global sea surface height (SSH) monitoring. However, single-satellite systems face inherent limitations: their spatial coverage is constrained by orbital patterns and revisit cycles, making them insufficient for high-precision, high-spatiotemporal-resolution global SSH monitoring. This article innovatively proposes a deep-learning model for multisource spaceborne GNSS-R fusion-based SSH retrieval. The model corrects low-accuracy single-source inversion results and integrates inversion outputs from four GNSS-R satellite systems: FY-3E, FY-3G, CYGNSS, and Tianmu-1. This approach reduces errors caused by single sources and improves the coverage area of SSH measurements. Experiments were conducted globally between 60°N and 60°S, where the higher precision FY-3E SSH data were used to correct the retrieval results of FY-3G, CYGNSS, and Tianmu-1. Under relatively lenient data quality control criteria to construct high-coverage global SSH gridded products, the test results spanning three distinct months demonstrated significant performance improvements across all constellations. Following the error correction model optimization, the FY-3G constellation achieved a corrected mean absolute error (MAE) ranging from 1.433 to 2.158 m, representing a reduction of 40%–70% compared with precorrection values. Similarly, the corrected MAE for the CYGNSS constellation ranged from 2.178 to 4.192 m, also reflecting a 40%–70% reduction. 
For the Tianmu-1 constellation, the corrected MAE was refined to 1.311–1.505 m, with an MAE reduction exceeding 80%. The fusion of the four satellite systems achieved a sea surface coverage of 75.75% within an 8-h window, with high consistency against validation datasets, such as DTU18 validation model and ATL12 mean SSH. The findings of this study significantly enhance the SSH retrieval accuracy of commercial constellations not originally designed for altimetry purposes (e.g., Tianmu-1) and provide a novel approach for multisource GNSS-R fusion-based SSH monitoring. This work holds important theoretical significance and practical value, particularly offering broad application prospects in global ocean monitoring and climate change research.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"2103-2119"},"PeriodicalIF":5.3,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11299433","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145886606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-15 DOI: 10.1109/JSTARS.2025.3644442
Yang Zhao;Jiaqi Liang;Hancheng Ma;Pingping Huang;Yifan Dong;Jing Li
Open-set domain adaptation (OSDA) aims to generalize cross-domain remote sensing scene classification by classifying unknown categories that exist in the target domain but are not seen in the source domain. In remote sensing, the significant distribution discrepancy between the two domains hinders effective knowledge transfer, which degrades the generalization performance of OSDA. In addition, the semantic similarity among different categories impacts the classification performance of both known and unknown categories. However, existing OSDA methods often neglect transferable semantic information, which limits their generalization ability. To address these issues, this article proposes a semantic-guided hierarchical consistency domain adaptation (SGHC) method to enhance semantic separability and cross-domain generalization. Specifically, an attribute guided prompt (AGP) is introduced to mine transferable semantic attributes and semantic relationships. The semantic information effectively improves fine-grained scene understanding and promotes the discrimination of unknown categories. Then, a hierarchical consistency (HC) strategy is employed to complement generalization in open-set scenarios. The HC retains discriminative information of categories and effectively alleviates the domain gap between the source and target domains to avoid negative transfer. To validate the proposed method's performance, experiments are conducted on six cross-domain scenarios with the aerial image dataset (AID), the Northwestern Polytechnical University dataset (NWPU), and the University of California Merced Land Use dataset (UCMD). Experimental results demonstrate the effectiveness of the proposed method in open-set remote sensing scene classification. In particular, the proposed method improves the overall classification accuracy by at least 5.6% on the NWPU $\rightarrow$ UCMD scenario compared with eleven state-of-the-art methods.
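The reported gain is in overall classification accuracy under an open-set protocol, where all target-only categories are grouped into a single "unknown" class. As a reference for that metric, here is a minimal sketch; the class names and samples are hypothetical, not from the paper's experiments.

```python
def overall_accuracy(preds, labels):
    """Fraction of correctly classified samples; in open-set evaluation,
    'unknown' is scored as one extra class alongside the known categories."""
    correct = sum(1 for p, l in zip(preds, labels) if p == l)
    return correct / len(labels)

# Hypothetical target-domain samples: two known scenes plus unknown categories.
labels = ["airport", "forest", "unknown", "unknown", "river"]
preds = ["airport", "forest", "unknown", "river", "river"]
print(overall_accuracy(preds, labels))  # 4 of 5 correct -> 0.8
```

Misclassifying an unknown sample as a known class (the fourth sample above) is the characteristic open-set failure mode that semantic guidance is meant to reduce.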
{"title":"Semantic-Guided Hierarchical Consistency Domain Adaptation for Open-Set Remote Sensing Scene Classification","authors":"Yang Zhao;Jiaqi Liang;Hancheng Ma;Pingping Huang;Yifan Dong;Jing Li","doi":"10.1109/JSTARS.2025.3644442","DOIUrl":"https://doi.org/10.1109/JSTARS.2025.3644442","url":null,"abstract":"Open-set domain adaptation (OSDA) aims to generalize cross-domain remote sensing scene classification by classifying unknown categories that exist in the target and not seen in the source domain. In remote sensing, the significant distribution discrepancy between two domains hinders effective knowledge transfer, which degrades the generalization performance of OSDA. In addition, the semantic similarity among different categories impacts the classification performance of both known and unknown categories. However, existing OSDA methods often neglect transferable semantic information and this limits their generalization ability. To address these issues, this article proposed a semantic-guided hierarchical consistency domain adaptation (SGHC) method to enhance semantic separability and cross-domain generalization. Specifically, an attribute guided prompt (AGP) is introduced to mine transferable semantic attributes and semantic relationships. The semantic information effectively improves fine-grained scene understanding and promotes the distinguishing of unknown categories. Then, a hierarchical consistency (HC) is employed to complement generalization in open-set scenarios. The HC retains discriminative information of categories and effectively alleviates the domain gap between the source and target domain to avoid negative transfer. To validate the proposed method's performance, experiments are conducted on six cross-domain scenarios with aerial image dataset, Northwestern Polytechnical University dataset (NWPU), and University of California Merced Land Use dataset (UCMD). 
Experimental results demonstrate the effectiveness of the proposed method in open-set remote sensing scene classification. Especially, the proposed method improves the overall classification accuracies by at least 5.6% on the NWPU <inline-formula><tex-math>$rightarrow$</tex-math></inline-formula> UCMD scenario compared with the other eleven state-of-the-art methods.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"2088-2102"},"PeriodicalIF":5.3,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11300939","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145886642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, the growing availability of multimodal image data has opened broader prospects for multimodal semantic segmentation. However, the data heterogeneity between modalities makes it difficult to leverage complementary information and creates semantic understanding deviations, which limits fusion quality and segmentation accuracy. To overcome these challenges, we propose a hybrid attention-driven CNN-Mamba multimodal fusion network (HACMNet) for semantic segmentation. It aims to fully exploit the strengths of optical images in texture and semantic representation, along with the complementary structural and elevation information from the digital surface model (DSM). This enables the effective extraction and combination of global and local complementary information to achieve higher accuracy and robustness in semantic segmentation. First, we propose a progressive cross-modal feature interaction (PCMFI) mechanism in the encoder. It integrates the fine-grained textures and semantic information of optical images with the structural boundaries and spatial information of the DSM, thereby facilitating more precise cross-modal feature interaction. Second, we design an adaptive dual-stream Mamba cross-modal fusion (ADMCF) module, which leverages a learnable variable mechanism to deeply represent global semantic and spatial structural information. This enhances deep semantic feature interaction and improves the ability of the model to distinguish complex land cover categories. Together, these modules progressively refine cross-modal cues and strengthen semantic interactions, enabling more coherent and discriminative multimodal fusion. Finally, we introduce a global-local feature decoder to effectively integrate the global and local information from the fused multimodal features. It preserves the structural integrity of target objects while enhancing edge detail representation, thereby improving segmentation results. 
In rigorous tests on the standard ISPRS Vaihingen and Potsdam datasets, the proposed HACMNet demonstrates advantages over prevailing multimodal remote sensing methods, particularly on challenging object classes.
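The core idea of combining an optical feature with a DSM feature can be illustrated with a toy channel-wise gated fusion, where a learnable gate decides per channel how much of each modality to keep. This is a sketch of cross-modal fusion in spirit only; the function, gate parameterization, and values are hypothetical and much simpler than the paper's PCMFI and ADMCF modules.

```python
import math


def sigmoid(x):
    """Logistic function mapping a raw gate weight to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(opt_feat, dsm_feat, gate_weights):
    """Channel-wise gated fusion of an optical and a DSM feature vector:
    each channel blends the two modalities according to its learned gate."""
    fused = []
    for o, d, w in zip(opt_feat, dsm_feat, gate_weights):
        g = sigmoid(w)  # g near 1 favours the optical channel, near 0 the DSM channel
        fused.append(g * o + (1.0 - g) * d)
    return fused


opt = [0.9, 0.2, 0.5]   # e.g. texture-dominated channels
dsm = [0.1, 0.8, 0.5]   # e.g. elevation-dominated channels
gate = [10.0, -10.0, 0.0]  # saturated toward optical, toward DSM, and an even blend
print(gated_fusion(opt, dsm, gate))
```

In a trained network the gate weights would be learned per channel (or predicted per pixel), so that texture-rich channels lean on the optical stream while boundary and height cues lean on the DSM stream.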
{"title":"Hybrid Attention Driven CNN-Mamba Multimodal Fusion Network for Remote Sensing Image Semantic Segmentation","authors":"Shu Tian;Minglei Li;Lin Cao;Lihong Kang;Jing Tian;Xiangwei Xing;Bo Shen;Kangning Du;Chong Fu;Ye Zhang","doi":"10.1109/JSTARS.2025.3644588","DOIUrl":"https://doi.org/10.1109/JSTARS.2025.3644588","url":null,"abstract":"In recent years, the increase of multimodal image data has offered a broader prospect for multimodal semantic segmentation. However, the data heterogeneity between different modalities make it difficult to leverage complementary information and create semantic understanding deviations, which limits the fusion quality and segmentation accuracy. To overcome these challenges, we propose a hybrid attention driven CNN-Mamba multimodal fusion network (HACMNet) for semantic segmentation. It aims to fully exploit the strengths of optical images in texture and semantic representation, along with the complementary structural and elevation information from the digital surface model (DSM). This enables the effective extraction and combination of global and local complementary information to achieve higher accuracy and robustness in semantic segmentation. Specifically, we propose a progressive cross-modal feature interaction (PCMFI) mechanism in the encoder. It integrates the fine-grained textures and semantic information of optical images with the structural boundaries and spatial information of DSM, thereby facilitating more precise cross-modal feature interaction. Second, we design an adaptive dual-stream Mamba cross-modal fusion (ADMCF) module, which leverages a learnable variable mechanism to deeply represent global semantic and spatial structural information. This enhances deep semantic feature interaction and improves the ability of the model to distinguish complex land cover categories. 
Together, these modules progressively refine cross-modal cues and strengthen semantic interactions, enabling more coherent and discriminative multimodal fusion. Finally, we introduce a global-local feature decoder to effectively integrate the global and local information from the fused multimodal features. It preserves the structural integrity of target objects while enhancing edge detail representation, thus enhancing segmentation results. Through rigorous testing on standard datasets like ISPRS Vaihingen and Potsdam, the proposed HACMNet demonstrates advantages over prevailing methods in multimodal remote sensing analysis, particularly on challenging object classes.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"2254-2272"},"PeriodicalIF":5.3,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11300934","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145885035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}