Pub Date : 2026-01-29 DOI: 10.1109/JSTARS.2026.3659193
Yuanjie Zhi;Yushuo Qi;Zhi Yang;Wenkui Hao;Mingyang Ma;Shaohui Mei
Remote sensing image object detection, leveraging the complementary characteristics of infrared and RGB imaging, represents an effective approach for achieving all-weather detection. However, in complex environments, the quality of information provided by different modalities can undergo dynamic variations, necessitating dynamic adjustment of the weights assigned to each modality. Therefore, a dynamic adaptive network based on modal competition and dual-encoder feature fusion is proposed to implement precise modeling of dynamic complementary relationships and adaptive extraction of discriminative features through hierarchical feature dynamic interaction and an adaptive salient modal competition mechanism. Specifically, a hierarchical feature attention fusion module (HFAM) is designed under dual parallel feature encoding branches to enable the fusion of global context and local details, in which the cross-channel attention module is adopted to enhance channel responses through reconstruction via channel feature correlation matrices, and the difference fusion attention module (DFAM) concurrently calibrates spatial biases through pixel-level difference modeling. Moreover, an information entropy-guided adaptive modal competition mechanism is proposed to filter high-confidence queries by quantifying feature point uncertainty, thereby providing useful prior information for the decoder and adaptively determining the salient modality for targets to balance modal contributions. Experimental results on two benchmark datasets, i.e., DroneVehicle and VEDAI, demonstrate that the proposed method clearly outperforms state-of-the-art algorithms by effectively handling highly dynamic feature variations.
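The entropy-guided query filtering idea can be illustrated with a small sketch (a generic illustration, not the exact MCDF-Net mechanism; the function name and the 0.5 keep ratio are assumptions): queries whose class distributions have low Shannon entropy are treated as high-confidence priors for the decoder.

```python
import numpy as np

def select_confident_queries(class_probs, keep_ratio=0.5):
    """Rank decoder queries by the Shannon entropy of their class
    distributions and keep the most confident (lowest-entropy) ones.

    class_probs: (num_queries, num_classes) array of softmax scores.
    """
    eps = 1e-12
    entropy = -np.sum(class_probs * np.log(class_probs + eps), axis=1)
    k = max(1, int(keep_ratio * len(entropy)))
    keep = np.argsort(entropy)[:k]  # lowest entropy = highest confidence
    return keep, entropy

probs = np.array([[0.9, 0.05, 0.05],   # confident query
                  [0.4, 0.3, 0.3]])    # uncertain query
keep, ent = select_confident_queries(probs, keep_ratio=0.5)
print(keep)  # index of the lower-entropy query
```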
{"title":"MCDF-Net: Dynamic Adaptive Network Based on Modal Competition and Dual Encoder Feature Fusion for Remote Sensing Image Target Detection","authors":"Yuanjie Zhi;Yushuo Qi;Zhi Yang;Wenkui Hao;Mingyang Ma;Shaohui Mei","doi":"10.1109/JSTARS.2026.3659193","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3659193","url":null,"abstract":"Remote sensing image object detection, leveraging the complementary characteristics of infrared and RGB imaging, represents an effective approach for achieving all-weather detection. However, in complex environments, the quality of information provided by different modalities can undergo dynamic variations, necessitating dynamic adjustment of the weights assigned to each modality. Therefore, a dynamic adaptive network based on modal competition and dual-encoder feature fusion is proposed to implement precise modeling of dynamic complementary relationships and adaptive extraction of discriminative features through hierarchical feature dynamic interaction and an adaptive salient modal competition mechanism. Specifically, a hierarchical feature attention fusion module (HFAM) is designed under dual parallel feature encoding branches to enable the fusion of global context and local details, in which the cross-channel attention module is adopted to enhance channel responses through reconstruction via channel feature correlation matrices, and the difference fusion attention module (DFAM) concurrently calibrates spatial biases through pixel-level difference modeling. Moreover, an information entropy-guided adaptive modal competition mechanism is proposed to filter high-confidence queries by quantifying feature point uncertainty, thereby providing useful prior information for the decoder and adaptively determining the salient modality for targets to balance modal contributions. 
Experimental results on two benchmark datasets, i.e., DroneVehicle and VEDAI, demonstrate that the proposed method clearly outperforms state-of-the-art algorithms by effectively handling highly dynamic feature variations.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"6581-6596"},"PeriodicalIF":5.3,"publicationDate":"2026-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11367775","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146223540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Floods are one of the most common and devastating natural disasters worldwide. The contribution of remote sensing is important for reducing the impact of flooding both during the event itself and for improving hydrodynamic models by reducing their associated uncertainties. This article presents the innovative capabilities of the Surface Water and Ocean Topography (SWOT) mission, especially its river node products, to enhance the accuracy of riverine flood reanalysis, performed on a 50-km stretch of the Garonne River. The challenge addressed here is quantifying how SWOT river observations, alone and in combination with in-situ gauges, can improve hydraulic parameter estimation and river water level prediction in flood reanalysis. The experiments incorporate various data assimilation strategies, based on the ensemble Kalman filter, which allows for sequential updates of model parameters based on available observations. The experimental results show that while SWOT data alone offers some improvements, combining it with in-situ water level measurements provides the most accurate representation of flood dynamics, both at gauge stations and along the river. The study also investigates the impact of different SWOT revisit frequencies on the model’s performance, revealing that assimilating more frequent SWOT observations leads to more reliable flood reanalyses. In the real event, it was demonstrated that the assimilation of SWOT and in-situ data accurately reproduces the water level dynamics, offering promising prospects for future flood monitoring systems. Results show that in the OSSE framework, assimilation reduced water level errors by an order of magnitude, while in the real 2024 event the errors were reduced to below 17 cm, demonstrating the reliability of the approach. This study underscores the complementary role of Earth Observation data in enhancing flood dynamics representation in the riverbed and the floodplains.
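The sequential update at the heart of such experiments can be sketched as a minimal stochastic ensemble Kalman filter analysis step (a textbook EnKF with a linear observation operator, not the study's actual hydrodynamic configuration; the toy water-level numbers are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def enkf_update(ensemble, obs, obs_err_std, H):
    """One stochastic EnKF analysis step.

    ensemble: (n_members, n_state) forecast ensemble
    obs:      (n_obs,) observation vector (e.g., SWOT node water levels)
    H:        (n_obs, n_state) linear observation operator
    """
    n, _ = ensemble.shape
    X = ensemble - ensemble.mean(axis=0)            # state anomalies
    Y = X @ H.T                                     # observation-space anomalies
    Pyy = Y.T @ Y / (n - 1) + obs_err_std**2 * np.eye(len(obs))
    Pxy = X.T @ Y / (n - 1)
    K = Pxy @ np.linalg.inv(Pyy)                    # Kalman gain
    perturbed = obs + obs_err_std * rng.standard_normal((n, len(obs)))
    innov = perturbed - ensemble @ H.T              # perturbed innovations
    return ensemble + innov @ K.T

# toy example: estimate a 1-D water level from a direct observation
ens = rng.normal(2.0, 0.5, size=(50, 1))   # prior spread around 2.0 m
H = np.eye(1)
analysis = enkf_update(ens, np.array([2.6]), 0.1, H)
print(analysis.mean())  # pulled toward the 2.6 m observation
```

With an accurate observation (0.1 m error) against a diffuse prior (0.5 m spread), the analysis mean moves most of the way toward the observation and the ensemble spread collapses, which is the behavior the assimilation experiments exploit.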
{"title":"Assimilation of SWOT Altimetry Data for Riverine Flood Reanalysis: From Synthetic to Real Data","authors":"Quentin Bonassies;Thanh Huy Nguyen;Ludovic Cassan;Andrea Piacentini;Sophie Ricci;Charlotte Emery;Christophe Fatras;Santiago Peña Luque;Raquel Rodriguez Suquet","doi":"10.1109/JSTARS.2026.3659808","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3659808","url":null,"abstract":"Floods are one of the most common and devastating natural disasters worldwide. The contribution of remote sensing is important for reducing the impact of flooding both during the event itself and for improving hydrodynamic models by reducing their associated uncertainties. This article presents the innovative capabilities of the Surface Water and Ocean Topography (SWOT) mission, especially its river node products, to enhance the accuracy of riverine flood reanalysis, performed on a 50-km stretch of the Garonne River. The challenge addressed here is quantifying how SWOT river observations, alone and in combination with in-situ gauges, can improve hydraulic parameter estimation and river water level prediction in flood reanalysis. The experiments incorporate various data assimilation strategies, based on the ensemble Kalman filter, which allows for sequential updates of model parameters based on available observations. The experimental results show that while SWOT data alone offers some improvements, combining it with in-situ water level measurements provides the most accurate representation of flood dynamics, both at gauge stations and along the river. The study also investigates the impact of different SWOT revisit frequencies on the model’s performance, revealing that assimilating more frequent SWOT observations leads to more reliable flood reanalyses. In the real event, it was demonstrated that the assimilation of SWOT and in-situ data accurately reproduces the water level dynamics, offering promising prospects for future flood monitoring systems. 
Results show that in the OSSE framework, assimilation reduced water level errors by an order of magnitude, while in the real 2024 event the errors were reduced to below 17 cm, demonstrating the reliability of the approach. This study underscores the complementary role of Earth Observation data in enhancing flood dynamics representation in the riverbed and the floodplains.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"7024-7041"},"PeriodicalIF":5.3,"publicationDate":"2026-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11368722","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146223551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The back projection (BP) algorithm has become an important method for achieving high-resolution synthetic aperture radar imaging due to its model-free assumptions, high imaging accuracy, and strong trajectory adaptability. However, its high computational complexity severely limits real-time performance and system scalability. To address the challenge of large-size, high-accuracy, real-time imaging on resource-constrained system-on-chip, this article proposes an efficient BP acceleration architecture based on truncated sinc interpolation reconstruction that effectively eliminates the limitation of off-chip memory bandwidth on system performance, and significantly reduces on-chip memory and logic resource consumption. A mixed precision strategy is proposed, reducing the lookup table consumption by 46.84% compared to the traditional floating-point implementation, while maintaining nearly the same imaging accuracy. The proposed system achieves real-time SAR imaging on both uncrewed and manned aerial vehicles: on a low-speed uncrewed vehicle, it completes a $\text{4096} \times \text{3840}$ image in 2.5968 s, and on a high-speed manned vehicle, it completes an $\text{8192} \times \text{4096}$ image in 15.3557 s, which meets real-time processing requirements. The peak signal-to-noise ratio of the imaging result improves nearly seven times compared to most existing FPGA implementations with lower resource consumption, while achieving faster processing speeds. Experimental results demonstrate that the proposed scheme, by significantly improving resource utilization efficiency and imaging speed, achieves real-time processing capabilities for high-accuracy and large-size SAR imaging tasks, thereby exhibiting excellent practicality and scalability.
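The interpolation kernel behind such a design can be sketched as a plain truncated-sinc interpolator (a generic floating-point sketch, not the article's FPGA fixed-point architecture; the 8-tap width is an assumption):

```python
import numpy as np

def truncated_sinc_interp(samples, t, fs, half_width=4):
    """Interpolate a uniformly sampled signal at fractional time t with a
    sinc kernel truncated to `half_width` taps per side -- the kernel shape
    underlying windowed-sinc interpolation in BP range resampling.
    """
    n0 = int(np.floor(t * fs))
    idx = np.arange(n0 - half_width + 1, n0 + half_width + 1)
    idx = np.clip(idx, 0, len(samples) - 1)
    taps = np.sinc(t * fs - idx)          # sinc at the fractional offsets
    return np.sum(samples[idx] * taps)

# reconstruct a band-limited tone between its samples
fs = 100.0
n = np.arange(256)
sig = np.sin(2 * np.pi * 5.0 * n / fs)
t = 0.505                                  # halfway between two samples
est = truncated_sinc_interp(sig, t, fs)
print(est, np.sin(2 * np.pi * 5.0 * t))    # close agreement
```

Truncating the ideal (infinite) sinc to a few taps is what makes the operation cheap enough for on-chip pipelines; the residual error comes from the discarded kernel tail.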
{"title":"A Real-Time MPSoC-Based Back Projection Accelerator for High-Accuracy Large-Size SAR Imaging Using Truncated Sinc Reconstruction and Mixed Precision Design","authors":"Xinyu Hu;Yinshen Wang;Jiabao Guo;Yao Cheng;Qiancheng Yan;Jiangyu Yao;Xiaolan Qiu","doi":"10.1109/JSTARS.2026.3658956","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3658956","url":null,"abstract":"The back projection (BP) algorithm has become an important method for achieving high-resolution synthetic aperture radar imaging due to its model-free assumptions, high imaging accuracy, and strong trajectory adaptability. However, its high computational complexity severely limits real-time performance and system scalability. To address the challenge of large-size, high-accuracy, real-time imaging on resource-constrained system-on-chip, this article proposes an efficient BP acceleration architecture based on truncated sinc interpolation reconstruction that effectively eliminates the limitation of off-chip memory bandwidth on system performance, and significantly reduces on-chip memory and logic resource consumption. A mixed precision strategy is proposed, reducing the lookup table consumption by 46.84% compared to the traditional floating-point implementation, while maintaining nearly the same imaging accuracy. The proposed system achieves real-time SAR imaging on both uncrewed and manned aerial vehicles: on a low-speed uncrewed vehicle, it completes a <inline-formula><tex-math>$\text{4096} \times \text{3840}$</tex-math></inline-formula> image in 2.5968 s, and on a high-speed manned vehicle, it completes an <inline-formula><tex-math>$\text{8192} \times \text{4096}$</tex-math></inline-formula> image in 15.3557 s, which meets real-time processing requirements. The peak signal-to-noise ratio of the imaging result improves nearly seven times compared to most existing FPGA implementations with lower resource consumption, while achieving faster processing speeds. 
Experimental results demonstrate that the proposed scheme, by significantly improving resource utilization efficiency and imaging speed, achieves real-time processing capabilities for high-accuracy and large-size SAR imaging tasks, thereby exhibiting excellent practicality and scalability.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"6597-6613"},"PeriodicalIF":5.3,"publicationDate":"2026-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11367768","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146223553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-29 DOI: 10.1109/JSTARS.2026.3659080
Ning Lv;Jisheng Dang;Teng Wang;Bimei Wang;Yichu Liu;Hong Peng;Haowen Yan;Bin Hu
Recent research has actively explored diverse mechanisms to unlock pixel-level segmentation capabilities in multimodal large language models (MLLMs), aiming to bridge the gap between high-level semantic reasoning and fine-grained visual perception. However, directly transferring these general-domain frameworks to referring remote sensing image segmentation (RRSIS) faces significant hurdles. These challenges primarily stem from the weak pixel-level discrimination capability of MLLMs in complex geospatial scenes and the severe granularity mismatch caused by drastic scale variations in remote sensing targets. To overcome these limitations, this article proposes ReSaP, a reasoning-enhanced and scale-aware prompting framework. ReSaP incorporates two core components to effectively adapt MLLMs for pixel-wise tasks. First, we introduce a pixel-aware group relative policy optimization (GRPO) training scheme. By utilizing a reinforcement learning framework with a hybrid reward mechanism that integrates bipartite matching for localization and classification accuracy for verification, this scheme explicitly enhances the MLLM’s fine-grained pixel discrimination and localization precision. Second, we propose the scale-aware prompting strategy for inference. This mechanism employs a density-adaptive grid sampling approach to dynamically adjust the prompt configuration based on target dimensions, effectively harmonizing prompt granularity with object scale. Extensive experiments on the RRSIS-D and RIS-LAD benchmarks demonstrate that ReSaP significantly outperforms existing state-of-the-art methods, validating its superior performance and robustness across both satellite and unmanned aerial vehicle observation perspectives.
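The group-relative scoring that gives GRPO its name can be sketched in a few lines (a generic formulation; the paper's hybrid bipartite-matching and classification-accuracy rewards are not reproduced here): each sampled response in a group is scored against the group mean, normalized by the group standard deviation.

```python
import numpy as np

def group_relative_advantages(rewards):
    """GRPO-style advantages: each sampled response in a group is
    scored relative to the group mean, normalized by the group std,
    so no learned value function is needed.
    """
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# four sampled responses to the same prompt, with scalar rewards
adv = group_relative_advantages([0.2, 0.8, 0.5, 0.5])
print(adv.sum())   # ~0 by construction
```

Above-average samples get positive advantages and are reinforced; below-average ones are suppressed, which is how the hybrid reward can push the MLLM toward better pixel-level localization.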
{"title":"ReSaP: Reasoning-Enhanced and Scale-Aware Prompting for Referring Remote Sensing Image Segmentation","authors":"Ning Lv;Jisheng Dang;Teng Wang;Bimei Wang;Yichu Liu;Hong Peng;Haowen Yan;Bin Hu","doi":"10.1109/JSTARS.2026.3659080","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3659080","url":null,"abstract":"Recent research has actively explored diverse mechanisms to unlock pixel-level segmentation capabilities in multimodal large language models (MLLMs), aiming to bridge the gap between high-level semantic reasoning and fine-grained visual perception. However, directly transferring these general-domain frameworks to referring remote sensing image segmentation (RRSIS) faces significant hurdles. These challenges primarily stem from the weak pixel-level discrimination capability of MLLMs in complex geospatial scenes and the severe granularity mismatch caused by drastic scale variations in remote sensing targets. To overcome these limitations, this article proposes ReSaP, a reasoning-enhanced and scale-aware prompting framework. ReSaP incorporates two core components to effectively adapt MLLMs for pixel-wise tasks. First, we introduce a pixel-aware group relative policy optimization (GRPO) training scheme. By utilizing a reinforcement learning framework with a hybrid reward mechanism that integrates bipartite matching for localization and classification accuracy for verification, this scheme explicitly enhances the MLLM’s fine-grained pixel discrimination and localization precision. Second, we propose the scale-aware prompting strategy for inference. This mechanism employs a density-adaptive grid sampling approach to dynamically adjust the prompt configuration based on target dimensions, effectively harmonizing prompt granularity with object scale. 
Extensive experiments on the RRSIS-D and RIS-LAD benchmarks demonstrate that ReSaP significantly outperforms existing state-of-the-art methods, validating its superior performance and robustness across both satellite and unmanned aerial vehicle observation perspectives.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"8372-8383"},"PeriodicalIF":5.3,"publicationDate":"2026-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11367764","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147440493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Infrared small-target detection (IRSTD) holds a critical role in low-visibility and long-distance imaging scenarios, such as UAV tracking and maritime surveillance. However, cluster-IRSTD (CIRSTD) faces more prominent challenges: adjacent targets are prone to feature coupling, dim targets are easily submerged by background clutter, and cluster shapes vary dynamically. Owing to the constraint of independent single-target modeling, current deep-learning methods struggle to effectively handle dense cluster scenarios. Inspired by the human top-down visual attention mechanism, this paper proposes a coarse-to-fine cascaded detection network. First, an adaptive regional attention mechanism is tailored specifically for clusters, and a coarse cluster extraction module is further designed to extract the overall features of clusters. Subsequently, the Inner Fine Distinction module seamlessly integrates the Gaussian and Scharr filters from model-driven approaches into the deep-learning framework, aiming to amplify the saliency of dim targets. It effectively solves the problems of dim target missed detection and adjacent target coupling in clusters. By synergistically integrating holistic cluster information and enhancing target saliency, the proposed Coarse-to-Fine Cascade IRSTD (C2IRSTD) significantly mitigates missed detections within clusters and reduces false alarms outside clusters. The experiments conducted on the DenseSIRST dataset have strongly demonstrated the superior performance of C2IRSTD in highly challenging dense-cluster scenarios. Meanwhile, its leading performance on the SIRST3 dataset in sparse scenarios fully highlights its excellent generalization ability.
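The idea of pairing Gaussian smoothing with Scharr gradients to make small targets salient can be illustrated as follows (a plain-NumPy sketch under assumed kernel and image choices, not the paper's Inner Fine Distinction module): smoothing suppresses pixel noise, and the gradient magnitude highlights the sharp local contrast of a point target against smooth clutter.

```python
import numpy as np

def filter2_same(img, k):
    """3x3 'same' cross-correlation with zero padding (the sign flip
    vs. true convolution does not affect gradient magnitudes)."""
    p = np.pad(img, 1)
    out = np.zeros_like(img, dtype=float)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * p[i:i + img.shape[0], j:j + img.shape[1]]
    return out

gauss = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16.0
scharr_x = np.array([[3, 0, -3], [10, 0, -10], [3, 0, -3]], dtype=float)
scharr_y = scharr_x.T

img = np.zeros((32, 32))
img[16, 16] = 1.0            # small point target
img += 0.05 * np.random.default_rng(1).standard_normal(img.shape)  # clutter

smooth = filter2_same(img, gauss)                       # denoise
saliency = np.hypot(filter2_same(smooth, scharr_x),     # gradient magnitude
                    filter2_same(smooth, scharr_y))
peak = np.unravel_index(saliency.argmax(), saliency.shape)
print(peak)   # near the planted target at (16, 16)
```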
{"title":"Top-Down Coarse-to-Fine Cascade Network for High-Precision Cluster Infrared Small Target Detection","authors":"Tuntun Wang;Jincheng Zhou;Lang Wu;Shuai Yuan;Yuxin Jing","doi":"10.1109/JSTARS.2026.3659652","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3659652","url":null,"abstract":"Infrared small-target detection (IRSTD) holds a critical role in low-visibility and long-distance imaging scenarios, such as UAV tracking and maritime surveillance. However, cluster-IRSTD (CIRSTD) faces more prominent challenges: adjacent targets are prone to feature coupling, dim targets are easily submerged by background clutter, and cluster shapes vary dynamically. Owing to the constraint of independent single-target modeling, current deep-learning methods struggle to effectively handle dense cluster scenarios. Inspired by the human top-down visual attention mechanism, this paper proposes a coarse-to-fine cascaded detection network. First, an adaptive regional attention mechanism is tailored specifically for clusters, and a coarse cluster extraction module is further designed to extract the overall features of clusters. Subsequently, the Inner Fine Distinction module seamlessly integrates the Gaussian and Scharr filters from model-driven approaches into the deep-learning framework, aiming to amplify the saliency of dim targets. It effectively solves the problems of dim target missed detection and adjacent target coupling in clusters. By synergistically integrating holistic cluster information and enhancing target saliency, the proposed Coarse-to-Fine Cascade IRSTD (C2IRSTD) significantly mitigates missed detections within clusters and reduces false alarms outside clusters. The experiments conducted on the DenseSIRST dataset have strongly demonstrated the superior performance of C2IRSTD in highly challenging dense-cluster scenarios. 
Meanwhile, its leading performance on the SIRST3 dataset in sparse scenarios fully highlights its excellent generalization ability.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"6566-6580"},"PeriodicalIF":5.3,"publicationDate":"2026-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11368750","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146223539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Potato is an important staple crop both in China and worldwide, playing a critical role in ensuring global food security. Accurate mapping of the potato distribution is essential for detecting planting areas, estimating crop yields, and optimizing planting structures, thereby supporting sustainable agricultural development. However, remote sensing techniques for mapping potato distribution are still in their infancy, as most attention has been focused on the three major crops—maize, wheat, and rice. Consequently, this article proposes a cropland field-parcel-scale methodology for mapping potato distribution in Siziwang Banner, Inner Mongolia Autonomous Region, China. This methodology integrates edge detection, image segmentation, and a machine-learning algorithm, leveraging multitemporal Sentinel-2 imagery to map the potato distribution accurately and effectively. Edge detection results from the four 10-m resolution Sentinel-2 bands (blue, green, red, and near-infrared) revealed that Canny edge detection provides more complete information for edge extraction than Sobel edge detection. The edges extracted by the Canny algorithm are more closed and complete, which is extremely important for accurate image segmentation. A comprehensive and robust edge map was generated by applying a weighted aggregation method to the edges initially extracted from each of the four spectral bands. Subsequently, the watershed segmentation algorithm was applied to these aggregated edges to delineate field parcels, and index thresholds were used to differentiate cultivated from noncultivated field parcels. The methodology achieved an overall accuracy of 85% and an intersection-over-union ratio of 0.82. Finally, a random forest classifier was employed to map potato distribution by integrating spectral and index features at the field-parcel scale, achieving an overall mapping accuracy of 80%.
The producer’s accuracy and user’s accuracy for potato mapping were 93.3% and 81.6%, respectively. As such, this article delivers significant methodological support for mapping the fourth staple crop. The framework established here serves as a critical baseline for advancing crop type mapping, precise parcel extraction, and yield estimation, thereby supporting more strategic decision-making in potato cultivation and harvest.
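The weighted aggregation of per-band edge maps can be sketched as follows (equal band weights and a 0.5 threshold are assumptions; the article's exact weighting scheme is not specified here): edges supported by most of the spectral bands survive, spurious single-band edges are suppressed.

```python
import numpy as np

def aggregate_edges(edge_maps, weights):
    """Fuse binary edge maps from several spectral bands into one soft
    edge map by weighted averaging, then binarize.

    edge_maps: list of (H, W) binary arrays (e.g., Canny output per band).
    weights:   per-band weights, normalized internally.
    """
    w = np.asarray(weights, dtype=float)
    w /= w.sum()
    fused = sum(wi * m.astype(float) for wi, m in zip(w, edge_maps))
    return fused >= 0.5   # keep edges supported by most of the weight

# toy example: an edge present in 3 of 4 bands survives aggregation
blank = np.zeros((5, 5), int)
blue = blank.copy(); blue[2] = 1
green = blank.copy(); green[2] = 1
red = blank.copy(); red[2] = 1
nir = blank.copy()                       # edge missed in the NIR band
fused = aggregate_edges([blue, green, red, nir], [1, 1, 1, 1])
print(fused[2].all(), fused[0].any())    # True False
```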
{"title":"A Field Parcel Scale Algorithm for Mapping Potato Distribution Using Multitemporal Sentinel-2 Images","authors":"Hasituya;Feng Quan;Chen Zhongxin;Battsetseg Tuvdendorj;Altantuya Dorjsuren;Yan Zhiyuan","doi":"10.1109/JSTARS.2026.3654208","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3654208","url":null,"abstract":"Potato is an important staple crop both in China and worldwide, playing a critical role in ensuring global food security. Accurate mapping of the potato distribution is essential for detecting planting areas, estimating crop yields, and optimizing planting structures, thereby supporting sustainable agricultural development. However, remote sensing techniques for mapping potato distribution are still in their infancy, as most attention has been focused on the three major crops—maize, wheat, and rice. Consequently, this article proposes a cropland field-parcel-scale methodology for mapping potato distribution in Siziwang Banner, Inner Mongolia Autonomous Region, China. This methodology integrates edge detection, image segmentation, and a machine-learning algorithm, leveraging multitemporal Sentinel-2 imagery to map the potato distribution accurately and effectively. Edge detection results from the four 10-m resolution Sentinel-2 bands (blue, green, red, and near-infrared) revealed that Canny edge detection provides more complete information for edge extraction than Sobel edge detection. The edges extracted by the Canny algorithm are more closed and complete, which is extremely important for accurate image segmentation. A comprehensive and robust edge map was generated by applying a weighted aggregation method to the edges initially extracted from each of the four spectral bands. Subsequently, the watershed segmentation algorithm was applied to these aggregated edges to delineate field parcels, and index thresholds were used to differentiate cultivated from noncultivated field parcels. 
The methodology achieved an overall accuracy of 85% and an intersection-over-union ratio of 0.82. Finally, a random forest classifier was employed to map potato distribution by integrating spectral and index features at the field-parcel scale, achieving an overall mapping accuracy of 80%. The producer’s accuracy and user’s accuracy for potato mapping were 93.3% and 81.6%, respectively. As such, this article delivers significant methodological support for mapping the fourth staple crop. The framework established here serves as a critical baseline for advancing crop type mapping, precise parcel extraction, and yield estimation, thereby supporting more strategic decision-making in potato cultivation and harvest.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"6534-6545"},"PeriodicalIF":5.3,"publicationDate":"2026-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11364529","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146223552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-28 DOI: 10.1109/JSTARS.2026.3657648
Xiaoqing Wan;Dongtao Mo;Yupeng He;Feng Chen;Zhize Li
Hyperspectral image (HSI) classification necessitates adept modeling of both intricate local variations and long-range spectral–spatial dependencies, while maintaining computational efficiency. Conventional methods frequently prioritize either local or global features while neglecting directional information, or employ simplistic fusion techniques, which leads to inadequate feature representations and reduced discriminative ability. To address these challenges, this article presents MD2F-Mamba, a novel dual-branch architecture that integrates a multidirectional depthwise convolution module to capture spatial features from multiple orientations—namely, square, horizontal, and vertical convolutions, enriching local representations. The architecture comprises a local branch, featuring a multiscale local feature enhancement module with positional encoding, which effectively captures diverse spatial–spectral patterns. Simultaneously, the global branch utilizes a hierarchical state-space Mamba for global feature extraction that models multiscale long-range dependencies with linear complexity. A cosine similarity feature fusion module adaptively merges local and global features, optimizing discriminability by reducing redundancy. Experimental results on the Pavia University, Houston2013, WHU-Hi-LongKou, and WHU-Hi-HanChuan datasets demonstrate that MD2F-Mamba achieves competitive classification results while maintaining a minimal parameter count compared with several state-of-the-art methods, underscoring its innovative efficiency and robustness in HSI classification.
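A cosine-similarity fusion gate can be sketched as follows (an illustrative formulation; the exact gating used by the paper's fusion module may differ): the more the global feature duplicates the local one, the less of it is added back, which reduces redundancy while preserving complementary information.

```python
import numpy as np

def cosine_gated_fusion(local_feat, global_feat, eps=1e-8):
    """Blend local and global feature vectors with a redundancy gate:
    cosine similarity near 1 means the global feature is redundant,
    so its contribution is suppressed; dissimilar features pass through.
    """
    cos = float(np.dot(local_feat, global_feat) /
                (np.linalg.norm(local_feat) * np.linalg.norm(global_feat) + eps))
    gate = 1.0 - abs(cos)          # 0 when redundant, 1 when complementary
    return local_feat + gate * global_feat

a = np.array([1.0, 0.0])
r1 = cosine_gated_fusion(a, a)                    # redundant pair
r2 = cosine_gated_fusion(a, np.array([0.0, 1.0]))  # complementary pair
print(r1, r2)
```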
{"title":"MD2F-Mamba: Multidirectional Depthwise Convolution and Dual-Branch Mamba Feature Fusion Networks for Hyperspectral Image Classification","authors":"Xiaoqing Wan;Dongtao Mo;Yupeng He;Feng Chen;Zhize Li","doi":"10.1109/JSTARS.2026.3657648","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3657648","url":null,"abstract":"Hyperspectral image (HSI) classification necessitates adept modeling of both intricate local variations and long-range spectral–spatial dependencies, while maintaining computational efficiency. Conventional methods frequently prioritize either local or global features while neglecting directional information, or employ simplistic fusion techniques, which leads to inadequate feature representations and reduced discriminative ability. To address these challenges, this article presents MD<sup>2</sup>F-Mamba, a novel dual-branch architecture that integrates a multidirectional depthwise convolution module to capture spatial features from multiple orientations—namely, square, horizontal, and vertical convolutions, enriching local representations. The architecture comprises a local branch, featuring a multiscale local feature enhancement module with positional encoding, which effectively captures diverse spatial–spectral patterns. Simultaneously, the global branch utilizes a hierarchical state-space Mamba for global feature extraction that models multiscale long-range dependencies with linear complexity. A cosine similarity feature fusion module adaptively merges local and global features, optimizing discriminability by reducing redundancy. 
Experimental results on the Pavia University, Houston2013, WHU-Hi-LongKou, and WHU-Hi-HanChuan datasets demonstrate that MD<sup>2</sup>F-Mamba achieves competitive classification results while maintaining a minimal parameter count compared with several state-of-the-art methods, underscoring its efficiency and robustness in HSI classification.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"6214-6238"},"PeriodicalIF":5.3,"publicationDate":"2026-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11363424","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
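The cosine similarity fusion idea described above can be sketched as follows. This is a hypothetical illustration (the global branch's contribution is scaled down by how redundant it is with the local branch, measured by cosine similarity), not MD2F-Mamba's exact module; the function name and weighting convention are assumptions.

```python
import numpy as np

def cosine_fusion(local_feat, global_feat, eps=1e-8):
    """Fuse local- and global-branch feature vectors.

    Hypothetical sketch: global features are added only to the extent that
    they are NOT redundant with the local ones, with cosine similarity as
    the redundancy measure. Not the paper's exact formulation.
    """
    l = np.asarray(local_feat, dtype=float).ravel()
    g = np.asarray(global_feat, dtype=float).ravel()
    # Cosine similarity in [-1, 1], mapped to a redundancy weight in [0, 1].
    sim = float(np.dot(l, g)) / (np.linalg.norm(l) * np.linalg.norm(g) + eps)
    redundancy = 0.5 * (sim + 1.0)
    # Fully redundant global features add nothing extra; orthogonal ones
    # are half-added; opposing ones are fully added.
    return l + (1.0 - redundancy) * g
```

With identical inputs the redundancy weight is 1 and the fusion returns the local features essentially unchanged; with orthogonal inputs half of the global features are added.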
Determining the rotation axis of small bodies during the approach phase is essential for both mission operations and scientific investigations. Estimating the axis from the motion trajectories of image features has proven effective, but challenges remain due to limited image availability, weak surface textures, and uncertain observation geometries. In particular, tracking errors, unreliable trajectories, and dependence on accurately known rotation periods reduce the robustness and efficiency of existing methods. To address these challenges, this study proposes a rotation-axis estimation method for small bodies during the approach phase, based on image feature tracking and trajectory selection. The method employs sparse optical flow to extract feature trajectories and removes unstable tracks using image masks and bidirectional flow. Adaptive trajectory selection and shape classification are then performed based on the statistical distribution of the fitted parameters, using histograms. Finally, a geometry-based optimization model identifies the correct rotation axis solution via a genetic algorithm, without requiring prior knowledge of the rotation period. The proposed algorithm was tested on over 400 simulated cases considering varying sun phase angles, approach angles, numbers of images per rotation period, and small body shapes. The results demonstrate that the proposed method significantly outperforms existing algorithms. The proposed algorithm achieved estimation errors below 3° in 89% of the cases and below 5° in 92% of the cases, and the running time in all cases was less than 3 min. Validation using in-orbit data from the OSIRIS-REx mission confirmed that the proposed algorithm can estimate the rotation axis of asteroid Bennu with an error of only 2.69°. The results validate the proposed algorithm's effectiveness and efficiency, proving its potential for small body exploration missions.
{"title":"A Feature Tracking and Trajectory Selection Based Rotation Axis Estimation Method for Small Bodies Using Optical Remote Sensing Images From the Approach Phase","authors":"Yifan Wang;Huan Xie;Xiongfeng Yan;Jie Chen;Yaqiong Wang;Taoze Ying;Ming Yang;Xiaohua Tong","doi":"10.1109/JSTARS.2026.3658924","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3658924","url":null,"abstract":"Determining the rotation axis of small bodies during the approach phase is essential for both mission operations and scientific investigations. Estimating the axis from the motion trajectories of image features has proven effective, but challenges remain due to limited image availability, weak surface textures, and uncertain observation geometries. In particular, tracking errors, unreliable trajectories, and dependence on accurately known rotation periods reduce the robustness and efficiency of existing methods. To address these challenges, this study proposes a rotation-axis estimation method for small bodies during the approach phase, based on image feature tracking and trajectory selection. The method employs sparse optical flow to extract feature trajectories and removes unstable tracks using image masks and bidirectional flow. An adaptive trajectory selection and shape classification are then performed based on the statistical distribution of fitted parameters using the histogram. Finally, a geometry-based optimization model identifies the correct rotation axis solution via a genetic algorithm, without requiring prior knowledge of the rotation period. The proposed algorithm was tested on over 400 simulated cases considering varying sun phase angles, approach angles, image numbers per rotation period, and small body shapes. The results demonstrate that the proposed method significantly outperforms the existing algorithms. 
The proposed algorithm achieved estimation errors below 3° in 89% of the cases and below 5° in 92% of the cases, and the running time in all cases was less than 3 min. Validation using in-orbit data from the OSIRIS-REx mission confirmed that the proposed algorithm can estimate the rotation axis of asteroid Bennu with an error of only 2.69°. The results validate the proposed algorithm's effectiveness and efficiency, proving its potential for small body exploration missions.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"6477-6496"},"PeriodicalIF":5.3,"publicationDate":"2026-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11367265","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146223559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
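The bidirectional-flow track rejection step described above is a standard forward-backward consistency check; a minimal sketch follows. The callables `forward_flow` and `backward_flow` are hypothetical stand-ins for the sparse optical flow tracker (e.g., a pyramidal Lucas–Kanade pass in each direction), and the names and threshold are assumptions, not the paper's.

```python
import numpy as np

def filter_tracks_bidirectional(points, forward_flow, backward_flow, max_err=1.0):
    """Drop unstable feature tracks via a forward-backward consistency check.

    `forward_flow` and `backward_flow` are hypothetical callables mapping an
    (N, 2) array of pixel positions to their flow-displaced positions. A
    track is kept only if tracking forward and then backward returns to
    within `max_err` pixels of its starting point.
    """
    points = np.asarray(points, dtype=float)
    round_trip = backward_flow(forward_flow(points))
    # Per-track round-trip error in pixels.
    err = np.linalg.norm(round_trip - points, axis=1)
    return points[err <= max_err]
```

Tracks whose round-trip error exceeds the threshold (occlusions, texture-poor regions, drifting matches) are discarded before the histogram-based trajectory selection.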
Pub Date : 2026-01-28 DOI: 10.1109/JSTARS.2026.3658505
Xinpei Han;Xiaotong Zhang;Lingfeng Lu;Lingchen Bu;Run Jia;Bo Jiang;Yunjun Yao
Surface downward shortwave radiation (Rs) is fundamental for modeling surface energy budgets and biogeochemical cycles. Although much effort has been devoted to Rs estimation, retrievals of its direct and diffuse components remain limited. This study developed a framework integrating machine learning with physical modeling to retrieve Rs and its direct (Rdirect) and diffuse (Rdiffuse) components at 4-km spatial resolution over China using satellite observations from the Fengyun-4A AGRI. The proposed model derives the instantaneous estimates of Rs and its direct and diffuse components using traditional physical models. These initial estimates, along with cloud, water, and ERA5 Rs data, served as a feature set to obtain accurate radiation estimates based on the random forest model. The model-estimated daily mean Rs was validated against ground measurements from the Climate Data Center of the China Meteorological Administration (CDC/CMA), yielding an R of 0.98, a mean bias error (MBE) of –0.01 W/m2, and a root mean square error (RMSE) of 17.13 W/m2. For the daily mean Rdirect (Rdiffuse), validation against National Ecosystem Science Data Center stations yielded an R of 0.98 (0.98), an MBE of 12.59 (–37.01) W/m2, and an RMSE of 24.11 (42.84) W/m2, respectively. The generated Rs and its direct and diffuse components were also compared with existing products. The spatial distribution of the derived estimates is consistent with other products, but with relatively higher spatial resolution and precision at the selected sites. The proposed method has the advantage of using new-generation geostationary satellites by combining the strengths of physical models and machine learning to advance radiation estimation research.
{"title":"Quantifying Surface Downward Shortwave Radiation and Its Direct and Diffuse Components Using Fengyun-4A AGRI Observations","authors":"Xinpei Han;Xiaotong Zhang;Lingfeng Lu;Lingchen Bu;Run Jia;Bo Jiang;Yunjun Yao","doi":"10.1109/JSTARS.2026.3658505","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3658505","url":null,"abstract":"Surface downward shortwave radiation (<italic>R</i><sub>s</sub>) is fundamental for modeling surface energy budgets and biogeochemical cycles. Although much effort on <italic>R</i><sub>s</sub> estimation has been conducted, retrievals of its direct and diffuse components remain limited. This study developed a framework integrating machine learning with physical modeling to retrieve <italic>R</i><sub>s</sub>, and its direct (<italic>R</i><sub>direct</sub>) and diffuse (<italic>R</i><sub>diffuse</sub>) components at 4-km spatial resolution over China using satellite observations from the Fengyun-4A AGRI. The proposed model derives the instantaneous estimates of <italic>R</i><sub>s</sub> and its direct and diffuse components using traditional physical models. These initial estimates, along with cloud, water, and ERA5 <italic>R</i><sub>s</sub> data, served as a feature set to obtain accurate radiation estimates based on the random forest model. The model-estimated daily mean <italic>R</i><sub>s</sub> was validated against ground measurements from Climate Data Center of the Chinese Meteorological Administration (CDC/CMA), yielding an <italic>R</i> of 0.98, a mean bias error (MBE) of –0.01 W/m<sup>2</sup>, and an root mean square error (RMSE) of 17.13 W/m<sup>2</sup>. For the daily mean <italic>R</i><sub>direct</sub> (<italic>R</i><sub>diffuse</sub>), validation against National Ecosystem Science Data Center stations yielded an <italic>R</i> of 0.98 (0.98), an MBE of 12.59 (–37.01) W/m<sup>2</sup>, and an RMSE of 24.11 (42.84) W/m<sup>2</sup>, respectively. 
The generated <italic>R</i><sub>s</sub> and its direct and diffuse components were also compared with existing products. The spatial distribution of the derived estimates is consistent with other products, but with relatively higher spatial resolution and precision at the selected sites. The proposed method has the advantage of using new-generation geostationary satellites by combining the strengths of physical models and machine learning to advance radiation estimation research.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"6359-6374"},"PeriodicalIF":5.3,"publicationDate":"2026-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11367295","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146223715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
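The validation statistics quoted above (R, MBE, RMSE) are standard metrics; a minimal sketch of how they are computed from paired model estimates and station observations (variable names here are illustrative, not the paper's):

```python
import numpy as np

def validation_metrics(estimated, observed):
    """Return (R, MBE, RMSE) for estimated vs. observed radiation.

    Inputs are paired 1-D arrays of daily mean values in W/m^2, as in the
    site validation described above.
    """
    est = np.asarray(estimated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    r = np.corrcoef(est, obs)[0, 1]             # Pearson correlation R
    mbe = np.mean(est - obs)                    # mean bias error
    rmse = np.sqrt(np.mean((est - obs) ** 2))   # root mean square error
    return r, mbe, rmse
```

A negative MBE (as reported for the diffuse component) indicates systematic underestimation relative to the ground stations.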
Pub Date : 2026-01-27 DOI: 10.1109/JSTARS.2026.3658442
Yingrui Ji;Chenhao Wang;Jiansheng Chen;Jingbo Chen;Anzhi Yue;Yu Meng;Chenhong Sui
Open-vocabulary segmentation (OVS) allows models to segment arbitrary categories based on text prompts, overcoming the limitations of traditional closed-set methods. However, OVS in remote sensing is limited by its reliance on standard red, green, and blue (RGB) images, which prevents models from using valuable information from other spectral bands, such as near-infrared (NIR). This article proposes MovSeg, a framework for multispectral open-vocabulary segmentation. Our method efficiently integrates four-band (RGB+NIR) data into a pretrained vision-language model. MovSeg introduces two components: a multispectral input adaptation (MIA) module and a spatial–channel adaptive tuning (SCAT) strategy. The MIA module adapts the input layer to process four-band data. It uses a weight-copying strategy for initialization and a window efficient channel attention mechanism to reweight spectral channels. SCAT is a parameter-efficient strategy that uses a hybrid of low-rank adaptation and adapters to fine-tune deep features at low computational cost. Experiments on three multispectral datasets show that MovSeg outperforms existing general-domain and remote sensing OVS methods. The model achieves significant gains on NIR-sensitive classes, confirming its ability to exploit the extra spectral data. The code will be released soon.
{"title":"MovSeg: Efficient Adaptation of Vision–Language Models for Multispectral Open- Vocabulary Segmentation","authors":"Yingrui Ji;Chenhao Wang;Jiansheng Chen;Jingbo Chen;Anzhi Yue;Yu Meng;Chenhong Sui","doi":"10.1109/JSTARS.2026.3658442","DOIUrl":"https://doi.org/10.1109/JSTARS.2026.3658442","url":null,"abstract":"Open-vocabulary segmentation (OVS) allows models to segment any categories based on text prompts, overcoming the limitations of traditional closed-set methods. Open-vocabulary segmentation (OVS) in remote sensing is limited by its reliance on standard red, green, and blue (RGB) images. This prevents models from using valuable information from other spectral bands, such as near-infrared (NIR). This article proposes MovSeg, a framework for multispectral open-vocabulary segmentation. Our method efficiently integrates four-band (RGB+NIR) data into a pretrained vision-language model. MovSeg introduces two components: a multispectral input adaptation (MIA) module and a spatial–channel adaptive tuning (SCAT) strategy. The MIA module adapts the input layer to process four-band data. It uses a weight-copying strategy for initialization and a window efficient channel attention mechanism to reweight spectral channels. The SCAT strategy is a parameter-efficient strategy that uses a hybrid of low-rank adaptation and adapters to fine-tune deep features with low computational cost. Experiments on three multispectral datasets show that MovSeg outperforms existing general-domain and remote sensing OVS methods. The model achieves significant gains on NIR-sensitive classes, confirming its ability to exploit the extra spectral data. 
The code will be released soon.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"19 ","pages":"8044-8055"},"PeriodicalIF":5.3,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11365526","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147362265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
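The weight-copying initialization described for the MIA module can be sketched as follows. This is a generic illustration of adapting an RGB conv stem to a fourth band, assuming a (out, in, kh, kw) weight layout; initializing the new channel with the mean of the RGB kernels and rescaling by 3/4 is a common convention for preserving activation magnitude, not necessarily MovSeg's exact scheme.

```python
import numpy as np

def expand_stem_to_four_bands(rgb_weights):
    """Adapt a conv stem's (out, 3, kh, kw) RGB weight tensor to RGB+NIR.

    Sketch of the weight-copying idea: the new NIR channel is initialized
    with the mean of the RGB kernels, and all kernels are rescaled by 3/4
    so the expected activation magnitude stays roughly the same when the
    input gains a channel. Hypothetical convention, not the paper's exact
    initialization.
    """
    out_c, in_c, kh, kw = rgb_weights.shape
    assert in_c == 3, "expects an RGB stem"
    nir = rgb_weights.mean(axis=1, keepdims=True)     # (out, 1, kh, kw)
    w4 = np.concatenate([rgb_weights, nir], axis=1)   # (out, 4, kh, kw)
    return w4 * (3.0 / 4.0)
```

Under this scheme, the per-filter weight sum is unchanged, so a pretrained stem produces similar responses on the first forward pass before fine-tuning adjusts the channels.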