Pub Date : 2025-11-12 DOI: 10.1109/LGRS.2025.3632153
Zhaoyu Liu;Wei Chen;Lixia Yang
To address core challenges in synthetic aperture radar (SAR) image target detection, including complex background interference, weak small-target features, and multiscale target coexistence, this study proposes the synthetic aperture-optimized real-time detection transformer (SA-RTDETR) model. The framework incorporates three core modules to enhance detection efficacy. First, the bidirectional receptive field boosting module integrates local details with global contextual information and substantially improves discriminative feature extraction while preserving spatial resolution. Second, the deformable attention-based intrascale feature interaction module employs adaptive sampling of critical scattering regions to address localization difficulties of small targets in SAR imagery. Third, the attention upsampling module mitigates detail loss and aliasing artifacts inherent in traditional interpolation methods through feature compensation strategies. Experimental results on the SARDet-100K dataset demonstrate that SA-RTDETR achieves 90.1% mAP@50, 56.0% mAP@50-95, and an 84.7% recall rate, representing improvements of 2.7%, 2.6%, and 2.2%, respectively, over the baseline model. The end-to-end architecture enables high-precision SAR image analysis and offers considerable potential for military reconnaissance and maritime surveillance applications. The SA-RTDETR model establishes a novel technical paradigm for reliable all-weather remote sensing target detection by harmonizing feature robustness, scale adaptability, and operational efficiency.
{"title":"SA-RTDETR: A High-Precision Real-Time Detection Transformer Based on Complex Scenarios for SAR Object Detection","authors":"Zhaoyu Liu;Wei Chen;Lixia Yang","doi":"10.1109/LGRS.2025.3632153","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3632153","url":null,"abstract":"To address core challenges in synthetic aperture radar (SAR) image target detection, including complex background interference, weak small-target features, and multiscale target coexistence, this study proposes the synthetic aperture-optimized real-time detection transformer (SA-RTDETR) model. The framework incorporates three core modules to enhance detection efficacy. First, the bidirectional receptive field boosting module synergistically integrates local details with global contextual information and substantially improves discriminative feature extraction while preserving spatial resolution. Second, the deformable attention-based intrascale feature interaction module employs adaptive sampling of critical scattering regions to address localization difficulties of small targets in SAR imagery. Third, the attention upsampling module mitigates detail loss and aliasing artifacts inherent in traditional interpolation methods through feature compensation strategies. Experimental results on the SARDet-100K dataset demonstrate that SA-RTDETR achieves 90.1% mAP@50, 56.0% mAP@50-95, and 84.7% recall rate representing improvements of 2.7%, 2.6%, and 2.2% over the baseline model, respectively. The end-to-end architecture enables high-precision SAR image analysis and offers considerable potential for military reconnaissance and maritime surveillance applications. 
The SA-RTDETR model establishes a novel technical paradigm for reliable all-weather remote sensing target detection by harmonizing feature robustness, scale adaptability, and operational efficiency.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"23 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
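The mAP@50 figure quoted in the SA-RTDETR abstract rests on an intersection-over-union (IoU) test between predicted and ground-truth boxes. The following is a minimal sketch of that test, not the authors' evaluation code: it assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)`, and the function names `iou` and `is_true_positive` are hypothetical. Full mAP additionally requires ranking detections by confidence and integrating the precision-recall curve, which is omitted here.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    # Corners of the overlap rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A detection matches a ground-truth box at mAP@50 when IoU >= 0.5."""
    return iou(pred_box, gt_box) >= threshold
```

mAP@50-95 repeats the same test at thresholds from 0.50 to 0.95 in steps of 0.05 and averages the resulting AP values, which is why it is consistently lower than mAP@50.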
Pub Date : 2025-11-12 DOI: 10.1109/LGRS.2025.3631806
Haoxuan Xu;Meiguo Gao
Marine radar is widely employed in ocean monitoring systems. However, sea clutter significantly impairs radar data interpretability and degrades maritime target detection performance. Effective clutter suppression methods are thus essential to enhance target characteristics for improved detection. However, environmental sea clutter often exhibits complex statistical characteristics, causing traditional model-based methods to suffer from performance degradation. To address this challenge, this letter proposes a sea clutter suppression method based on a complex-valued neural network (CVNN). First, the network incorporates a wavelet convolution (WTConv) block to expand the receptive field. Second, complex-valued convolutional blocks integrated with an attention mechanism are designed to enhance latent feature extraction. Finally, the model’s performance is rigorously validated using real-measured data. Experimental results demonstrate that the proposed model achieves superior clutter suppression performance.
{"title":"An End-to-End Sea Clutter Suppression Method Using Wavelet Convolution-Enhanced Attentional Complex-Valued Neural Network","authors":"Haoxuan Xu;Meiguo Gao","doi":"10.1109/LGRS.2025.3631806","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3631806","url":null,"abstract":"Marine radar is widely employed in ocean monitoring systems. However, sea clutter significantly impairs radar data interpretability and degrades maritime target detection performance. Effective clutter suppression methods are thus essential to enhance target characteristics for improved detection. However, environmental sea clutter often exhibits complex statistical characteristics, causing traditional model-based methods to suffer from performance degradation. To address this challenge, this letter proposes a sea clutter suppression method based on a complex-valued neural network (CVNN). First, the network incorporates a wavelet convolution (WTConv) block to expand the receptive field. Second, complex-valued convolutional blocks integrated with an attention mechanism are designed to enhance latent feature extraction. Finally, the model’s performance is rigorously validated using real-measured data. Experimental results demonstrate that the proposed model achieves superior clutter suppression performance.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"23 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-11-12 DOI: 10.1109/LGRS.2025.3631871
Haotian Li;Jiaqi Ma;Wenna Guo;Xiaoxia Li;Xiaohui Qin;Zhenhua Ma
With the rapid development of applications such as unmanned aerial vehicle (UAV)-based remote sensing, smart cities, and intelligent transportation, small-object detection has become increasingly important in the field of object recognition. However, existing methods often struggle to balance detection accuracy and inference efficiency under large-scale variations, dense small-object distributions, and complex background interference. To address these challenges, this letter proposes a lightweight perception subnetwork, RSNet-Lite. The network integrates a multiscale attention mechanism to enhance small-object perception, dynamic convolution and long-range spatial modeling units to improve feature representation, and lightweight convolution with efficient sampling strategies to significantly reduce computational complexity. As a result, RSNet-Lite achieves real-time inference while maintaining high detection accuracy, striking a balance between speed and performance. Finally, the proposed method is validated on the Aerial Image–Tiny Object Detection (AI-TOD) and Vision Meets Drone (VisDrone) datasets, demonstrating its effectiveness and strong potential for small-object detection tasks.
{"title":"RSNet-Lite: A Lightweight Perception Subnetwork for Remote Sensing Object Detection","authors":"Haotian Li;Jiaqi Ma;Wenna Guo;Xiaoxia Li;Xiaohui Qin;Zhenhua Ma","doi":"10.1109/LGRS.2025.3631871","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3631871","url":null,"abstract":"With the rapid development of applications such as unmanned aerial vehicle (UAV)-based remote sensing, smart cities, and intelligent transportation, small-object detection has become increasingly important in the field of object recognition. However, existing methods often struggle to balance detection accuracy and inference efficiency under large-scale variations, dense small-object distributions, and complex background interference. To address these challenges, this letter proposes a lightweight perception subnetwork, RSNet-Lite. The network integrates a multiscale attention mechanism to enhance small-object perception, dynamic convolution, and long-range spatial modeling units to improve feature representation, and lightweight convolution with efficient sampling strategies to significantly reduce computational complexity. As a result, RSNet-Lite achieves real-time inference while maintaining high detection accuracy, striking a balance between speed and performance. 
Finally, the proposed method is validated on the Aerial Image–Tiny Object Detection (AI-TOD) and Vision Meets Drone (VisDrone) datasets, demonstrating its effectiveness and strong potential for small-object detection tasks.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"23 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-11-12 DOI: 10.1109/LGRS.2025.3631867
Shangshang Zhang;Yulong Fan;Lin Sun
Accurate retrieval of the spatiotemporal distribution of atmospheric aerosols is essential for studying aerosol–radiation–cloud interactions, air-quality forecasting, and climate-change assessment. Although data-driven methods have significantly advanced aerosol retrieval, the existing models often neglect the influence of aerosol type on retrieval accuracy. To address this gap, this study presents an improved data-driven aerosol retrieval framework that explicitly incorporates aerosol type information into model training. Aerosol classification is performed using the $K$-means unsupervised clustering algorithm to optimize training samples, thereby enhancing model adaptability and retrieval accuracy. The refined samples are then used to train an extremely randomized trees (ERTs) model, achieving an optimal balance between accuracy and computational efficiency. Validation results demonstrate strong performance, with a correlation coefficient of 0.93, a root mean square error (RMSE) of 0.072, and over 89% of results falling within the expected error range [EE: ±(0.05 + 20% $\times$ in situ observations)], better than that of the traditional model. The findings demonstrate that integrating aerosol-type information into data-driven retrievals substantially improves accuracy and applicability for aerosol remote sensing. Future research should focus on refining aerosol classification techniques and integrating multisource remote sensing data to further enhance model robustness and global applicability.
{"title":"K-Means Clustering for Improved Data-Driven Satellite Aerosol Retrieval","authors":"Shangshang Zhang;Yulong Fan;Lin Sun","doi":"10.1109/LGRS.2025.3631867","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3631867","url":null,"abstract":"Accurate retrieval of the spatiotemporal distribution of atmospheric aerosols is essential for studying aerosolradiationcloud interactions, air-quality forecasting, and climate-change assessment. Although data-driven methods have significantly advanced aerosol retrieval, the existing models often neglect the influence of aerosol type on retrieval accuracy. To address this gap, this study presents an improved data-driven aerosol retrieval framework that explicitly incorporates aerosol type information into model training. Aerosol classification is performed using the <inline-formula> <tex-math>$K$ </tex-math></inline-formula>-means unsupervised clustering algorithm to optimize training samples, thereby enhancing model adaptability and retrieval accuracy. The refined samples are then used to train an extremely randomized trees (ERTs) model, achieving an optimal balance between accuracy and computational efficiency. Validation results demonstrate strong performance, with a correlation coefficient of 0.93, a root mean square error (RMSE) of 0.072, and over 89% of results falling within the expected error range [(EE: ± (0.05+20% <inline-formula> <tex-math>$times $ </tex-math></inline-formula> in situ observations)], better than that of the traditional model. The findings demonstrate that integrating aerosol-type information into data-driven retrievals substantially improves accuracy and applicability for aerosol remote sensing. 
Future research should focus on refining aerosol classification techniques and integrating multisource remote sensing data to enhance model robustness and global applicability further.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"23 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145560637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
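The expected-error (EE) criterion quoted in the aerosol retrieval abstract, ±(0.05 + 20% × in situ observation), can be illustrated with a short sketch. This is an assumption-laden example rather than the authors' validation code: the function names `within_expected_error` and `fraction_within_ee` are hypothetical, and the sample values in the usage note are made up purely to show the arithmetic.

```python
def within_expected_error(retrieved, observed, abs_term=0.05, rel_term=0.20):
    """True when |retrieved - observed| lies inside the EE envelope.

    The envelope half-width is abs_term + rel_term * observed, following
    the +/-(0.05 + 20% x in situ observation) form given in the abstract.
    """
    return abs(retrieved - observed) <= abs_term + rel_term * observed


def fraction_within_ee(retrieved_values, observed_values):
    """Share of retrieval/observation pairs that fall inside the EE envelope."""
    pairs = list(zip(retrieved_values, observed_values))
    if not pairs:
        raise ValueError("at least one retrieval/observation pair is required")
    hits = sum(within_expected_error(r, o) for r, o in pairs)
    return hits / len(pairs)
```

For instance, with illustrative retrievals `[0.10, 0.50, 1.00]` against observations `[0.12, 0.30, 0.90]`, the second pair misses the envelope (error 0.20 against a half-width of 0.11), so two of the three pairs count as within EE.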
Data gaps exist in the measured spectral reflectance and atmospheric data from the radiometric calibration network (RadCalNet) due to instrument malfunctions or weather-related interferences, which severely impedes the application of the data. Therefore, developing a method to fill these missing RadCalNet data is a pressing issue. This study focuses on four RadCalNet sites with distinct surface types and proposes a high-precision bottom-of-atmosphere (BOA) spectral reflectance model. With on-site atmospheric data from RadCalNet, the predicted results achieve a root mean square error (RMSE) of no more than 1.26%. In scenarios where in situ atmospheric conditions are completely missing, the ERA5 dataset is used as a substitute and validated with Landsat 8 surface reflectance products; the absolute errors for all sites did not exceed 4.58%, validating the proposed method’s effectiveness. Additionally, the importance of input parameters and the impact of their uncertainties on prediction accuracy are discussed.
{"title":"A Method for Reconstructing Surface Spectral Reflectance With Missing RadCalNet Data","authors":"Shutian Zhu;Qiyue Liu;Chuanzhao Tian;Hanlie Xu;Jie Han;Wenhao Zhang;Na Xu","doi":"10.1109/LGRS.2025.3631876","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3631876","url":null,"abstract":"Data gaps exist in the measured spectral reflectance and atmospheric data from the radiometric calibration network (RadCalNet) due to instrument malfunctions or weather-related interferences, which severely impedes the application of the data. Therefore, developing a method to fill these missing RadCalNet data is a pressing issue. This study focuses on four RadCalNet sites with distinct surface types and proposes a high-precision bottom-of-atmosphere (BOA) spectral reflectance model. With on-site atmospheric data from RadCalNet, the predicted results achieve a root mean square error (RMSE) of no more than 1.26%. In scenarios where in situ atmospheric conditions are completely missing, the ERA5 dataset is used as a substitute and validated with Landsat 8 surface reflectance products; the absolute errors for all sites did not exceed 4.58%, validating the proposed method’s effectiveness. Additionally, the importance of input parameters and the impact of their uncertainties on prediction accuracy are discussed.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"23 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145560640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep learning has emerged as the predominant approach for ship detection in synthetic aperture radar (SAR) imagery. Nevertheless, persistent challenges such as densely clustered vessels, intricate background complexity, and multiscale target variations often lead to incomplete feature extraction, resulting in false alarms and missed detections. To address these limitations, this study presents LD-YOLO, an enhanced model based on YOLOv8n, which incorporates three critical innovations. Dynamic convolution layers are strategically embedded within key backbone stages to adaptively adjust kernel parameters, enhancing multiscale feature discriminability while maintaining computational efficiency. The proposed C2f-LSK module combines decomposed large-kernel convolution with attention mechanisms, enabling dynamic optimization of receptive field contributions across different detection stages and effective modeling of global contextual information. Considering the characteristics of small vessels in SAR imagery and the impact of downsampling rates on image quality, a dedicated $160 \times 160$ detection head is further integrated to preserve fine-grained details of small targets, complemented by bidirectional feature fusion to strengthen semantic context propagation. Extensive experiments validate the model’s superiority, achieving 98.2% AP50 and 73.1% AP50-95 on the SSDD benchmark, with consistent performance improvements demonstrated on the HRSID dataset (94.6% AP50). These advancements position LD-YOLO as a robust solution for maritime surveillance applications requiring high-precision SAR image analysis under complex operational conditions.
{"title":"LD-YOLO: A Lightweight Dynamic Convolution-Based YOLOv8n Framework for Robust Ship Detection in SAR Imagery","authors":"Jiqiang Niu;Mengyang Li;Hao Lin;Yichen Liu;Zijian Liu;Hongrui Li;Shaomian Niu","doi":"10.1109/LGRS.2025.3630098","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3630098","url":null,"abstract":"Deep learning has emerged as the predominant approach for ship detection in synthetic aperture radar (SAR) imagery. Nevertheless, persistent challenges such as densely clustered vessels, intricate background complexity, and multiscale target variations often lead to incomplete feature extraction, resulting in false alarms and missed detections. To address these limitations, this study presents LD-YOLO, an enhanced model based on YOLOv8n, which incorporates three critical innovations. Dynamic convolution layers are strategically embedded within key backbone stages to adaptively adjust kernel parameters, enhancing multiscale feature discriminability while maintaining computational efficiency. The proposed C2f-LSK module combines decomposed large-kernel convolution with attention mechanisms, enabling dynamic optimization of receptive field contributions across different detection stages and effective modeling of global contextual information. Considering the characteristics of small vessels in SAR imagery and the impact of downsampling rates on image quality, a dedicated <inline-formula> <tex-math>$160times 160$ </tex-math></inline-formula> detection head is further integrated to preserve fine-grained details of small targets, complemented by bidirectional feature fusion to strengthen semantic context propagation. Extensive experiments validate the model’s superiority, achieving 98.2% of AP50 and 73.1% of AP50-95 on the SSDD benchmark, with consistent performance improvements demonstrated on HRSID (94.6% AP50) datasets. 
These advancements position LD-YOLO as a robust solution for maritime surveillance applications requiring high-precision SAR image analysis under complex operational conditions.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"23 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Coherent S-band radar has recently emerged as a promising technique for ocean surface wave and current detection. It can measure ocean surface current by estimating Doppler frequency shifts from sea surface signals. However, the conventional time averaging (TA) method neglects spatial dimension information and is unavailable under low wind speed conditions. Two algorithms for ocean current inversion are proposed in this letter: the spatial–temporal averaging (STA) method and the wavenumber–frequency (WF) method. In the STA method, the TA method is extended to the spatial–temporal domain. This approach fully exploits the spatial continuity of radar signals. In the WF method, a 2-D fast Fourier transform (2-D FFT) is applied to transform the spatial–temporal radial velocities into the WF domain. After employing dual filtering to eliminate nonlinear components, the radial current velocity is estimated through a modified dispersion relation fitting. The two methods are based on different physical mechanisms: the STA method measurements include wind drift components, while the WF method remains unaffected by wind drift. Therefore, wind drift can be effectively estimated by calculating the difference between the two methods’ measurements. Validation using observational data collected at Beishuang Island during Typhoon Catfish shows that the estimated wind drifts achieve a correlation coefficient (COR) of 0.90 with the empirical model predictions. This confirms the effectiveness of the proposed algorithms.
{"title":"Spatial–Temporal and Wavenumber--Frequency Inversion Algorithms for Ocean Surface Current Using Coherent S-Band Radar","authors":"Xinyu Fu;Chen Zhao;Zezong Chen;Sitao Wu;Fan Ding;Rui Liu;Guoxing Zheng","doi":"10.1109/LGRS.2025.3629684","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3629684","url":null,"abstract":"Coherent S-band radar has recently emerged as a promising technique for ocean surface wave and current detection. It can measure ocean surface current by estimating Doppler frequency shifts from sea surface signals. However, the conventional time averaging (TA) method neglects spatial dimension information and is unavailable under low wind speed conditions. Two algorithms for ocean current inversion are proposed in this letter: the spatial–temporal averaging (STA) method and the wavenumber--frequency (WF) method. In the STA method, the TA method is extended to the spatial–temporal domain. This approach fully exploits the spatial continuity of radar signals. In the WF method, a 2-D Fast Fourier Transform (2-D FFT) is applied to transform the spatial–temporal radial velocities into the WF domain. After employing dual filtering to eliminate nonlinear components, the radial current velocity is estimated through a modified dispersion relation fitting. The two methods are based on different physical mechanisms: the STA method measurements include wind drift components, while the WF method remains unaffected by wind drift. Therefore, wind drift can be effectively estimated by calculating the difference between the two methods’ measurements. 
Validation using observational data collected at Beishuang Island during Typhoon Catfish shows that the estimated wind drifts achieve a correlation coefficient (COR) of 0.90 with the “empirical model predictions.” This confirms the effectiveness of the proposed algorithms.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"23 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145560638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
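Two relations in the coherent S-band radar abstract lend themselves to a small worked sketch: the conversion of a Doppler frequency shift into a radial velocity, and the wind-drift estimate as the difference between the STA and WF current measurements. The code below is illustrative only and is not the authors' inversion algorithm. It assumes the standard monostatic relation v_r = λ·f_D / 2, leaves the sign convention (toward or away from the radar) to the caller, and uses hypothetical function names and made-up input values.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def radial_velocity(doppler_hz, carrier_hz):
    """Radial velocity (m/s) from a Doppler shift for a monostatic radar.

    Uses v_r = wavelength * f_D / 2; the factor of 2 accounts for the
    two-way propagation path. The sign convention is an assumption left
    to the caller.
    """
    wavelength = SPEED_OF_LIGHT / carrier_hz
    return wavelength * doppler_hz / 2.0


def wind_drift(sta_velocity, wf_velocity):
    """Wind-drift estimate as STA minus WF radial current velocity.

    Follows the abstract's statement that STA measurements include wind
    drift while WF measurements do not.
    """
    return sta_velocity - wf_velocity
```

As a quick check of scale, a 20 Hz Doppler shift at an assumed 3 GHz carrier (wavelength of roughly 0.1 m) corresponds to a radial velocity of about 1 m/s.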
Ship detection in remote sensing images plays an important role in various maritime activities. However, the existing deep learning methods face challenges, such as changes in ship target size, complex backgrounds, and noise interference in remote sensing images, which can lead to low detection accuracy and incomplete target detection. To address these issues, we propose a synthetic aperture radar (SAR) image target detection framework called SDWPNet, aimed at improving target detection performance in complex scenes. First, we propose SDWavetpool (SDW), which optimizes feature downsampling through multiscale wavelet features, effectively reducing the dimensionality of the feature map while preserving the detailed information of small targets. It can more accurately identify medium and large targets in complex backgrounds, fully utilizing multilevel features. Then, the network structure is optimized using a feature extraction module that incorporates the PPA mechanism, making it more focused on the details of small targets. In addition, we further improve the detection accuracy with an improved loss function (ICMPIoU). The experiments on the SAR ship detection dataset (SSDD) and high-resolution SAR image dataset (HRSID) show that this framework performs well in both accuracy and response speed of target detection, achieving 74.5% and 67.6% in $\mathbf{mAP_{.50:.95}}$, respectively, with only 2.97 M parameters.
{"title":"SDWPNet: A Downsampling-Driven Network for SAR Ship Detection With Refined Features and Optimized Loss","authors":"Xingyu Hu;Hongyu Chen;Yugang Chang;Xue Yang;Weiming Zeng","doi":"10.1109/LGRS.2025.3629377","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3629377","url":null,"abstract":"Ship detection in remote sensing images plays an important role in various maritime activities. However, the existing deep learning methods face challenges, such as changes in ship target size, complex backgrounds, and noise interference in remote sensing images, which can lead to low detection accuracy and incomplete target detection. To address these issues, we proposed a synthetic aperture radar (SAR) image target detection framework called SDWPNet, aimed at improving target detection performance in complex scenes. First, we proposed SDWavetpool (SDW), which optimizes feature downsampling through multiscale wavelet features, effectively reducing the dimensionality of the feature map while preserving the detailed information of small targets. It can more accurately identify medium and large targets in complex backgrounds, fully utilizing multilevel features. Then, the network structure was optimized using a feature extraction module that combines the PPA mechanism, making it more focused on the details of small targets. In addition, we further improved the detection accuracy by improving the loss function (ICMPIoU). 
The experiments on the SAR ship detection dataset (SSDD) and high-resolution SAR image dataset (HRSID) show that this framework performs well in both accuracy and response speed of target detection, achieving 74.5% and 67.6% in <inline-formula> <tex-math>$mathbf {mAP_{.50:.95}}$ </tex-math></inline-formula>, using only parameter 2.97 M.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"23 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-11-05 DOI: 10.1109/LGRS.2025.3629303
Elman Ghazaei;Erchan Aptoula
ConvNets and Vision Transformers (ViTs) have been widely used for change detection (CD), though they exhibit limitations: long-range dependencies are not effectively captured by the former, while the latter are associated with high computational demands. Vision Mamba, based on State Space Models, has been proposed as an alternative, yet has been primarily utilized as a feature extraction backbone. In this work, the change state space model (CSSM) is introduced as a task-specific approach for CD, designed to focus exclusively on relevant changes between bitemporal images while filtering out irrelevant information. Through this design, the number of parameters is reduced, computational efficiency is improved, and robustness is enhanced. CSSM is evaluated on three benchmark datasets, where superior performance is achieved compared to ConvNets, ViTs, and Mamba-based models, at a significantly lower computational cost. The code will be made publicly available at https://github.com/Elman295/CSSM upon acceptance.
{"title":"Efficient Remote Sensing Change Detection With Change State Space Models","authors":"Elman Ghazaei;Erchan Aptoula","doi":"10.1109/LGRS.2025.3629303","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3629303","url":null,"abstract":"ConvNets and Vision Transformers (ViTs) have been widely used for change detection (CD), though they exhibit limitations: long-range dependencies are not effectively captured by the former, while the latter are associated with high computational demands. Vision Mamba, based on State Space Models, has been proposed as an alternative, yet has been primarily utilized as a feature extraction backbone. In this work, the change state space model (CSSM) is introduced as a task-specific approach for CD, designed to focus exclusively on relevant changes between bitemporal images while filtering out irrelevant information. Through this design, the number of parameters is reduced, computational efficiency is improved, and robustness is enhanced. CSSM is evaluated on three benchmark datasets, where superior performance is achieved compared to ConvNets, ViTs, and Mamba-based models, at a significantly lower computational cost. The code will be made publicly available at <uri>https://github.com/Elman295/CSSM</uri> upon acceptance","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"23 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145560636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-10-30 DOI: 10.1109/LGRS.2025.3626855
Dat Minh-Tien Nguyen;Thien Huynh-The
Remote sensing object detection faces challenges such as small object sizes, complex backgrounds, and computational constraints. To overcome these challenges, we propose XSNet, an efficient deep learning (DL) model designed to enhance feature representation and multiscale detection. Concretely, XSNet introduces three key innovations: a swin-involution transformer (SIner) to improve local self-attention and spatial adaptability, positional weight bi-level routing attention (PosWeightRA) to refine spatial awareness and preserve positional encoding, and an X-shaped multiscale feature fusion strategy to optimize feature aggregation while reducing computational cost. These components collectively improve detection accuracy, particularly for small and overlapping objects. In extensive experiments, XSNet achieves mAP0.5 and mAP0.95 scores of 47.1% and 28.2% on VisDrone2019, and 92.9% and 66.0% on RSOD. It outperforms state-of-the-art models while maintaining a compact size of 7.11 million parameters and a fast inference time of 35.5 ms, making it well suited for real-time remote sensing in resource-constrained environments.
XSNet: Lightweight Object Detection Model Using X-Shaped Architecture in Remote Sensing Images. Dat Minh-Tien Nguyen; Thien Huynh-The. IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5. Pub Date : 2025-10-30. DOI: 10.1109/LGRS.2025.3626855 (https://doi.org/10.1109/LGRS.2025.3626855)
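To put the XSNet figures of 7.11 million parameters and 35.5 ms inference time in more familiar units, the short sketch below converts them to an approximate frame rate and weight size. It assumes one image per inference, sequential processing, and 32-bit floating-point weights; none of these conditions is stated in the abstract, and the helper names are hypothetical.

```python
def frames_per_second(latency_ms):
    """Throughput implied by a per-image latency, assuming images are
    processed one at a time with no batching or pipelining."""
    return 1000.0 / latency_ms


def weight_size_mb(num_params, bytes_per_param=4):
    """Approximate weight storage in megabytes (10**6 bytes),
    assuming 4-byte float32 parameters."""
    return num_params * bytes_per_param / 1e6


fps = frames_per_second(35.5)   # latency reported in the abstract
size = weight_size_mb(7.11e6)   # parameter count reported in the abstract
print(f"{fps:.1f} FPS, {size:.2f} MB")  # 28.2 FPS, 28.44 MB
```

Under these assumptions the reported latency corresponds to roughly 28 frames per second; the actual rate depends on the hardware and input resolution used in the paper's measurements.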