
IEEE Geoscience and Remote Sensing Letters (a publication of the IEEE Geoscience and Remote Sensing Society): Latest Publications

SDWPNet: A Downsampling-Driven Network for SAR Ship Detection With Refined Features and Optimized Loss
Xingyu Hu;Hongyu Chen;Yugang Chang;Xue Yang;Weiming Zeng
Ship detection in remote sensing images plays an important role in various maritime activities. However, existing deep learning methods face challenges such as changes in ship target size, complex backgrounds, and noise interference in remote sensing images, which can lead to low detection accuracy and incomplete target detection. To address these issues, we propose a synthetic aperture radar (SAR) image target detection framework called SDWPNet, aimed at improving target detection performance in complex scenes. First, we propose SDWavetpool (SDW), which optimizes feature downsampling through multiscale wavelet features, effectively reducing the dimensionality of the feature map while preserving the detailed information of small targets. It can more accurately identify medium and large targets in complex backgrounds, fully utilizing multilevel features. Then, the network structure is optimized using a feature extraction module that incorporates the PPA mechanism, making it more focused on the details of small targets. In addition, we further improve the detection accuracy with an improved loss function (ICMPIoU). Experiments on the SAR ship detection dataset (SSDD) and the high-resolution SAR image dataset (HRSID) show that this framework performs well in both detection accuracy and response speed, achieving 74.5% and 67.6% $\mathbf{mAP}_{.50:.95}$, respectively, with only 2.97M parameters.
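No implementation details accompany the abstract; purely as an illustration of the general idea of wavelet-driven downsampling, the PyTorch sketch below replaces strided pooling with a 2x2 Haar decomposition followed by a 1x1 projection, so high-frequency detail survives the resolution drop. The class and argument names are invented for the example and are not the authors' SDW module.

```python
import torch
import torch.nn as nn

class HaarDownsample(nn.Module):
    """Downsample by a 2x2 Haar transform instead of strided pooling.

    The four subbands (LL, LH, HL, HH) are stacked on the channel axis and
    projected to `out_channels` with a 1x1 convolution, so fine detail from
    the high-frequency bands is kept after the spatial resolution halves.
    """

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(4 * in_channels, out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split the feature map into its four 2x2 polyphase components.
        a = x[:, :, 0::2, 0::2]  # top-left
        b = x[:, :, 0::2, 1::2]  # top-right
        c = x[:, :, 1::2, 0::2]  # bottom-left
        d = x[:, :, 1::2, 1::2]  # bottom-right
        ll = (a + b + c + d) / 2  # low-frequency average
        lh = (a - b + c - d) / 2  # horizontal detail
        hl = (a + b - c - d) / 2  # vertical detail
        hh = (a - b - c + d) / 2  # diagonal detail
        return self.proj(torch.cat([ll, lh, hl, hh], dim=1))

if __name__ == "__main__":
    feat = torch.randn(1, 64, 80, 80)
    down = HaarDownsample(64, 128)
    print(down(feat).shape)  # torch.Size([1, 128, 40, 40])
```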
{"title":"SDWPNet: A Downsampling-Driven Network for SAR Ship Detection With Refined Features and Optimized Loss","authors":"Xingyu Hu;Hongyu Chen;Yugang Chang;Xue Yang;Weiming Zeng","doi":"10.1109/LGRS.2025.3629377","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3629377","url":null,"abstract":"Ship detection in remote sensing images plays an important role in various maritime activities. However, the existing deep learning methods face challenges, such as changes in ship target size, complex backgrounds, and noise interference in remote sensing images, which can lead to low detection accuracy and incomplete target detection. To address these issues, we proposed a synthetic aperture radar (SAR) image target detection framework called SDWPNet, aimed at improving target detection performance in complex scenes. First, we proposed SDWavetpool (SDW), which optimizes feature downsampling through multiscale wavelet features, effectively reducing the dimensionality of the feature map while preserving the detailed information of small targets. It can more accurately identify medium and large targets in complex backgrounds, fully utilizing multilevel features. Then, the network structure was optimized using a feature extraction module that combines the PPA mechanism, making it more focused on the details of small targets. In addition, we further improved the detection accuracy by improving the loss function (ICMPIoU). The experiments on the SAR ship detection dataset (SSDD) and high-resolution SAR image dataset (HRSID) show that this framework performs well in both accuracy and response speed of target detection, achieving 74.5% and 67.6% in <inline-formula> <tex-math>$mathbf {mAP_{.50:.95}}$ </tex-math></inline-formula>, using only parameter 2.97 M.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"23 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Efficient Remote Sensing Change Detection With Change State Space Models
Elman Ghazaei;Erchan Aptoula
ConvNets and Vision Transformers (ViTs) have been widely used for change detection (CD), though they exhibit limitations: long-range dependencies are not effectively captured by the former, while the latter are associated with high computational demands. Vision Mamba, based on State Space Models, has been proposed as an alternative, yet has been primarily utilized as a feature extraction backbone. In this work, the change state space model (CSSM) is introduced as a task-specific approach for CD, designed to focus exclusively on relevant changes between bitemporal images while filtering out irrelevant information. Through this design, the number of parameters is reduced, computational efficiency is improved, and robustness is enhanced. CSSM is evaluated on three benchmark datasets, where superior performance is achieved compared to ConvNets, ViTs, and Mamba-based models, at a significantly lower computational cost. The code will be made publicly available at https://github.com/Elman295/CSSM upon acceptance.
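As a rough illustration of the recurrence behind state-space change modeling (not the letter's selective CSSM, and with invented names), a toy diagonal linear SSM can be scanned over the flattened bitemporal difference features:

```python
import torch
import torch.nn as nn

class ToySSMChangeHead(nn.Module):
    """Toy diagonal state-space scan over bitemporal difference features.

    h_t = A * h_{t-1} + B * x_t,  y_t = C * h_t, applied along the flattened
    spatial sequence. This is a plain (non-selective) linear SSM, used here
    only to illustrate the recurrence; it is not the authors' CSSM.
    """

    def __init__(self, channels: int, state_dim: int = 16):
        super().__init__()
        self.A = nn.Parameter(torch.full((state_dim,), 0.9))
        self.B = nn.Linear(channels, state_dim)
        self.C = nn.Linear(state_dim, channels)

    def forward(self, feat_t1: torch.Tensor, feat_t2: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, H*W, C) sequence of per-pixel change features.
        diff = (feat_t2 - feat_t1).flatten(2).transpose(1, 2)
        bsz, seq_len, _ = diff.shape
        h = diff.new_zeros(bsz, self.A.numel())
        outputs = []
        for t in range(seq_len):
            h = self.A * h + self.B(diff[:, t])   # state update
            outputs.append(self.C(h))             # readout
        return torch.stack(outputs, dim=1)        # (B, H*W, C)
```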
{"title":"Efficient Remote Sensing Change Detection With Change State Space Models","authors":"Elman Ghazaei;Erchan Aptoula","doi":"10.1109/LGRS.2025.3629303","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3629303","url":null,"abstract":"ConvNets and Vision Transformers (ViTs) have been widely used for change detection (CD), though they exhibit limitations: long-range dependencies are not effectively captured by the former, while the latter are associated with high computational demands. Vision Mamba, based on State Space Models, has been proposed as an alternative, yet has been primarily utilized as a feature extraction backbone. In this work, the change state space model (CSSM) is introduced as a task-specific approach for CD, designed to focus exclusively on relevant changes between bitemporal images while filtering out irrelevant information. Through this design, the number of parameters is reduced, computational efficiency is improved, and robustness is enhanced. CSSM is evaluated on three benchmark datasets, where superior performance is achieved compared to ConvNets, ViTs, and Mamba-based models, at a significantly lower computational cost. The code will be made publicly available at <uri>https://github.com/Elman295/CSSM</uri> upon acceptance","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"23 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145560636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
XSNet: Lightweight Object Detection Model Using X-Shaped Architecture in Remote Sensing Images
Dat Minh-Tien Nguyen;Thien Huynh-The
Remote sensing object detection faces challenges such as small object sizes, complex backgrounds, and computational constraints. To overcome these challenges, we propose XSNet, an efficient deep learning (DL) model designed to enhance feature representation and multiscale detection. Concretely, XSNet introduces three key innovations: a Swin-involution transformer (SIner) to improve local self-attention and spatial adaptability, positional weight bi-level routing attention (PosWeightRA) to refine spatial awareness and preserve positional encoding, and an X-shaped multiscale feature fusion strategy to optimize feature aggregation while reducing computational cost. These components collectively improve detection accuracy, particularly for small and overlapping objects. Through extensive experiments, XSNet achieves impressive mAP0.5 and mAP0.95 scores of 47.1% and 28.2% on VisDrone2019, and 92.9% and 66.0% on RSOD. It outperforms state-of-the-art models while maintaining a compact size of 7.11 million parameters and a fast inference time of 35.5 ms, making it well-suited for real-time remote sensing in resource-constrained environments.
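As a generic sketch of multiscale feature fusion, not XSNet's exact X-shaped topology, three pyramid levels can be projected to a common width, resampled to the middle resolution, and summed (all names and channel widths below are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMultiScaleFusion(nn.Module):
    """Fuse three pyramid levels at the middle resolution.

    Each level is projected to a common width with a 1x1 conv, resampled to
    the middle level's spatial size, and summed. A generic fusion sketch,
    not the X-shaped aggregation described in the letter.
    """

    def __init__(self, channels=(128, 256, 512), width: int = 256):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, width, kernel_size=1) for c in channels])

    def forward(self, p3: torch.Tensor, p4: torch.Tensor, p5: torch.Tensor) -> torch.Tensor:
        target = p4.shape[-2:]
        x3 = F.interpolate(self.proj[0](p3), size=target, mode="bilinear", align_corners=False)
        x4 = self.proj[1](p4)
        x5 = F.interpolate(self.proj[2](p5), size=target, mode="bilinear", align_corners=False)
        return x3 + x4 + x5
```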
{"title":"XSNet: Lightweight Object Detection Model Using X-Shaped Architecture in Remote Sensing Images","authors":"Dat Minh-Tien Nguyen;Thien Huynh-The","doi":"10.1109/LGRS.2025.3626855","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3626855","url":null,"abstract":"Remote sensing object detection faces challenges such as small object sizes, complex backgrounds, and computational constraints. To overcome these challenges, we propose XSNet, an efficient deep learning (DL) model proficiently designed to enhance feature representation and multiscale detection. Concretely, XSNet introduces three key innovations: swin-involution transformer (SIner) to improve local self-attention and spatial adaptability, positional weight bi-level routing attention (PosWeightRA) to refine spatial awareness and preserve positional encoding, and an X-shaped multiscale feature fusion strategy to optimize feature aggregation while reducing computational cost. These components collectively improve detection accuracy, particularly for small and overlapping objects. Through extensive experiments, XSNet achieves impressive mAP0.5 and mAP0.95 scores of 47.1% and 28.2% on VisDrone2019, and 92.9% and 66.0% on RSOD. It outperforms state-of-the-art models while maintaining a compact size of 7.11 million parameters and fast inference time of 35.5 ms, making it well-suited for real-time remote sensing in resource-constrained environments.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145510076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
TopoSegNet: Enhancing Geometric Fidelity of Coastline Extraction via a Joint Segmentation and Topological Reasoning Framework
Binge Cui;Shengyun Liu;Jing Zhang;Yan Lu
Coastline extraction from remote sensing imagery is persistently challenged by intra-class heterogeneity (e.g., diverse coastline types) and boundary ambiguity. Existing methods often exhibit suboptimal performance in complex scenes mixing artificial and natural landforms, as they tend to ignore coastline morphological priors and struggle to recover details in low-contrast regions. To address these issues, this letter introduces TopoSegNet, a novel collaborative framework centered on a dual-decoder architecture. A segmentation decoder utilizes a morphology-aware attention (MAA) module to adaptively decouple and model diverse coastline morphologies and a structure-detail synergistic enhancement (SDSE) module to reconstruct weak boundaries with high fidelity. Meanwhile, a learnable topology decoder frames topology construction as a graph reasoning task, which ensures the geometric and topological integrity of the final vector output. TopoSegNet was evaluated on the public Landsat-8 and a custom Lianyungang Gaofen-1 (GF-1) dataset. The experimental results show that the proposed method reached 98.64%, 66.80%, and 0.795% on the mIoU, BIoU, and average path length similarity (APLS) metrics, respectively, verifying its validity and superiority. Compared to the state-of-the-art methods, the TopoSegNet model demonstrates significantly higher accuracy and topological fidelity.
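The reported BIoU can be illustrated with one common boundary-IoU formulation (the letter's exact definition may differ): restrict the IoU to thin bands obtained by subtracting an erosion of each mask. A small NumPy/SciPy sketch:

```python
import numpy as np
from scipy.ndimage import binary_erosion

def boundary_iou(pred: np.ndarray, target: np.ndarray, width: int = 3) -> float:
    """IoU restricted to a thin band around each mask boundary.

    The band is the mask minus its erosion with a (2*width+1) square element.
    This is a common formulation; the BIoU used in the letter may differ.
    """
    struct = np.ones((2 * width + 1, 2 * width + 1), dtype=bool)
    pred_band = pred & ~binary_erosion(pred, structure=struct)
    target_band = target & ~binary_erosion(target, structure=struct)
    inter = np.logical_and(pred_band, target_band).sum()
    union = np.logical_or(pred_band, target_band).sum()
    return float(inter) / union if union > 0 else 1.0

# Toy example with two shifted 40x40 squares inside 64x64 masks.
pred = np.zeros((64, 64), dtype=bool); pred[10:50, 10:50] = True
gt = np.zeros((64, 64), dtype=bool); gt[12:52, 10:50] = True
print(round(boundary_iou(pred, gt), 3))
```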
{"title":"TopoSegNet: Enhancing Geometric Fidelity of Coastline Extraction via a Joint Segmentation and Topological Reasoning Framework","authors":"Binge Cui;Shengyun Liu;Jing Zhang;Yan Lu","doi":"10.1109/LGRS.2025.3626786","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3626786","url":null,"abstract":"Coastline extraction from remote sensing imagery is persistently challenged by intra-class heterogeneity (e.g., diverse coastline types) and boundary ambiguity. Existing methods often exhibit suboptimal performance in complex scenes mixing artificial and natural landforms, as they tend to ignore coastline morphological priors and struggle to recover details in low-contrast regions. To address these issues, this letter introduces TopoSegNet, a novel collaborative framework centered on a dual-decoder architecture. A segmentation decoder utilizes a morphology-aware attention (MAA) module to adaptively decouple and model diverse coastline morphologies and a structure-detail synergistic enhancement (SDSE) module to reconstruct weak boundaries with high fidelity. Meanwhile, a learnable topology decoder frames topology construction as a graph reasoning task, which ensures the geometric and topological integrity of the final vector output. TopoSegNet was evaluated on the public Landsat-8 and a custom Lianyungang Gaofen-1 (GF-1) dataset. The experimental results show that the proposed method reached 98.64%, 66.80%, and 0.795% on the mIoU, BIoU, and average path length similarity (APLS) metrics, respectively, verifying its validity and superiority. Compared to the state-of-the-art methods, the TopoSegNet model demonstrates significantly higher accuracy and topological fidelity.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"23 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Robust Joint Optimization Network for Feature Detection and Description in Optical and SAR Image Matching
Xinshan Zhang;Zhitao Fu;Menghua Li;Shaochen Zhang;Han Nie;Bo-Hui Tang
Deep learning approaches that jointly learn feature extraction have achieved remarkable progress in image matching. However, current methods often treat central and neighboring pixels uniformly and use static feature selection strategies that fail to account for environmental variations. This results in limited robustness of descriptors and keypoints, thereby affecting matching accuracy. To address these limitations, we propose a robust joint optimization network for feature detection and description in optical and SAR image matching. A center-weighted module (CWM) is designed to enhance local feature representation by emphasizing the hierarchical relationship between central and surrounding features. Furthermore, a multiscale gated aggregation (MSGA) module is introduced to suppress redundant responses and improve keypoint discriminability through a gating mechanism. To address the inconsistency of score maps across heterogeneous modalities, we design a position-constrained repeatability loss to guide the network in learning stable and consistent keypoint correspondences. Experimental results across various scenarios demonstrate that the proposed method outperforms state-of-the-art techniques in terms of both matching accuracy and the number of correct matches, highlighting its robustness and effectiveness.
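As a loose stand-in for weighting central pixels more heavily than their neighbors when summarizing a local region (the paper's CWM is learned and more elaborate), a fixed Gaussian weight can be applied before patch-wise pooling; all names below are invented for the example:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CenterWeightedPool(nn.Module):
    """Average-pool non-overlapping patches with a Gaussian center weight.

    Pixels near the patch center contribute more than the periphery, a simple
    illustration of center-versus-neighbor weighting, not the letter's CWM.
    """

    def __init__(self, patch: int = 8, sigma: float = 2.0):
        super().__init__()
        coords = torch.arange(patch, dtype=torch.float32) - (patch - 1) / 2
        yy, xx = torch.meshgrid(coords, coords, indexing="ij")
        weight = torch.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
        self.register_buffer("weight", weight / weight.sum())
        self.patch = patch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, C, H/patch, W/patch) weighted patch summaries.
        kernel = self.weight.repeat(x.shape[1], 1, 1, 1)  # (C, 1, patch, patch)
        return F.conv2d(x, kernel, stride=self.patch, groups=x.shape[1])
```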
{"title":"A Robust Joint Optimization Network for Feature Detection and Description in Optical and SAR Image Matching","authors":"Xinshan Zhang;Zhitao Fu;Menghua Li;Shaochen Zhang;Han Nie;Bo-Hui Tang","doi":"10.1109/LGRS.2025.3626750","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3626750","url":null,"abstract":"Deep learning approaches that jointly learn feature extraction have achieved remarkable progress in image matching. However, current methods often treat central and neighboring pixels uniformly and use static feature selection strategies that fail to account for environmental variations. This results in limited robustness of descriptors and keypoints, thereby affecting matching accuracy. To address these limitations, we propose a robust joint optimization network for feature detection and description in optical and SAR image matching. A center-weighted module (CWM) is designed to enhance local feature representation by emphasizing the hierarchical relationship between central and surrounding features. Furthermore, a multiscale gated aggregation (MSGA) module is introduced to suppress redundant responses and improve keypoint discriminability through a gating mechanism. To address the inconsistency of score maps across heterogeneous modalities, we design a position-constrained repeatability loss to guide the network in learning stable and consistent keypoint correspondences. Experimental results across various scenarios demonstrate that the proposed method outperforms state-of-the-art techniques in terms of both matching accuracy and the number of correct matches, highlighting its robustness and effectiveness.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"23 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145537627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
MambaCast: An Efficient Precipitation Nowcasting Model With Dual-Branch Mamba
Haowen Jin;Yuankang Ye;Chang Liu;Feng Gao
Precipitation nowcasting using radar echo data is critical for issuing timely extreme weather warnings, yet the existing models struggle to balance computational efficiency with prediction accuracy when modeling complex, nonlinear echo sequences. To address these challenges, we propose MambaCast, a novel dual-branch precipitation nowcasting model built upon the Mamba framework. Specifically, MambaCast incorporates three key components: a state-space model (SSM) branch, a convolutional neural network (CNN) branch and a CastFusion module. The SSM branch captures global low-frequency evolution features in the radar echo field through a selective scanning mechanism, while the CNN branch extracts local high-frequency transient features using gated spatiotemporal attention (gSTA). The CastFusion module dynamically integrates features across different frequency scales, enabling adaptive fusion of spatiotemporal distribution. Experiments on two public radar datasets show that MambaCast consistently outperforms baseline models.
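A generic way to blend a global (SSM) branch with a local (CNN) branch is a per-pixel learned gate; the sketch below is a hedged illustration with invented names, not the letter's CastFusion module:

```python
import torch
import torch.nn as nn

class GatedBranchFusion(nn.Module):
    """Blend two same-shaped feature maps with a per-pixel learned gate.

    gate = sigmoid(conv([global, local])); out = gate*global + (1-gate)*local.
    A generic dual-branch fusion sketch, not the exact CastFusion design.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, global_feat: torch.Tensor, local_feat: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([global_feat, local_feat], dim=1))
        return g * global_feat + (1.0 - g) * local_feat
```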
{"title":"MambaCast: An Efficient Precipitation Nowcasting Model With Dual-Branch Mamba","authors":"Haowen Jin;Yuankang Ye;Chang Liu;Feng Gao","doi":"10.1109/LGRS.2025.3626369","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3626369","url":null,"abstract":"Precipitation nowcasting using radar echo data is critical for issuing timely extreme weather warnings, yet the existing models struggle to balance computational efficiency with prediction accuracy when modeling complex, nonlinear echo sequences. To address these challenges, we propose MambaCast, a novel dual-branch precipitation nowcasting model built upon the Mamba framework. Specifically, MambaCast incorporates three key components: a state-space model (SSM) branch, a convolutional neural network (CNN) branch and a CastFusion module. The SSM branch captures global low-frequency evolution features in the radar echo field through a selective scanning mechanism, while the CNN branch extracts local high-frequency transient features using gated spatiotemporal attention (gSTA). The CastFusion module dynamically integrates features across different frequency scales, enabling adaptive fusion of spatiotemporal distribution. Experiments on two public radar datasets show that MambaCast consistently outperforms baseline models.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"23 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145537628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
TinyRS-R1: Compact Vision Language Model for Remote Sensing
Aybora Köksal;A. Aydın Alatan
Remote sensing (RS) applications often rely on edge hardware that cannot host today's 7B-parameter vision language models. This letter presents TinyRS, the first 2B-parameter vision language model (VLM) optimized for RS, and TinyRS-R1, its reasoning-augmented variant. Based on Qwen2-VL-2B, TinyRS is trained via a four-stage pipeline: pretraining on million-scale satellite images, instruction tuning, fine-tuning with chain-of-thought (CoT) annotations from a new reasoning dataset, and group relative policy optimization (GRPO)-based alignment. TinyRS-R1 matches or surpasses recent 7B RS models in classification, visual question answering (VQA), grounding, and open-ended QA, while using one-third of the memory and latency. CoT reasoning improves grounding and scene understanding, while TinyRS excels at concise, low-latency VQA. TinyRS-R1 is the first domain-specialized small VLM with GRPO-aligned CoT reasoning for general-purpose RS. The code, models, and caption datasets are available at https://github.com/aybora/TinyRS.
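The GRPO alignment stage relies on group-relative advantages: each sampled completion in a group is scored, and its advantage is its reward standardized against the group's mean and standard deviation. A minimal sketch (reward values are made up for the example):

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Standardize rewards within each group of sampled completions.

    rewards: (num_groups, group_size) scalar rewards for each completion.
    Returns advantages of the same shape: (r - mean) / (std + eps), the
    group-relative baseline used by GRPO-style policy updates.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Four completions for one prompt, scored by a task reward (illustrative values):
rewards = torch.tensor([[1.0, 0.0, 0.5, 1.0]])
print(group_relative_advantages(rewards))  # advantages ~ [0.78, -1.31, -0.26, 0.78]
```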
{"title":"TinyRS-R1: Compact Vision Language Model for Remote Sensing","authors":"Aybora Köksal;A. Aydın Alatan","doi":"10.1109/LGRS.2025.3623244","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3623244","url":null,"abstract":"Remote sensing (RS) applications often rely on edge hardware that cannot host the models in the 7B parametric vision language of today. This letter presents TinyRS, the first 2B-parameter vision language models (VLMs) optimized for RS, and TinyRS-R1, its reasoning-augmented variant. Based on Qwen2-VL-2B, TinyRS is trained via a four-stage pipeline: pretraining on million-scale satellite images, instruction tuning, fine-tuning with chain-of-thought (CoT) annotations from a new reasoning dataset, and group relative policy optimization (GRPO)-based alignment. TinyRS-R1 matches or surpasses recent 7B RS models in classification, visual question answering (VQA), grounding, and open-ended QA—while using one third of the memory and latency. CoT reasoning improves grounding and scene understanding, while TinyRS excels at concise, low-latency VQA. TinyRS-R1 is the first domain-specialized small VLM with GRPO-aligned CoT reasoning for general-purpose RS. The code, models, and caption datasets are available at <uri>https://github.com/aybora/TinyRS</uri>","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145405232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multiscale Window Attention Channel Enhanced for Remote Sensing Image Super-Resolution
Jingfan Wang;Wen Lu;Zeming Zhang;Zhaoyang Wang;Zhe Li
Transformer-based methods for remote sensing image super-resolution (SR) face challenges in reconstructing high-frequency textures due to the interference from large flat regions, such as farmlands and water bodies. To address these limitations, we propose a channel-enhanced multiscale window attention mechanism, which is designed to minimize the impact of flat regions on high-frequency area reconstruction while effectively utilizing the intrinsic multiscale features of remote sensing images. To better capture the multiscale features of remote sensing images, we introduce a series of depthwise separable convolution kernels of varying sizes during the shallow feature extraction stage. Experimental results demonstrate that the proposed method achieves superior peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) scores across multiple remote sensing benchmark datasets and scaling factors, validating its effectiveness.
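Window attention restricts self-attention to non-overlapping local windows. A standard partition/reverse pair, shown here independently of this letter's specific channel enhancement, looks like:

```python
import torch

def window_partition(x: torch.Tensor, win: int) -> torch.Tensor:
    """(B, H, W, C) -> (num_windows*B, win, win, C); H and W must be divisible by win."""
    b, h, w, c = x.shape
    x = x.view(b, h // win, win, w // win, win, c)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, win, win, c)

def window_reverse(windows: torch.Tensor, win: int, h: int, w: int) -> torch.Tensor:
    """Inverse of window_partition: back to (B, H, W, C)."""
    b = windows.shape[0] // ((h // win) * (w // win))
    x = windows.view(b, h // win, w // win, win, win, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(b, h, w, -1)

x = torch.randn(2, 64, 64, 96)
wins = window_partition(x, 8)                     # (2*64, 8, 8, 96)
assert torch.equal(window_reverse(wins, 8, 64, 64), x)
```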
{"title":"Multiscale Window Attention Channel Enhanced for Remote Sensing Image Super-Resolution","authors":"Jingfan Wang;Wen Lu;Zeming Zhang;Zhaoyang Wang;Zhe Li","doi":"10.1109/LGRS.2025.3620872","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3620872","url":null,"abstract":"Transformer-based methods for remote sensing image super-resolution (SR) face challenges in reconstructing high-frequency textures due to the interference from large flat regions, such as farmlands and water bodies. To address these limitations, we propose a channel-enhanced multiscale window attention mechanism, which is designed to minimize the impact of flat regions on high-frequency area reconstruction while effectively utilizing the intrinsic multiscale features of remote sensing images. To better capture the multiscale features of remote sensing images, we introduce a series of depthwise separable convolution kernels of varying sizes during the shallow feature extraction stage. Experimental results demonstrate that the proposed method achieves superior peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) scores across multiple remote sensing benchmark datasets and scaling factors, validating its effectiveness.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"23 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145778330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Integrating Global and Local Information for Remote Sensing Image–Text Retrieval
Ziyun Chen;Fan Liu;Zhangqingyun Guan;Qian Zhou;Xiaocong Zhou;Chuanyi Zhang
Pretrained vision–language models (VLMs) have demonstrated promising performance in remote sensing (RS) image–text retrieval tasks. However, the scarcity of high-quality image–text datasets remains a challenge in fine-tuning VLMs for RS. The captions in existing datasets tend to be uniform and lack detail. To fully use rich detailed information from RS images, we propose a method to fine-tune VLMs. We first construct a new visual–language dataset that balances both global and local information for RS (GLRS) image–text retrieval. Specifically, a multimodal large language model (MLLM) is used to generate captions for local patches and global captions for the entire image. To effectively use local information, we propose a global and local image captioning method (GLCap). With a large language model (LLM), we further obtain higher quality captions by merging both global and local captions. Finally, we fine-tune the weights of RS-M-contrastive language image pretraining (CLIP) with a progressive global–local fine-tuning strategy on GLRS. Experimental results demonstrate that our method outperforms state-of-the-art (SoTA) approaches on two common RS image–text retrieval downstream tasks. Our code and dataset are available at https://github.com/hhu-czy/GLRS.
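Fine-tuning CLIP-style retrieval models typically minimizes a symmetric InfoNCE objective over the batch similarity matrix; the sketch below shows that generic loss, not necessarily the paper's exact GLRS fine-tuning recipe:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
                          logit_scale: float = 100.0) -> torch.Tensor:
    """Symmetric image-text InfoNCE loss for paired embeddings.

    image_emb, text_emb: (B, D); row i of each is a matching pair. Embeddings
    are L2-normalized, similarities scaled, and cross-entropy is applied in
    both retrieval directions (image-to-text and text-to-image).
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = logit_scale * image_emb @ text_emb.t()        # (B, B) similarity matrix
    targets = torch.arange(logits.shape[0], device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)
```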
{"title":"Integrating Global and Local Information for Remote Sensing Image–Text Retrieval","authors":"Ziyun Chen;Fan Liu;Zhangqingyun Guan;Qian Zhou;Xiaocong Zhou;Chuanyi Zhang","doi":"10.1109/LGRS.2025.3616154","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3616154","url":null,"abstract":"Pretrained vision–language models (VLMs) have demonstrated promising performance in remote sensing (RS) image–text retrieval tasks. However, the scarcity of high-quality image–text datasets remains a challenge in fine-tuning VLMs for RS. The captions in existing datasets tend to be uniform and lack details. To fully use rich detailed information from RS images, we propose a method to fine-tune VLMs. We first construct a new visual–language dataset that balances both global and local information for RS (GLRS) image–text retrieval. Specifically, a multimodal large language model (MLLM) is used to generate captions for local patches and global captions for the entire image. To effectively use local information, we propose a global and local image captioning method (GLCap). With a large language model (LLM), we further obtain higher quality captions by merging both global and local captions. Finally, we fine-tune the weights of RS-M-contrastive language image pretraining (CLIP) with a progressive global–local fine-tuning strategy on GLRS. Experimental results demonstrate that our method outperforms state-of-the-art (SoTA) approaches on two common RS image–text retrieval downstream tasks. Our code and dataset are available at <uri>https://github.com/hhu-czy/GLRS</uri>","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145455801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Semi-Supervised Triple-GAN With Similarity Constraint for Automatic Underground Object Classification Using Ground Penetrating Radar Data
Li Liu;Yongcheng Zhou;Hang Xu;Jingxia Li;Jianguo Zhang;Lijun Zhou;Bingjie Wang
Automatic underground object classification based on deep learning (DL) has been widely used in ground penetrating radar (GPR) fields. However, its excellent performance heavily depends on sufficient labeled training data. In GPR fields, large amounts of labeled data are difficult to obtain due to time-consuming and experience-dependent manual annotation work. To address the issue of limited labeled data, we propose a novel semi-supervised learning (SSL) method for urban-road underground multiclass object classification. It fully utilizes abundant unlabeled data and limited labeled data to enhance classification performance. We applied a variant of the triple-GAN (TGAN) model and modified it by introducing a similarity constraint, which is associated with GPR data geometric features and can help to produce high-quality generated images. Experimental results of laboratory and field data show that it has higher accuracy than representative baseline methods under limited labeled data.
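As a loose illustration only (the letter's similarity constraint is tied to GPR geometric features and may be defined quite differently), a similarity term can be added to the generator objective by comparing frozen-extractor features of generated and real B-scans; all names and the weight below are hypothetical:

```python
import torch
import torch.nn.functional as F

def generator_loss_with_similarity(d_fake_logits: torch.Tensor,
                                   fake_feats: torch.Tensor,
                                   real_feats: torch.Tensor,
                                   sim_weight: float = 0.1) -> torch.Tensor:
    """Adversarial generator loss plus a feature-similarity regularizer.

    d_fake_logits: discriminator logits on generated B-scans.
    fake_feats / real_feats: (B, D) features from a frozen extractor.
    The similarity term pushes generated samples toward real feature
    statistics; an illustrative stand-in for the letter's constraint.
    """
    adv = F.binary_cross_entropy_with_logits(d_fake_logits,
                                             torch.ones_like(d_fake_logits))
    sim = 1.0 - F.cosine_similarity(fake_feats, real_feats, dim=-1).mean()
    return adv + sim_weight * sim
```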
{"title":"Semi-Supervised Triple-GAN With Similarity Constraint for Automatic Underground Object Classification Using Ground Penetrating Radar Data","authors":"Li Liu;Yongcheng Zhou;Hang Xu;Jingxia Li;Jianguo Zhang;Lijun Zhou;Bingjie Wang","doi":"10.1109/LGRS.2025.3609444","DOIUrl":"https://doi.org/10.1109/LGRS.2025.3609444","url":null,"abstract":"Automatic underground object classification based on deep learning (DL) has been widely used in ground penetrating radar (GPR) fields. However, its excellent performance heavily depends on sufficient labeled training data. In GPR fields, large amounts of labeled data are difficult to obtain due to time-consuming and experience-dependent manual annotation work. To address the issue of limited labeled data, we propose a novel semi-supervised learning (SSL) method for urban-road underground multiclass object classification. It fully utilizes abundant unlabeled data and limited labeled data to enhance classification performance. We applied a variant of the triple-GAN (TGAN) model and modified it by introducing a similarity constraint, which is associated with GPR data geometric features and can help to produce high-quality generated images. Experimental results of laboratory and field data show that it has higher accuracy than representative baseline methods under limited labeled data.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":4.4,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145078645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0