Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing

IF 8.6 | Zone 1 (Earth Science) | JCR Q1, Engineering, Electrical & Electronic | IEEE Transactions on Geoscience and Remote Sensing | Published: 2025-02-13 | DOI: 10.1109/TGRS.2025.3541390
Kaixuan Lu, Ruiqian Zhang, Xiao Huang, Yuxing Xie, Xiaogang Ning, Hanchao Zhang, Mengke Yuan, Pan Zhang, Tao Wang, Tongkui Liao
Vol. 63, pp. 1-13 | IEEE Xplore: https://ieeexplore.ieee.org/document/10884596/ | Citations: 0

Abstract

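Recent self-supervised learning (SSL) methods have demonstrated impressive results in learning visual representations from unlabeled remote sensing (RS) images. However, most RS images are scenographic scenes that contain multiple ground objects without an explicit foreground target, which limits the performance of existing SSL methods built around foreground targets. This raises a question: can a method automatically aggregate similar objects within scenographic RS images, enabling models to differentiate the knowledge embedded in various geospatial patterns and thereby improve feature representation? In this work, we present the pattern integration and enhancement vision transformer (PIEViT), a novel SSL framework designed specifically for RS imagery. PIEViT uses a teacher-student architecture to address both image-level and patch-level tasks. It employs a proposed geospatial pattern cohesion (GPC) module to explore the natural clustering of patches, enhancing the differentiation of individual features. A feature integration projection (FIP) module further refines masked token reconstruction using geospatially clustered patches. We validated PIEViT on multiple downstream tasks, including object detection, semantic segmentation, and change detection. Experiments demonstrate that PIEViT enhances the representation of internal patch features and provides significant improvements over existing self-supervised baselines. It achieves excellent results in object detection, land cover classification, and change detection, underscoring its robustness, generalization, and transferability for RS image interpretation tasks.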
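The abstract describes PIEViT only at a high level. The sketch below shows the generic ingredients it names, under assumptions stated here: grouping patch tokens into clusters of similar content, turning those clusters into reconstruction targets for masked tokens, and updating a teacher from a student. It is not the authors' implementation. Plain k-means with farthest-point initialization stands in for the GPC module, and a cluster-mean target stands in for FIP. The helper names (`kmeans`, `cluster_mean_targets`, `masked_mse`, `ema_update`) and the 0.996 momentum are hypothetical choices. Tokens are small Python lists rather than ViT outputs.

```python
"""Hypothetical sketch: cluster-guided masked-token targets with an EMA teacher.

Not PIEViT's code. k-means stands in for GPC and cluster-mean targets for FIP.
"""

Vector = list[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two token vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vec(vectors: list[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of vectors."""
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def kmeans(tokens: list[Vector], k: int, iters: int = 10) -> list[int]:
    """Group patch tokens into k clusters with Lloyd's algorithm.

    Farthest-point initialization keeps the result deterministic.
    """
    centroids = [list(tokens[0])]
    while len(centroids) < k:
        # Next centroid: the token farthest from its nearest existing centroid.
        far = max(tokens, key=lambda t: min(sq_dist(t, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(tokens)
    for _ in range(iters):
        # Assignment step: each token joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(t, centroids[c])) for t in tokens]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [t for t, lab in zip(tokens, labels) if lab == c]
            if members:
                centroids[c] = mean_vec(members)
    return labels


def cluster_mean_targets(
    teacher_tokens: list[Vector], labels: list[int], masked: list[int]
) -> dict[int, Vector]:
    """Target for each masked patch: mean teacher token over that patch's cluster."""
    targets: dict[int, Vector] = {}
    for i in masked:
        members = [t for t, lab in zip(teacher_tokens, labels) if lab == labels[i]]
        targets[i] = mean_vec(members)
    return targets


def masked_mse(student_tokens: list[Vector], targets: dict[int, Vector]) -> float:
    """Mean squared error between student tokens and targets at masked positions."""
    errors = [sq_dist(student_tokens[i], tgt) / len(tgt) for i, tgt in targets.items()]
    return sum(errors) / len(errors)


def ema_update(teacher: Vector, student: Vector, momentum: float = 0.996) -> Vector:
    """Move teacher parameters toward the student by exponential moving average."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


if __name__ == "__main__":
    # Toy teacher tokens forming two well-separated groups of patches.
    teacher = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0], [0.1, 0.1], [5.2, 4.8]]
    labels = kmeans(teacher, k=2)
    print(labels)  # → [0, 0, 1, 1, 0, 1]
    targets = cluster_mean_targets(teacher, labels, masked=[1, 3])
    print(targets)
```

With cluster-mean targets, a masked patch is pulled toward the shared representation of similar patches rather than toward its own pixels alone. That choice is one way to read "reconstruction using geospatially clustered patches". The paper's actual GPC and FIP designs may differ.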
Source Journal
IEEE Transactions on Geoscience and Remote Sensing
Category: Engineering & Technology, Geochemistry & Geophysics
CiteScore: 11.50
Self-citation rate: 28.00%
Articles published: 1912
Review time: 4.0 months
About the journal: IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space, and on the processing, interpretation, and dissemination of this information.