Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing

IEEE Transactions on Geoscience and Remote Sensing | IF 8.6 | Zone 1 (Earth Sciences) | JCR Q1 (Engineering, Electrical & Electronic) | Pub Date: 2025-02-13 | DOI: 10.1109/TGRS.2025.3541390

Kaixuan Lu;Ruiqian Zhang;Xiao Huang;Yuxing Xie;Xiaogang Ning;Hanchao Zhang;Mengke Yuan;Pan Zhang;Tao Wang;Tongkui Liao

IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-13, published 2025-02-13. Journal article, not open access. Full text: https://ieeexplore.ieee.org/document/10884596/

Citations: 0



Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing

Recent self-supervised learning (SSL) methods have demonstrated impressive results in learning visual representations from unlabeled remote sensing (RS) images. However, most RS images consist of scenographic scenes containing multiple ground objects without explicit foreground targets, which limits the performance of existing SSL methods that focus on foreground targets. This raises the question: Is there a method that can automatically aggregate similar objects within scenographic RS images, thereby enabling models to differentiate knowledge embedded in various geospatial patterns for improved feature representation? In this work, we present the pattern integration and enhancement vision transformer (PIEViT), a novel SSL framework designed specifically for RS imagery. PIEViT utilizes a teacher-student architecture to address both image-level and patch-level tasks. It employs a proposed geospatial pattern cohesion (GPC) module to explore the natural clustering of patches, enhancing the differentiation of individual features. A feature integration projection (FIP) module is employed to further refine masked token reconstruction using geospatially clustered patches. We validated PIEViT across multiple downstream tasks, including object detection, semantic segmentation, and change detection. Experiments demonstrated that PIEViT enhances the representation of internal patch features, providing significant improvements over existing self-supervised baselines. It achieves excellent results in object detection, land cover classification, and change detection, underscoring its robustness, generalization, and transferability for RS image interpretation tasks.
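
The abstract does not say how the GPC module clusters patches, so the following is only an illustrative sketch of the general idea of grouping similar patch features and pooling them. It is not the authors' method: the greedy cosine-similarity grouping, the function names (`cosine`, `group_patches`, `group_means`), the 0.9 threshold, and the toy 2-D features are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def group_patches(features, threshold=0.9):
    """Greedy grouping (an assumed stand-in, not GPC): a patch joins the
    first group whose seed patch has cosine similarity >= threshold,
    otherwise it becomes the seed of a new group."""
    seeds = []   # index of the seed patch for each group
    labels = []  # group id assigned to each patch
    for i, f in enumerate(features):
        for g, s in enumerate(seeds):
            if cosine(f, features[s]) >= threshold:
                labels.append(g)
                break
        else:
            seeds.append(i)
            labels.append(len(seeds) - 1)
    return labels

def group_means(features, labels):
    """Mean feature vector of each group."""
    n_groups = max(labels) + 1
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(n_groups)]
    counts = [0] * n_groups
    for f, g in zip(features, labels):
        counts[g] += 1
        for d in range(dim):
            sums[g][d] += f[d]
    return [[v / counts[g] for v in sums[g]] for g in range(n_groups)]

# Toy patch features: two patches pointing one way, two pointing another.
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = group_patches(patches)
means = group_means(patches, labels)
```

In this toy case the first two patches fall into one group and the last two into another, and each group mean could serve as a pooled descriptor for its patches. A real model would operate on learned ViT patch embeddings rather than hand-written vectors.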


Source journal

IEEE Transactions on Geoscience and Remote Sensing (Engineering & Technology: Geochemistry and Geophysics)

CiteScore: 11.50

Self-citation rate: 28.00%

Articles published: 1912

Review time: 4.0 months

Journal description: IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.