Visual-Semantic Cooperative Learning for Few-Shot SAR Target Classification

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing · IF 5.3 · CAS Tier 2 (Earth Sciences) · JCR Q1 (Engineering, Electrical & Electronic) · Published: 2025-01-16 · DOI: 10.1109/JSTARS.2025.3530442
Siyuan Wang;Yinghua Wang;Xiaoting Zhang;Chen Zhang;Hongwei Liu
Volume 18, pp. 6532-6550 · Journal Article · https://ieeexplore.ieee.org/document/10843851/ · Citations: 0

Abstract

Meta-learning is currently the mainstream approach to few-shot synthetic aperture radar (SAR) target classification: it accumulates empirical knowledge from a source domain so that novel classes can be recognized quickly from only a few samples. However, a source domain with sufficient labeled SAR images is difficult to obtain, which limits the empirical knowledge that can be transferred to the target domain. Moreover, most existing methods rely solely on visual images to learn target feature representations, yielding poorly discriminative features in few-shot settings. To tackle these problems, we propose a novel visual-semantic cooperative network (VSC-Net) that performs dual visual and semantic classification, using the semantic branch to compensate for inaccuracies in the visual branch. First, we design textual semantic descriptions of SAR targets to exploit rich semantic information. These descriptions are then encoded by the text encoder of a pretrained large vision-language model to obtain class semantic embeddings of the targets. In the visual classification stage, a semantic-based visual prototype calibration module projects the class semantic embeddings into the visual space to calibrate the visual prototypes, improving the reliability of prototypes computed from only a few support samples. In addition, a semantic consistency loss constrains the accuracy of the class semantic embeddings projected into the visual space. In the semantic classification stage, the visual features of query samples are mapped into the semantic space, and their classes are predicted by searching for the nearest class semantic embeddings. Furthermore, we introduce a visual indication loss that refines the semantic classification using the calibrated visual prototypes. Finally, the class of each query sample is decided by merging the visual and semantic classification results. Extensive experiments on the SAR target dataset validate VSC-Net's few-shot classification efficacy.
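The dual-classification pipeline described in the abstract can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation: the linear projections between the visual and semantic spaces, the convex-combination prototype calibration, the negative-distance scoring, and the equal-weight score fusion are all assumptions made for demonstration (the paper's learned modules and loss terms are not reproduced here).

```python
import numpy as np

# Toy few-shot episode. In VSC-Net the semantic embeddings would come
# from a pretrained vision-language model's text encoder; here they
# are random placeholders.
rng = np.random.default_rng(0)
n_way, k_shot, d_vis, d_sem = 3, 2, 8, 4
support = rng.normal(size=(n_way, k_shot, d_vis))  # support visual features
query = rng.normal(size=(5, d_vis))                # query visual features
sem_emb = rng.normal(size=(n_way, d_sem))          # class semantic embeddings

# Hypothetical projections between spaces (random here; learned in the paper).
W_sem2vis = rng.normal(size=(d_sem, d_vis))        # semantic -> visual
W_vis2sem = rng.normal(size=(d_vis, d_sem))        # visual -> semantic

# 1) Visual prototypes from the few support samples.
proto = support.mean(axis=1)                       # (n_way, d_vis)

# 2) Semantic-based prototype calibration: blend prototypes with the
#    semantic embeddings projected into the visual space (alpha assumed).
alpha = 0.5
proto_cal = alpha * proto + (1 - alpha) * sem_emb @ W_sem2vis

def neg_dist(x, centers):
    """Negative squared Euclidean distances as class scores."""
    return -((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)

# 3) Visual classification against the calibrated prototypes.
vis_scores = neg_dist(query, proto_cal)            # (5, n_way)

# 4) Semantic classification: map query features into the semantic space
#    and search for the nearest class semantic embedding.
sem_scores = neg_dist(query @ W_vis2sem, sem_emb)  # (5, n_way)

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

# 5) Merge the two classification results (equal weighting assumed).
fused = 0.5 * softmax(vis_scores) + 0.5 * softmax(sem_scores)
pred = fused.argmax(-1)                            # final class per query
```

The calibration step is what makes the few-shot prototypes more reliable: with k_shot this small, the empirical mean is noisy, and the class-level semantic embedding supplies a sample-independent anchor to pull it toward.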
Source journal metrics: CiteScore 9.30 · Self-citation rate 10.90% · Articles published 563 · Review time 4.7 months
Journal description: The IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing addresses the growing field of applications in Earth observations and remote sensing, and also provides a venue for the rapidly expanding special issues sponsored by the IEEE Geoscience and Remote Sensing Society. The journal draws upon the experience of the highly successful IEEE Transactions on Geoscience and Remote Sensing and provides a complementary medium for the wide range of topics in applied Earth observations. The "Applications" area encompasses the societal-benefit areas of the Global Earth Observation System of Systems (GEOSS) program. Through deliberations over two years, ministers from 50 countries agreed to identify nine areas where Earth observation could positively impact the quality of life and health of their respective countries. Some of these, including biodiversity, health, and climate, are areas not traditionally addressed in the IEEE context. Yet it is the skill sets of IEEE members, in areas such as observations, communications, computers, signal processing, standards, and ocean engineering, that form the technical underpinnings of GEOSS. Thus, the journal attracts a broad range of interests that serves present members in new ways and expands IEEE visibility into new areas.