Dual adaptive local semantic alignment for few-shot fine-grained classification

The Visual Computer. Pub Date: 2024-07-22. DOI: 10.1007/s00371-024-03576-z

Wei Song, Kaili Yang

Abstract

Few-shot fine-grained classification (FS-FGC) aims to learn discriminative semantic details (e.g., beaks and wings) from few labeled samples in order to precisely recognize novel classes. However, existing feature alignment methods mainly use the support set to align the query sample, which may lead to incorrect alignment of local semantics due to interference from the background and non-target objects. In addition, these methods do not take into account the discrepancy in semantic information among channels. To address these issues, we propose a dual adaptive local semantic alignment approach, composed of a channel semantic alignment module (CSAM) and a spatial semantic alignment module (SSAM). Specifically, CSAM adaptively generates channel weights to highlight discriminative information using two sub-modules: the class-aware attention module (CAM) and the target-aware attention module (TAM). CAM emphasizes the discriminative semantic details of each category in the support set, and TAM enhances the target object region of the query image. On this basis, SSAM promotes effective alignment of semantically relevant local regions through a spatial bidirectional alignment strategy. Combining the two adaptive modules to better capture fine-grained semantic contextual information along the channel and spatial dimensions improves the accuracy and robustness of FS-FGC. Experimental results on three widely used fine-grained classification datasets demonstrate significant competitive advantages over current mainstream methods. Code is available at: https://github.com/kellyagya/DALSA.
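The two ideas named in the abstract, channel reweighting followed by bidirectional spatial alignment, can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that). It assumes a fixed, unlearned channel weight (the sigmoid of each channel's mean activation) as a stand-in for the learned CAM/TAM attention, and it assumes the bidirectional strategy can be approximated by averaging best-match cosine similarities in both directions. All function names here are hypothetical.

```python
import math

def channel_weights(feature_map):
    """feature_map: C channels, each a list of N spatial activations.
    Assumption: weight = sigmoid(mean activation); the paper's modules are learned."""
    return [1.0 / (1.0 + math.exp(-sum(ch) / len(ch))) for ch in feature_map]

def reweight(feature_map):
    """Scale every channel by its weight."""
    weights = channel_weights(feature_map)
    return [[w * v for v in ch] for ch, w in zip(feature_map, weights)]

def local_descriptors(feature_map):
    """Transpose C x N into N local descriptors of length C."""
    return [list(col) for col in zip(*feature_map)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def bidirectional_score(query, support):
    """Average of query-to-support and support-to-query best-match similarity."""
    q = local_descriptors(reweight(query))
    s = local_descriptors(reweight(support))
    sim = [[cosine(a, b) for b in s] for a in q]
    q_to_s = sum(max(row) for row in sim) / len(q)        # each query region's best match
    s_to_q = sum(max(col) for col in zip(*sim)) / len(s)  # each support region's best match
    return 0.5 * (q_to_s + s_to_q)

# Toy feature maps with C=2 channels and N=2 spatial positions.
query = [[1.0, 0.0], [0.0, 1.0]]
matching_support = [[1.0, 0.0], [0.0, 1.0]]
other_support = [[1.0, 1.0], [0.0, 0.0]]

same = bidirectional_score(query, matching_support)
different = bidirectional_score(query, other_support)
```

In this toy setup the matching support scores higher than the other one, because only part of the query's local regions find a close counterpart in `other_support`. Scoring in both directions penalizes a match in which one image's regions are covered but the other's are not.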
