Semantic Guided Latent Parts Embedding for Few-Shot Learning

2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Pub Date: 2023-01-01. DOI: 10.1109/WACV56688.2023.00541

Fengyuan Yang, Ruiping Wang, Xilin Chen


Citations: 4

Abstract

Few-shot learning (FSL) is a basic requirement for intelligent agents learning in the open visual world. However, existing deep learning systems rely too heavily on large numbers of training samples, making it hard to learn new categories efficiently from a limited amount of training data. Two key challenges of FSL are insufficient comprehension and imperfect modeling of few-shot novel classes. For insufficient visual comprehension, semantic knowledge, that is, information from other modalities, can help supplement the understanding of novel classes. Even so, most works still suffer from the second challenge, because the single global class prototype they adopt is extremely unstable and imperfect given the larger intra-class variation and harder inter-class discrimination in the FSL scenario. We therefore propose to represent each class by several different parts with the help of class semantic knowledge. Since parts can never be pre-defined for unknown novel classes, we embed them in a latent manner. Concretely, we train a generator that takes the class semantic knowledge as input and outputs several filters for class-specific semantic latent parts. By applying each part filter, our model can attend to the corresponding local region containing that part. At the inference stage, classification is conducted by comparing the similarities between those parts. Experiments on several FSL benchmarks demonstrate the effectiveness of the proposed method and show its potential to go beyond class recognition to class understanding. Furthermore, we find that the more visualized and customized the semantic knowledge is, the more helpful it is in the FSL task.
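The pipeline described in the abstract (semantic knowledge in, part filters out, attention over local regions, part-wise comparison at inference) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the linear generator, the softmax attention, the cosine similarity, the summed part score, and all names below (`generate_part_filters`, `part_embeddings`, `classify`, the toy sizes) are assumptions chosen for illustration.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def generate_part_filters(semantic, weights):
    """Hypothetical generator: one linear map per part, semantic vector -> filter."""
    # weights[k] is a C x D matrix, so the result is K filters of length C.
    return [[dot(row, semantic) for row in w_k] for w_k in weights]

def part_embeddings(feature_map, filters):
    """Attend to local regions with each part filter and pool one vector per part."""
    channels = len(feature_map[0])
    parts = []
    for f in filters:
        scores = [dot(f, loc) for loc in feature_map]   # filter response per location
        m = max(scores)
        exp = [math.exp(s - m) for s in scores]         # numerically stable softmax
        z = sum(exp)
        attn = [e / z for e in exp]
        # Attention-weighted average of the local features (assumed pooling).
        parts.append([sum(a * loc[c] for a, loc in zip(attn, feature_map))
                      for c in range(channels)])
    return parts

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)) + 1e-8)

def classify(query_map, support_maps, class_semantics, weights):
    """Score each class by summed part-to-part cosine similarity (assumed metric)."""
    scores = {}
    for name, semantic in class_semantics.items():
        filters = generate_part_filters(semantic, weights)
        q_parts = part_embeddings(query_map, filters)
        s_parts = part_embeddings(support_maps[name], filters)
        scores[name] = sum(cosine(q, s) for q, s in zip(q_parts, s_parts))
    return max(scores, key=scores.get), scores

# Toy data: random stand-ins for trained weights, semantic vectors and feature maps.
rng = random.Random(0)
K, C, D, N = 3, 4, 5, 6  # parts, channels, semantic dimension, spatial locations
weights = [[[rng.gauss(0, 1) for _ in range(D)] for _ in range(C)] for _ in range(K)]
class_semantics = {name: [rng.gauss(0, 1) for _ in range(D)] for name in ("cat", "dog")}
support_maps = {name: [[rng.gauss(0, 1) for _ in range(C)] for _ in range(N)]
                for name in class_semantics}
query_map = [list(loc) for loc in support_maps["cat"]]  # query copies the cat support
pred, scores = classify(query_map, support_maps, class_semantics, weights)
```

Because the toy query is a copy of the "cat" support map, each of its part embeddings under the cat filters matches the support's exactly, so the cat score sits at essentially the maximum of `K`. In a real system the feature maps would come from a trained backbone and the generator weights would be learned rather than sampled.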




