Enhancing Few-Shot Image Classification through Learnable Multi-Scale Embedding and Attention Mechanisms

arXiv - CS - Computer Vision and Pattern Recognition | Pub Date: 2024-09-12 | DOI: arxiv-2409.07989

Fatemeh Askari, Amirreza Fateh, Mohammad Reza Mohammadi

Abstract

Few-shot classification aims to train a classifier from a limited number of samples while maintaining satisfactory performance. Traditional metric-based methods, however, fall short of this objective: they typically rely on a single distance value between the query feature and the support feature, thereby overlooking the contribution of shallow features. To overcome this limitation, we propose a multi-output embedding network that maps samples into distinct feature spaces. The proposed method extracts feature vectors at different stages, enabling the model to capture both global and abstract features, and using these diverse feature spaces enhances its performance. Moreover, a self-attention mechanism refines the features at each stage, leading to more robust representations and improved overall performance. Furthermore, assigning learnable weights to each stage significantly improves performance. We conducted comprehensive evaluations on the MiniImageNet and FC100 datasets in the 5-way 1-shot and 5-way 5-shot scenarios. Additionally, we performed a cross-domain task from MiniImageNet to the CUB dataset, achieving high accuracy in the testing domain. These evaluations demonstrate the efficacy of the proposed method in comparison to state-of-the-art approaches. https://github.com/FatemehAskari/MSENet
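
The idea of combining per-stage distances with learnable stage weights can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the distance function, the way stage outputs are merged, or how the weights are parameterized. The sketch assumes class prototypes (the mean of the support features), squared Euclidean distance, a softmax over negative distances at each stage, and softmax-normalized stage weights. The names `classify`, `prototype`, `sq_dist`, and `stage_logits` are hypothetical, and the self-attention refinement is omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def prototype(vectors):
    """Mean of the support feature vectors of one class (assumed choice)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance (assumed metric, not stated in the abstract)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query_stages, support_stages, stage_logits):
    """Combine per-stage class probabilities with normalized stage weights.

    query_stages:   one query feature vector per stage.
    support_stages: per stage, a dict mapping class name -> support vectors.
    stage_logits:   one raw weight per stage; in training these would be
                    learnable parameters, here they are fixed numbers.
    """
    stage_weights = softmax(stage_logits)
    classes = sorted(support_stages[0])
    scores = {c: 0.0 for c in classes}
    for w, q, support in zip(stage_weights, query_stages, support_stages):
        dists = [sq_dist(q, prototype(support[c])) for c in classes]
        probs = softmax([-d for d in dists])  # closer prototype, higher score
        for c, p in zip(classes, probs):
            scores[c] += w * p
    return scores


# Toy 2-way episode with two stages of different feature dimension.
support_stages = [
    {"cat": [[0.0, 0.0], [0.2, 0.0]], "dog": [[1.0, 1.0], [1.0, 0.8]]},
    {"cat": [[0.0, 0.0, 0.0]], "dog": [[2.0, 2.0, 2.0]]},
]
query_stages = [[0.1, 0.1], [0.5, 0.0, 0.0]]
stage_logits = [0.0, 0.0]  # equal weighting of the shallow and deep stage

scores = classify(query_stages, support_stages, stage_logits)
predicted = max(scores, key=scores.get)
```

Because the stage weights are normalized and each stage yields a probability distribution, the combined scores also sum to one. Raising one entry of `stage_logits` shifts the decision toward that stage, which is the role a learned weight would play.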
