Enhancing Few-Shot Image Classification through Learnable Multi-Scale Embedding and Attention Mechanisms

Fatemeh Askari, Amirreza Fateh, Mohammad Reza Mohammadi
arXiv - CS - Computer Vision and Pattern Recognition (arXiv:2409.07989), 2024-09-12
Citations: 0

Abstract

In few-shot classification, the goal is to train a classifier from a limited number of labeled samples while maintaining satisfactory performance. However, traditional metric-based methods have limitations in achieving this objective: they typically rely on a single distance value between the query feature and the support feature, thereby overlooking the contribution of shallow features. To overcome this challenge, we propose a novel approach that uses a multi-output embedding network to map samples into distinct feature spaces. The proposed method extracts feature vectors at different stages of the network, enabling the model to capture both global and abstract features, and exploiting these diverse feature spaces improves its performance. Moreover, a self-attention mechanism refines the features at each stage, yielding more robust representations and further improving overall performance. Assigning a learnable weight to each stage also significantly improves results. We conducted comprehensive evaluations on the MiniImageNet and FC100 datasets in the 5-way 1-shot and 5-way 5-shot settings. Additionally, we performed a cross-domain evaluation from MiniImageNet to the CUB dataset, achieving high accuracy in the test domain. These evaluations demonstrate the efficacy of the proposed method compared with state-of-the-art approaches. https://github.com/FatemehAskari/MSENet
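The abstract describes the pipeline only at a high level, so the following is a minimal, dependency-free Python sketch of the general idea: instead of one query-to-support distance, compute a distance at every embedding stage (after refining that stage's tokens with self-attention) and combine the per-stage distances with learnable weights. Everything concrete here is an assumption, not taken from the paper or the MSENet repository: identity query/key/value projections, mean pooling, ProtoNet-style class prototypes with squared Euclidean distance, and softmax-normalized stage weights. All function names are hypothetical, and the stage logits are fixed inputs here, whereas in training they would be learned by gradient descent together with the backbone.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Tokens = List[Vector]        # one stage's feature map, flattened into token vectors
StageTokens = List[Tokens]   # one entry per embedding stage (shallow -> deep)


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Tokens) -> Tokens:
    """Scaled dot-product self-attention over one stage's tokens.

    Assumption: identity Q/K/V projections; a trained model would learn them.
    """
    d = len(tokens[0])
    refined = []
    for q in tokens:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(logits)
        refined.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return refined


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def stage_embedding(tokens: Tokens) -> Vector:
    """Refine one stage's tokens with self-attention, then mean-pool them."""
    return mean_vector(self_attention(tokens))


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_prototypes(support: Dict[str, List[StageTokens]]) -> Dict[str, List[Vector]]:
    """One prototype per class and per stage: the mean of that stage's shot embeddings."""
    protos = {}
    for label, shots in support.items():
        n_stages = len(shots[0])
        protos[label] = [
            mean_vector([stage_embedding(shot[s]) for shot in shots])
            for s in range(n_stages)
        ]
    return protos


def predict(query: StageTokens, support: Dict[str, List[StageTokens]],
            stage_logits: Sequence[float]) -> str:
    """Score each class by a weighted sum of per-stage distances, not a single distance."""
    weights = softmax(stage_logits)  # assumption: softmax-normalized learnable stage weights
    q = [stage_embedding(tokens) for tokens in query]
    scores = {
        label: -sum(w * squared_distance(qs, ps) for w, qs, ps in zip(weights, q, stages))
        for label, stages in class_prototypes(support).items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy 2-way 1-shot episode: 2 stages, 1 token of dimension 2 per stage.
    support = {
        "A": [[[[1.0, 0.0]], [[0.0, -1.0]]]],
        "B": [[[[-1.0, 0.0]], [[0.0, 1.0]]]],
    }
    query = [[[1.0, 0.0]], [[0.0, 1.0]]]  # matches A at stage 0, B at stage 1
    print(predict(query, support, [3.0, 0.0]))  # -> A (stage 0 dominates)
    print(predict(query, support, [0.0, 3.0]))  # -> B (stage 1 dominates)
```

In the toy episode the query agrees with class A at the shallow stage and with class B at the deep stage, so the prediction depends entirely on the stage weights; this is the kind of information a single deep-feature distance cannot express. Because the weights pass through a softmax, they stay positive and sum to one, so each class score is a convex combination of per-stage distances.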