Inversed Pyramid Network with Spatial-adapted and Task-oriented Tuning for few-shot learning

Pattern Recognition (IF 7.6; Region 1, Computer Science; JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) Pub Date: 2025-08-01 Epub Date: 2025-02-20 DOI: 10.1016/j.patcog.2025.111415

Xiaowei Zhao , Duorui Wang , Shihao Bai , Shuo Wang , Yajun Gao , Yu Liang , Yuqing Ma , Xianglong Liu

Pattern Recognition, volume 164, Article 111415. Full text: https://www.sciencedirect.com/science/article/pii/S0031320325000755

Citations: 0

Abstract

Deep neural networks have achieved strong performance on many tasks, but conventional deep learning methods require a large amount of training data, which may not be available in certain practical scenarios. Few-shot learning instead aims to learn a model that can be readily adapted to new, unseen classes from only one or a few labeled examples. Despite progress in this area, most existing methods rely on feature extractor networks pre-trained with global features. They ignore the discriminative power of local features, and their weak generalization limits performance. To address this problem, and following the human coarse-to-fine cognition paradigm, we propose an Inverted Pyramid Network with Spatial-adapted and Task-oriented Tuning (TIPN) for few-shot learning. Specifically, the proposed framework represents local features for categories that are difficult to distinguish by global features, and recognizes objects from both global and local perspectives. Moreover, to ensure the calibration validity of the proposed model at the local stage, we introduce the Spatial-adapted Layer, which preserves the discriminative global representation ability of the pre-trained backbone network. Meanwhile, because the representations extracted from past categories are not applicable to the current new tasks, we further propose the Task-oriented Tuning strategy, which adjusts the parameters of the Batch Normalization layer in the pre-trained feature extractor network to explicitly transfer knowledge from base classes to novel classes according to the support samples of each task. Extensive experiments on multiple benchmark datasets demonstrate that our method can significantly outperform many state-of-the-art few-shot learning methods.
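The idea of tuning only the Batch Normalization parameters on the support samples of a task can be illustrated with a small sketch. This is not the authors' implementation: the nearest-prototype classifier, the finite-difference gradients (used only to avoid a deep learning dependency), the toy 2-way 2-shot episode, and all function names are assumptions made for illustration. The normalization statistics stay frozen, and only the affine parameters gamma and beta are updated.

```python
import math

def bn_relu(x, mean, var, gamma, beta, eps=1e-5):
    """Inference-mode BN followed by ReLU: frozen statistics, tunable affine part."""
    return [max(0.0, g * (xi - m) / math.sqrt(v + eps) + b)
            for xi, m, v, g, b in zip(x, mean, var, gamma, beta)]

def class_prototypes(support, embed):
    """Mean embedding of the support samples of each class."""
    groups = {}
    for x, y in support:
        groups.setdefault(y, []).append(embed(x))
    return {y: [sum(col) / len(zs) for col in zip(*zs)] for y, zs in groups.items()}

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def support_loss(support, mean, var, gamma, beta):
    """Cross-entropy of a softmax over negative squared distances to prototypes."""
    embed = lambda x: bn_relu(x, mean, var, gamma, beta)
    protos = class_prototypes(support, embed)
    total = 0.0
    for x, y in support:
        z = embed(x)
        logits = {c: -sq_dist(z, p) for c, p in protos.items()}
        top = max(logits.values())
        log_norm = top + math.log(sum(math.exp(v - top) for v in logits.values()))
        total += log_norm - logits[y]
    return total / len(support)

def tune_affine(support, mean, var, gamma, beta, steps=30, lr=0.05, h=1e-4):
    """Gradient descent on gamma and beta only; mean and var are never modified.
    Central finite differences stand in for autograd (an assumption for brevity)."""
    d = len(gamma)
    params = list(gamma) + list(beta)
    loss = lambda p: support_loss(support, mean, var, p[:d], p[d:])
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            up = params[:i] + [params[i] + h] + params[i + 1:]
            down = params[:i] + [params[i] - h] + params[i + 1:]
            grad.append((loss(up) - loss(down)) / (2 * h))
        params = [p - lr * g for p, g in zip(params, grad)]
    return params[:d], params[d:]

def predict(query, support, mean, var, gamma, beta):
    """Assign the query to the class with the nearest prototype."""
    embed = lambda x: bn_relu(x, mean, var, gamma, beta)
    protos = class_prototypes(support, embed)
    z = embed(query)
    return min(protos, key=lambda c: sq_dist(z, protos[c]))

# Hypothetical 2-way 2-shot episode with 2-dimensional frozen features.
support = [([1.0, 0.2], "a"), ([1.2, -0.1], "a"),
           ([-0.9, 0.1], "b"), ([-1.1, -0.2], "b")]
mean, var = [0.0, 0.0], [1.0, 1.0]      # frozen running statistics
gamma0, beta0 = [1.0, 1.0], [0.0, 0.0]  # pre-trained affine parameters

loss_before = support_loss(support, mean, var, gamma0, beta0)
gamma1, beta1 = tune_affine(support, mean, var, gamma0, beta0)
loss_after = support_loss(support, mean, var, gamma1, beta1)
print(loss_before, loss_after)
print(predict([0.8, 0.0], support, mean, var, gamma1, beta1))
```

In a real network the same pattern would typically be expressed by freezing all backbone weights and passing only the normalization layers' affine parameters to the optimizer. How TIPN selects, regularizes, or schedules these updates is not stated in the abstract.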

Source journal

Pattern Recognition (Engineering and Technology - Engineering: Electrical and Electronic)

CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months

Journal description: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.