Instance-Aware Hierarchical Structured Policy for Prompt Learning in Vision-Language Models

Xun Wu, Guolong Wang, Zhaoyuan Liu, Xuan Dang, Zheng Qin
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), published 2023-06-04. DOI: 10.1109/ICASSP49357.2023.10095231

Abstract

In recent years, learnable prompts have emerged as a major prompt learning paradigm, improving the performance of large-scale vision-language pre-trained models in few-shot image classification. However, existing methods are often time-consuming and inflexible because 1) class-specific prompts are inefficient in certain situations, and 2) instance-specific prompts are inserted at a fixed position. To address these issues, and inspired by the coarse-to-fine decision-making paradigm of humans, we propose an Instance-Aware Hierarchical-Structured Policy (IAHSP) that integrates instance-specific prompt selection and the selection of an appropriate insertion position in a reinforcement learning fashion. Specifically, IAHSP consists of two sub-policies: 1) the root policy selects the most suitable prompt from the prompt pool, and 2) the leaf policy identifies the optimal position at which to insert the selected prompt. We train these two policies iteratively, using rewards that constrain the prompts while maintaining their diversity. Extensive experiments on 11 public benchmarks demonstrate that IAHSP significantly boosts the few-shot image classification performance of vision-language pre-trained models while also exhibiting superior generalization performance.
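The abstract does not specify the policy parameterization, reward, or training schedule. As a rough illustration of the coarse-to-fine idea only, the following sketch trains a toy two-level categorical policy with REINFORCE and a moving-average baseline: the root policy samples a prompt index from a pool, and the leaf policy, conditioned on that choice, samples an insertion position. Everything here is an assumption rather than the authors' IAHSP: the linear-softmax policies over a one-hot instance feature, the pool and slot sizes, the hidden `BEST` table, the hypothetical `toy_reward` (a stand-in for a frozen vision-language model's score), and the step-by-step alternation between root and leaf updates. The diversity term mentioned in the abstract is not modeled.

```python
import math
import random
from typing import List, Tuple

# Hypothetical toy setup: 4 candidate prompts, 4 insertion slots, 2 instance "types".
NUM_PROMPTS, NUM_POSITIONS = 4, 4
# Hidden (prompt, position) pair that scores best for each instance type (assumption).
BEST = {0: (2, 0), 1: (0, 3)}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


class HierarchicalPolicy:
    """Root policy picks a prompt; leaf policy picks where to insert it.

    Both are linear-softmax in an instance feature x, so decisions are per instance.
    """

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.root_w = [[rng.gauss(0, 0.01) for _ in range(dim)]
                       for _ in range(NUM_PROMPTS)]
        # One leaf head per prompt: leaf_w[k][p] scores slot p for prompt k.
        self.leaf_w = [[[rng.gauss(0, 0.01) for _ in range(dim)]
                        for _ in range(NUM_POSITIONS)]
                       for _ in range(NUM_PROMPTS)]

    def root_probs(self, x: List[float]) -> List[float]:
        return softmax([dot(w, x) for w in self.root_w])

    def leaf_probs(self, k: int, x: List[float]) -> List[float]:
        return softmax([dot(w, x) for w in self.leaf_w[k]])

    def act(self, x: List[float], rng: random.Random) -> Tuple[int, int]:
        """Coarse-to-fine: sample a prompt first, then a slot conditioned on it."""
        k = rng.choices(range(NUM_PROMPTS), weights=self.root_probs(x))[0]
        p = rng.choices(range(NUM_POSITIONS), weights=self.leaf_probs(k, x))[0]
        return k, p

    @staticmethod
    def _reinforce(weights, probs, chosen, x, step) -> None:
        # d log pi(chosen) / d w_j = (1[j == chosen] - pi_j) * x for linear logits.
        for j, w in enumerate(weights):
            g = (1.0 if j == chosen else 0.0) - probs[j]
            for i in range(len(w)):
                w[i] += step * g * x[i]

    def update(self, x, k, p, advantage, lr, level) -> None:
        # probs are computed before any weight is modified (argument evaluation order).
        if level == "root":
            self._reinforce(self.root_w, self.root_probs(x), k, x, lr * advantage)
        else:
            self._reinforce(self.leaf_w[k], self.leaf_probs(k, x), p, x, lr * advantage)


def toy_reward(t: int, k: int, p: int) -> float:
    """Hypothetical stand-in for a frozen VLM's score on the prompted input."""
    best_k, best_p = BEST[t]
    if k != best_k:
        return 0.0
    return 1.0 if p == best_p else 0.5


def train(steps: int = 6000, lr: float = 0.5, seed: int = 0) -> HierarchicalPolicy:
    rng = random.Random(seed)
    policy = HierarchicalPolicy(dim=2, seed=seed)
    baseline = 0.0  # moving-average reward baseline to reduce gradient variance
    for step in range(steps):
        t = rng.randrange(2)
        x = [1.0, 0.0] if t == 0 else [0.0, 1.0]  # one-hot instance feature
        k, p = policy.act(x, rng)
        r = toy_reward(t, k, p)
        advantage = r - baseline
        baseline = 0.9 * baseline + 0.1 * r
        # Alternate which sub-policy is updated (assumed iterative schedule).
        policy.update(x, k, p, advantage, lr, "root" if step % 2 == 0 else "leaf")
    return policy


if __name__ == "__main__":
    trained = train()
    for t, x in ((0, [1.0, 0.0]), (1, [0.0, 1.0])):
        rp = trained.root_probs(x)
        k = max(range(NUM_PROMPTS), key=rp.__getitem__)
        lp = trained.leaf_probs(k, x)
        p = max(range(NUM_POSITIONS), key=lp.__getitem__)
        print(f"type {t}: prompt {k}, position {p}")
```

Running the script trains the toy policy and prints the greedy prompt and slot chosen for each instance type. Because the toy reward depends only on the (prompt, slot) pair, the example shows the mechanics of hierarchical sampling and per-level credit assignment, not anything about IAHSP's accuracy.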