Generalizable Prompt Learning via Gradient Constrained Sharpness-Aware Minimization

IF 9.7 1区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS IEEE Transactions on Multimedia Pub Date : 2024-12-24 DOI:10.1109/TMM.2024.3521702

Liangchen Liu;Nannan Wang;Dawei Zhou;Decheng Liu;Xi Yang;Xinbo Gao;Tongliang Liu

{"title":"Generalizable Prompt Learning via Gradient Constrained Sharpness-Aware Minimization","authors":"Liangchen Liu;Nannan Wang;Dawei Zhou;Decheng Liu;Xi Yang;Xinbo Gao;Tongliang Liu","doi":"10.1109/TMM.2024.3521702","DOIUrl":null,"url":null,"abstract":"This paper targets a novel trade-off problem in generalizable prompt learning for vision-language models (VLM), i.e., improving the performance on unseen classes while maintaining the performance on seen classes. Comparing with existing generalizable methods that neglect the seen classes degradation, the setting of this problem is stricter and fits more closely with practical applications. To solve this problem, we start from the optimization perspective, and leverage the relationship between loss landscape geometry and model generalization ability. By analyzing the loss landscapes of the state-of-the-art method and vanilla Sharpness-aware Minimization (SAM) based method, we conclude that the trade-off performance correlates to both <bold>loss value</b> and <bold>loss sharpness</b>, while each of them is indispensable. However, we find the optimizing gradient of existing methods cannot maintain high relevance to both loss value and loss sharpness during optimization, which severely affects their trade-off performance. To this end, we propose a novel SAM-based method for prompt learning, denoted as Gradient Constrained Sharpness-aware Context Optimization (GCSCoOp), to dynamically constrain the optimizing gradient, thus achieving above two-fold optimization objective simultaneously. Extensive experiments verify the effectiveness of GCSCoOp in the trade-off problem.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"1100-1113"},"PeriodicalIF":9.7000,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10814656/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

This paper targets a novel trade-off problem in generalizable prompt learning for vision-language models (VLM), i.e., improving the performance on unseen classes while maintaining the performance on seen classes. Comparing with existing generalizable methods that neglect the seen classes degradation, the setting of this problem is stricter and fits more closely with practical applications. To solve this problem, we start from the optimization perspective, and leverage the relationship between loss landscape geometry and model generalization ability. By analyzing the loss landscapes of the state-of-the-art method and vanilla Sharpness-aware Minimization (SAM) based method, we conclude that the trade-off performance correlates to both loss value and loss sharpness, while each of them is indispensable. However, we find the optimizing gradient of existing methods cannot maintain high relevance to both loss value and loss sharpness during optimization, which severely affects their trade-off performance. To this end, we propose a novel SAM-based method for prompt learning, denoted as Gradient Constrained Sharpness-aware Context Optimization (GCSCoOp), to dynamically constrain the optimizing gradient, thus achieving above two-fold optimization objective simultaneously. Extensive experiments verify the effectiveness of GCSCoOp in the trade-off problem.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

基于梯度约束的锐度感知最小化的可泛化提示学习

本文针对视觉语言模型（VLM）泛化提示学习中一个新的权衡问题，即在保持可见类性能的同时提高未见类的性能。与现有的忽略类退化的泛化方法相比，该问题的设定更严格，更符合实际应用。为了解决这一问题，我们从优化的角度出发，利用损失景观几何与模型泛化能力之间的关系。通过分析最先进的方法和普通的基于锐度感知最小化（SAM）的方法的损失情况，我们得出结论，权衡性能与损失值和损失锐度相关，而两者都是不可或缺的。然而，我们发现现有方法的优化梯度在优化过程中不能同时与损失值和损失锐度保持高度的相关性，严重影响了它们的权衡性能。为此，我们提出了一种新的基于sam的快速学习方法，称为梯度约束锐度感知上下文优化（GCSCoOp），对优化梯度进行动态约束，从而同时实现上述双重优化目标。大量的实验验证了GCSCoOp在权衡问题中的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

IEEE Transactions on Multimedia 工程技术-电信学

CiteScore

11.70

自引率

11.00%

发文量

576

审稿时长

5.5 months

期刊介绍： The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.