pFedPrompt: Learning Personalized Prompt for Vision-Language Models in Federated Learning

Proceedings of the ACM Web Conference 2023. Pub Date: 2023-04-30. DOI: 10.1145/3543507.3583518

Tao Guo, Song Guo, Junxiao Wang


Citations: 8

Abstract

Pre-trained vision-language models such as CLIP show great potential for learning representations that capture the latent characteristics of users. A recently proposed method, Context Optimization (CoOp), introduces prompt training as a way to adapt pre-trained vision-language models. Because this method is lightweight, researchers have migrated the paradigm from centralized to decentralized systems, creating new collaborative training frameworks for Federated Learning (FL). However, current prompt training in FL mainly focuses on modeling user consensus and lacks adaptation to user characteristics, leaving prompt personalization largely under-explored. Over the past few years, research has applied personalized FL (pFL) approaches to customize models for heterogeneous users. Unfortunately, we find that because the modality and training behavior differ, directly applying pFL methods to prompt training leads to insufficient personalization and suboptimal performance. To bridge this gap, we present pFedPrompt, which leverages the unique advantage of multimodality in vision-language models: it learns user consensus in the linguistic space and adapts to user characteristics in the visual space in a non-parametric manner. Through this dual collaboration, the learned prompt is fully personalized and aligned with each user's local characteristics. We conduct extensive experiments on various datasets under an FL setting with statistical heterogeneity. The results demonstrate that pFedPrompt outperforms alternative approaches and delivers robust performance.
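The abstract names two mechanisms only at a high level: a prompt shared across clients (consensus in the linguistic space) and a non-parametric, per-client adaptation in the visual space. The Python sketch below illustrates one plausible reading under stated assumptions; it is not the authors' implementation. It assumes the shared prompt's learnable CoOp-style context vectors are aggregated with sample-weighted FedAvg, and it stands in for the visual-space adaptation with a Tip-Adapter-style key-value cache of each client's local image features, which has no trainable parameters. The function names `fedavg_prompt` and `personalized_logits`, the hyperparameters `alpha` and `beta`, and the toy vectors are hypothetical. CLIP's image and text encoders (and its logit temperature) are omitted, so the inputs stand in for already-encoded features.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """L2-normalize a vector (returned unchanged if it has zero norm)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def fedavg_prompt(client_ctx: List[List[Vector]], num_samples: List[int]) -> List[Vector]:
    """Hypothetical server step: sample-weighted average (FedAvg) of the
    learnable prompt context tokens. client_ctx[k][m] is token m of client k."""
    total = sum(num_samples)
    n_tokens, dim = len(client_ctx[0]), len(client_ctx[0][0])
    avg = [[0.0] * dim for _ in range(n_tokens)]
    for ctx, n in zip(client_ctx, num_samples):
        w = n / total
        for m in range(n_tokens):
            for j in range(dim):
                avg[m][j] += w * ctx[m][j]
    return avg


def personalized_logits(
    image_feat: Sequence[float],
    class_text_feats: List[Vector],  # stand-ins for text features from the shared prompt
    cache_keys: List[Vector],        # this client's local image features
    cache_labels: List[int],         # their class labels
    num_classes: int,
    alpha: float = 1.0,              # assumed weight of the local cache branch
    beta: float = 5.0,               # assumed sharpness of the affinity kernel
) -> Vector:
    """Hypothetical client step: cosine-similarity text logits plus a
    non-parametric cache term, exp(-beta * (1 - cos)) summed per label."""
    f = normalize(image_feat)
    text_logits = [dot(f, normalize(t)) for t in class_text_feats]
    cache_logits = [0.0] * num_classes
    for key, label in zip(cache_keys, cache_labels):
        cache_logits[label] += math.exp(-beta * (1.0 - dot(f, normalize(key))))
    return [t + alpha * c for t, c in zip(text_logits, cache_logits)]


if __name__ == "__main__":
    # Two clients, one context token of dimension 2; client 2 holds 3x the data.
    shared_ctx = fedavg_prompt([[[1.0, 0.0]], [[0.0, 1.0]]], num_samples=[1, 3])
    print(shared_ctx)  # [[0.25, 0.75]]

    text_feats = [[0.6, 0.8], [0.8, 0.6]]
    query = [1.0, 0.0]
    text_only = personalized_logits(query, text_feats, [], [], num_classes=2)
    with_cache = personalized_logits(query, text_feats, [[1.0, 0.0]], [0], num_classes=2)
    print(max(range(2), key=text_only.__getitem__))   # 1
    print(max(range(2), key=with_cache.__getitem__))  # 0
```

In the toy case, the text branch alone favours class 1, but one local cached image of class 0 that matches the query shifts the prediction to class 0. That is the kind of per-user correction a non-parametric visual branch can provide without adding trainable parameters, while only the prompt context is averaged across clients.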


