Domain Prompt Learning for Efficiently Adapting CLIP to Unseen Domains

Xin Zhang, Shixiang Shane Gu, Yutaka Matsuo, Yusuke Iwasawa
DOI: 10.1527/tjsai.38-6_b-mc2 (https://doi.org/10.1527/tjsai.38-6_b-mc2)
Published: 2023-11-01
Publication type: Journal Article
Citations: 12

Abstract

Domain generalization (DG) is a difficult transfer learning problem that aims to learn a model that generalizes to unseen domains. Recent foundation models (FMs) are robust to many distribution shifts and should therefore substantially improve DG performance. In this work, we study generic ways to adopt Contrastive Language-Image Pre-training (CLIP), a visual-language foundation model, for DG problems in image classification. While empirical risk minimization (ERM) greatly improves accuracy on standard DG benchmarks when given bigger backbones and training datasets, fine-tuning FMs is not practical in many real-world situations. We propose Domain Prompt Learning (DPL), a novel approach that performs domain inference in the form of conditional prompt generation. DPL achieves a significant accuracy improvement by training only a lightweight prompt generator (a three-layer MLP), whose parameter count is comparable in scale to that of the classification projector in the previous DG literature. Combining DPL with CLIP yields surprisingly strong performance, raising the accuracy of zero-shot CLIP from 73.7% to 79.3% on several standard datasets, namely PACS, VLCS, OfficeHome, and TerraIncognita. We hope the simplicity and success of our approach lead to broader adoption and analysis of foundation models in the domain generalization field.
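
The abstract describes DPL as conditional prompt generation by a three-layer MLP on top of a frozen CLIP. A minimal sketch of that idea in plain Python follows, under several assumptions that the abstract does not state: the generator is conditioned on the mean image feature of a batch drawn from one domain, the CLIP text encoder is replaced by a simple averaging stand-in, and every name, dimension, and the logit scale of 100 is hypothetical. It is an illustration of the general shape of the approach, not the authors' implementation.

```python
import math
import random

random.seed(0)

FEAT_DIM = 8      # hypothetical embedding width (real CLIP features are much wider)
HIDDEN_DIM = 16   # hypothetical hidden width of the prompt generator
NUM_TOKENS = 2    # hypothetical number of generated prompt vectors


def init_linear(n_in, n_out):
    """Return (weights, bias) for one dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    w = [[random.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def linear(layer, x):
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x):
    return [max(0.0, v) for v in x]


def normalize(x):
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]


def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Three dense layers: the only part that would be trained in this sketch.
generator = [
    init_linear(FEAT_DIM, HIDDEN_DIM),
    init_linear(HIDDEN_DIM, HIDDEN_DIM),
    init_linear(HIDDEN_DIM, NUM_TOKENS * FEAT_DIM),
]


def generate_domain_prompt(image_features):
    """Map a batch of image features from one domain to NUM_TOKENS prompt vectors."""
    h = mean_vector(image_features)  # assumed domain summary: batch mean
    h = relu(linear(generator[0], h))
    h = relu(linear(generator[1], h))
    flat = linear(generator[2], h)
    return [flat[i * FEAT_DIM:(i + 1) * FEAT_DIM] for i in range(NUM_TOKENS)]


def encode_text(prompt_tokens, class_embedding):
    """Stand-in for the frozen text encoder: average prompt and class vectors."""
    return normalize(mean_vector(prompt_tokens + [class_embedding]))


def classify(image_feature, prompt_tokens, class_embeddings):
    """Return softmax probabilities over classes from scaled cosine similarities."""
    img = normalize(image_feature)
    logits = []
    for emb in class_embeddings:
        txt = encode_text(prompt_tokens, emb)
        logits.append(100.0 * sum(a * b for a, b in zip(img, txt)))
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy data: random vectors stand in for frozen image and class-name features.
batch = [[random.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(4)]
class_embeddings = [[random.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(3)]

prompt = generate_domain_prompt(batch)
probs = classify(batch[0], prompt, class_embeddings)
```

In this sketch the image and text encoders stay fixed and only the three layers in `generator` would receive gradient updates, which is one way to read the abstract's claim that a lightweight generator suffices when fine-tuning the foundation model is impractical.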
Source journal
Transactions of The Japanese Society for Artificial Intelligence
Subject area: Computer Science, Artificial Intelligence
CiteScore: 0.40
Self-citation rate: 0.00%
Articles published: 36