MuAP: Multi-step Adaptive Prompt Learning for Vision-Language Model with Missing Modality

Ruiting Dai, Yuqiao Tan, Lisi Mo, Tao He, Ke Qin, Shuang Liang
{"title":"MuAP:缺失模态视觉语言模型的多步自适应提示学习","authors":"Ruiting Dai, Yuqiao Tan, Lisi Mo, Tao He, Ke Qin, Shuang Liang","doi":"arxiv-2409.04693","DOIUrl":null,"url":null,"abstract":"Recently, prompt learning has garnered considerable attention for its success\nin various Vision-Language (VL) tasks. However, existing prompt-based models\nare primarily focused on studying prompt generation and prompt strategies with\ncomplete modality settings, which does not accurately reflect real-world\nscenarios where partial modality information may be missing. In this paper, we\npresent the first comprehensive investigation into prompt learning behavior\nwhen modalities are incomplete, revealing the high sensitivity of prompt-based\nmodels to missing modalities. To this end, we propose a novel Multi-step\nAdaptive Prompt Learning (MuAP) framework, aiming to generate multimodal\nprompts and perform multi-step prompt tuning, which adaptively learns knowledge\nby iteratively aligning modalities. Specifically, we generate multimodal\nprompts for each modality and devise prompt strategies to integrate them into\nthe Transformer model. Subsequently, we sequentially perform prompt tuning from\nsingle-stage and alignment-stage, allowing each modality-prompt to be\nautonomously and adaptively learned, thereby mitigating the imbalance issue\ncaused by only textual prompts that are learnable in previous works. 
Extensive\nexperiments demonstrate the effectiveness of our MuAP and this model achieves\nsignificant improvements compared to the state-of-the-art on all benchmark\ndatasets","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"37 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MuAP: Multi-step Adaptive Prompt Learning for Vision-Language Model with Missing Modality\",\"authors\":\"Ruiting Dai, Yuqiao Tan, Lisi Mo, Tao He, Ke Qin, Shuang Liang\",\"doi\":\"arxiv-2409.04693\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recently, prompt learning has garnered considerable attention for its success\\nin various Vision-Language (VL) tasks. However, existing prompt-based models\\nare primarily focused on studying prompt generation and prompt strategies with\\ncomplete modality settings, which does not accurately reflect real-world\\nscenarios where partial modality information may be missing. In this paper, we\\npresent the first comprehensive investigation into prompt learning behavior\\nwhen modalities are incomplete, revealing the high sensitivity of prompt-based\\nmodels to missing modalities. To this end, we propose a novel Multi-step\\nAdaptive Prompt Learning (MuAP) framework, aiming to generate multimodal\\nprompts and perform multi-step prompt tuning, which adaptively learns knowledge\\nby iteratively aligning modalities. Specifically, we generate multimodal\\nprompts for each modality and devise prompt strategies to integrate them into\\nthe Transformer model. Subsequently, we sequentially perform prompt tuning from\\nsingle-stage and alignment-stage, allowing each modality-prompt to be\\nautonomously and adaptively learned, thereby mitigating the imbalance issue\\ncaused by only textual prompts that are learnable in previous works. 
Extensive\\nexperiments demonstrate the effectiveness of our MuAP and this model achieves\\nsignificant improvements compared to the state-of-the-art on all benchmark\\ndatasets\",\"PeriodicalId\":501479,\"journal\":{\"name\":\"arXiv - CS - Artificial Intelligence\",\"volume\":\"37 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.04693\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.04693","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Recently, prompt learning has garnered considerable attention for its success in various Vision-Language (VL) tasks. However, existing prompt-based models primarily focus on studying prompt generation and prompt strategies under complete modality settings, which does not accurately reflect real-world scenarios where partial modality information may be missing. In this paper, we present the first comprehensive investigation into prompt learning behavior when modalities are incomplete, revealing the high sensitivity of prompt-based models to missing modalities. To this end, we propose a novel Multi-step Adaptive Prompt Learning (MuAP) framework that generates multimodal prompts and performs multi-step prompt tuning, adaptively learning knowledge by iteratively aligning modalities. Specifically, we generate multimodal prompts for each modality and devise prompt strategies to integrate them into the Transformer model. Subsequently, we perform prompt tuning sequentially, from the single stage to the alignment stage, allowing each modality prompt to be autonomously and adaptively learned, thereby mitigating the imbalance caused by prior works in which only textual prompts are learnable. Extensive experiments demonstrate the effectiveness of MuAP, and the model achieves significant improvements over the state of the art on all benchmark datasets.
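To make the core idea concrete: missing-modality prompt methods typically keep a separate learnable prompt for each modality-availability case and prepend the matching one to whatever tokens are actually observed. The sketch below illustrates that selection-and-prepend step only; the dimensions, case set, and `build_input` helper are illustrative assumptions, not the authors' implementation (which also involves the multi-step single-stage/alignment-stage tuning schedule described above).

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16          # embedding dimension (illustrative)
PROMPT_LEN = 4  # tokens per learnable prompt

# One prompt per modality-availability case: complete, text-only, image-only.
# In a real model these would be trainable parameters, not random constants.
prompts = {
    ("text", "image"): rng.normal(size=(PROMPT_LEN, D)),
    ("text",):         rng.normal(size=(PROMPT_LEN, D)),
    ("image",):        rng.normal(size=(PROMPT_LEN, D)),
}

def build_input(text_tokens, image_tokens):
    """Prepend the prompt matching the observed modalities to the
    concatenated token sequence fed to the Transformer encoder.
    A missing modality is passed as None."""
    present = tuple(
        name for name, toks in (("text", text_tokens), ("image", image_tokens))
        if toks is not None
    )
    parts = [prompts[present]]
    if text_tokens is not None:
        parts.append(text_tokens)
    if image_tokens is not None:
        parts.append(image_tokens)
    return np.concatenate(parts, axis=0)

# Text present, image missing: the text-only prompt plus the text tokens.
seq = build_input(rng.normal(size=(7, D)), None)
print(seq.shape)  # (11, 16): 4 prompt tokens + 7 text tokens
```

Because the prompt for each availability case is a distinct parameter, gradients from incomplete samples update only that case's prompt, which is what lets each modality prompt be tuned separately before the alignment stage.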