{"title":"Exploring Universal Intrinsic Task Subspace for Few-Shot Learning via Prompt Tuning","authors":"Yujia Qin;Xiaozhi Wang;Yusheng Su;Yankai Lin;Ning Ding;Jing Yi;Weize Chen;Zhiyuan Liu;Juanzi Li;Lei Hou;Peng Li;Maosong Sun;Jie Zhou","doi":"10.1109/TASLP.2024.3430545","DOIUrl":null,"url":null,"abstract":"Why can pre-trained language models (PLMs) learn universal representations and effectively adapt to broad NLP tasks differing a lot superficially? In this work, we empirically find evidence indicating that the adaptations of PLMs to various few-shot tasks can be reparameterized as optimizing only a few free parameters in a unified low-dimensional \n<italic>intrinsic task subspace</i>\n, which may help us understand why PLMs could easily adapt to various NLP tasks with small-scale data. To find such a subspace and examine its universality, we propose an analysis pipeline called \n<italic>intrinsic prompt tuning</i>\n (IPT). Specifically, we resort to the recent success of prompt tuning and decompose the soft prompts of multiple NLP tasks into the same low-dimensional nonlinear subspace, then we learn to adapt the PLM to unseen data or tasks by only tuning parameters in this subspace. In the experiments, we study diverse few-shot NLP tasks and surprisingly find that in a 250-dimensional subspace found with 100 tasks, by only tuning 250 free parameters, we can recover 97% and 83% of the full prompt tuning performance for 100 seen tasks (using different training data) and 20 unseen tasks, respectively, showing great generalization ability of the found intrinsic task subspace. Besides being an analysis tool, IPTcould further help us improve the prompt tuning stability.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3631-3643"},"PeriodicalIF":4.1000,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10603438","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10603438/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ACOUSTICS","Score":null,"Total":0}
Citations: 0
Abstract
Why can pre-trained language models (PLMs) learn universal representations and effectively adapt to a broad range of NLP tasks that differ greatly on the surface? In this work, we find empirical evidence that the adaptations of PLMs to various few-shot tasks can be reparameterized as optimizing only a few free parameters in a unified low-dimensional intrinsic task subspace, which may help explain why PLMs can easily adapt to various NLP tasks with small-scale data. To find such a subspace and examine its universality, we propose an analysis pipeline called intrinsic prompt tuning (IPT). Specifically, building on the recent success of prompt tuning, we decompose the soft prompts of multiple NLP tasks into the same low-dimensional nonlinear subspace, and then learn to adapt the PLM to unseen data or tasks by tuning only the parameters in this subspace. In the experiments, we study diverse few-shot NLP tasks and, surprisingly, find that in a 250-dimensional subspace found with 100 tasks, tuning only 250 free parameters recovers 97% and 83% of the full prompt tuning performance for 100 seen tasks (using different training data) and 20 unseen tasks, respectively, demonstrating the strong generalization ability of the found intrinsic task subspace. Besides serving as an analysis tool, IPT could further help improve the stability of prompt tuning.
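To make the decomposition described in the abstract concrete, the sketch below reparameterizes a task's soft prompt as the output of a shared nonlinear projection applied to a low-dimensional intrinsic vector, so that adapting to a task means tuning only that vector. This is a minimal illustration in PyTorch under assumed names and dimensions (IntrinsicPromptSubspace, intrinsic_dim, prompt_len, proj_hidden); it is not the authors' released implementation, and it shows only the subspace-to-prompt direction of the pipeline.

```python
# Minimal sketch (assumed PyTorch): reparameterize soft prompts through a shared
# low-dimensional "intrinsic" subspace, in the spirit of the abstract above.
# All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class IntrinsicPromptSubspace(nn.Module):
    def __init__(self, intrinsic_dim=250, prompt_len=100, hidden_dim=768, proj_hidden=512):
        super().__init__()
        # Shared nonlinear projection from the intrinsic subspace to the
        # soft-prompt space (prompt_len x hidden_dim); shared across tasks.
        self.proj = nn.Sequential(
            nn.Linear(intrinsic_dim, proj_hidden),
            nn.Tanh(),
            nn.Linear(proj_hidden, prompt_len * hidden_dim),
        )
        self.prompt_len = prompt_len
        self.hidden_dim = hidden_dim

    def forward(self, intrinsic_vec):
        # intrinsic_vec: (intrinsic_dim,) task-specific free parameters.
        flat = self.proj(intrinsic_vec)
        return flat.view(self.prompt_len, self.hidden_dim)  # soft prompt embeddings


# Subspace finding (multi-task): train the shared projection together with
# per-task intrinsic vectors on many seen tasks.
# Adaptation to unseen data or tasks: freeze the projection (and the PLM) and
# tune only a fresh intrinsic vector, i.e., ~250 free parameters here.
subspace = IntrinsicPromptSubspace()
intrinsic_vec = nn.Parameter(torch.zeros(250))
optimizer = torch.optim.Adam([intrinsic_vec], lr=1e-3)

soft_prompt = subspace(intrinsic_vec)  # prepend to the PLM's input embeddings
print(soft_prompt.shape)  # torch.Size([100, 768])
```

In this reading, the "250 free parameters" reported in the abstract correspond to the single intrinsic vector optimized per task, while the projection defining the subspace is learned once from the 100 training tasks and then kept fixed.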
About the journal:
The IEEE/ACM Transactions on Audio, Speech, and Language Processing covers audio, speech and language processing and the sciences that support them. In audio processing: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. In speech processing: areas such as speech analysis, synthesis, coding, speech and speaker recognition, speech production and perception, and speech enhancement. In language processing: speech and text analysis, understanding, generation, dialog management, translation, summarization, question answering and document indexing and retrieval, as well as general language modeling.