Assessing the Portability of Parameter Matrices Trained by Parameter-Efficient Finetuning Methods

Findings. Pub Date: 2024-01-25. DOI: 10.48550/arXiv.2401.14228

Mohammed Sabry, Anya Belz

Journal: Findings, volume 30 3, pages 1548-1556. Publication type: Journal Article. Platform: Semanticscholar.

Citations: 0

Abstract

Assessing the Portability of Parameter Matrices Trained by Parameter-Efficient Finetuning Methods

As the cost of training ever larger language models has grown, so has the interest in reusing previously learnt knowledge. Transfer learning methods have shown how reusing non-task-specific knowledge can help in subsequent task-specific learning. In this paper, we investigate the inverse: porting whole functional modules that encode task-specific knowledge from one model to another. We designed a study comprising 1,440 training/testing runs to test the portability of modules trained by parameter-efficient finetuning (PEFT) techniques, using sentiment analysis as an example task. We test portability in a wide range of scenarios, involving different PEFT techniques and different pretrained host models, among other dimensions. We compare the performance of ported modules with that of equivalent modules trained (i) from scratch, and (ii) from parameters sampled from the same distribution as the ported module. We find that the ported modules far outperform the two alternatives tested, but that there are interesting differences between the four PEFT techniques tested. We conclude that task-specific knowledge in the form of structurally modular sets of parameters as produced by PEFT techniques is highly portable, but that the degree of success depends on the type of PEFT and on differences between originating and receiving pretrained models.
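The three starting points compared in the abstract (a ported module, a module trained from scratch, and a module initialised from parameters sampled from the same distribution as the ported one) can be illustrated with a small sketch. This is not the authors' code. The function names, the matrix shape, the initialisation scale of 0.02, and the choice to model "the same distribution" as a Gaussian fitted to the ported values are all assumptions made for illustration; the paper may define these conditions differently.

```python
import random
import statistics


def init_ported(ported):
    """Ported condition: reuse the trained matrix unchanged (as a copy)."""
    return [row[:] for row in ported]


def init_scratch(shape, rng, scale=0.02):
    """Baseline (i): fresh random initialisation. The scale is an assumption."""
    rows, cols = shape
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def init_sampled_like(ported, rng):
    """Baseline (ii): new values drawn from a Gaussian fitted to the ported
    matrix (assumed stand-in for 'the same distribution')."""
    flat = [v for row in ported for v in row]
    mu = statistics.fmean(flat)
    sigma = statistics.pstdev(flat)
    return [[rng.gauss(mu, sigma) for _ in row] for row in ported]


rng = random.Random(0)

# Hypothetical stand-in for a PEFT matrix trained on the originating model.
trained = [[rng.gauss(0.1, 0.5) for _ in range(8)] for _ in range(4)]

conditions = {
    "ported": init_ported(trained),
    "scratch": init_scratch((4, 8), rng),
    "sampled": init_sampled_like(trained, rng),
}

# Each condition would then be inserted into the receiving model and evaluated.
for name, matrix in conditions.items():
    flat = [v for row in matrix for v in row]
    print(name, len(matrix), len(matrix[0]), round(statistics.fmean(flat), 3))
```

The sampled baseline keeps the overall statistics of the ported values but discards their arrangement, so any advantage of the ported module over it can be attributed to the learnt structure rather than to the scale of the parameters.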

Source journal

Findings

Self-citation rate

0.00%

Number of publications