{"title":"发现参数高效微调的长期影响","authors":"Gaole Dai, Yiming Tang, Chunkai Fan, Qizhe Zhang, Zhi Zhang, Yulu Gan, Chengqing Zeng, Shanghang Zhang, Tiejun Huang","doi":"arxiv-2409.06706","DOIUrl":null,"url":null,"abstract":"Pre-trained Artificial Neural Networks (ANNs) exhibit robust pattern\nrecognition capabilities and share extensive similarities with the human brain,\nspecifically Biological Neural Networks (BNNs). We are particularly intrigued\nby these models' ability to acquire new knowledge through fine-tuning. In this\nregard, Parameter-efficient Fine-tuning (PEFT) has gained widespread adoption\nas a substitute for full fine-tuning due to its cost reduction in training and\nmitigation of over-fitting risks by limiting the number of trainable parameters\nduring adaptation. Since both ANNs and BNNs propagate information\nlayer-by-layer, a common analogy can be drawn: weights in ANNs represent\nsynapses in BNNs, while features (also known as latent variables or logits) in\nANNs represent neurotransmitters released by neurons in BNNs. Mainstream PEFT\nmethods aim to adjust feature or parameter values using only a limited number\nof trainable parameters (usually less than 1% of the total parameters), yet\nachieve surprisingly good results. Building upon this clue, we delve deeper\ninto exploring the connections between feature adjustment and parameter\nadjustment, resulting in our proposed method Synapses & Neurons (SAN) that\nlearns scaling matrices for features and propagates their effects towards\nposterior weight matrices. Our approach draws strong inspiration from\nwell-known neuroscience phenomena - Long-term Potentiation (LTP) and Long-term\nDepression (LTD), which also reveal the relationship between synapse\ndevelopment and neurotransmitter release levels. We conducted extensive\ncomparisons of PEFT on 26 datasets using attention-based networks as well as\nconvolution-based networks, leading to significant improvements compared to\nother tuning methods (+8.5% over fully-finetune, +7% over Visual Prompt Tuning,\nand +3.2% over LoRA). The codes would be released.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"45 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Discovering Long-Term Effects on Parameter Efficient Fine-tuning\",\"authors\":\"Gaole Dai, Yiming Tang, Chunkai Fan, Qizhe Zhang, Zhi Zhang, Yulu Gan, Chengqing Zeng, Shanghang Zhang, Tiejun Huang\",\"doi\":\"arxiv-2409.06706\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Pre-trained Artificial Neural Networks (ANNs) exhibit robust pattern\\nrecognition capabilities and share extensive similarities with the human brain,\\nspecifically Biological Neural Networks (BNNs). We are particularly intrigued\\nby these models' ability to acquire new knowledge through fine-tuning. In this\\nregard, Parameter-efficient Fine-tuning (PEFT) has gained widespread adoption\\nas a substitute for full fine-tuning due to its cost reduction in training and\\nmitigation of over-fitting risks by limiting the number of trainable parameters\\nduring adaptation. Since both ANNs and BNNs propagate information\\nlayer-by-layer, a common analogy can be drawn: weights in ANNs represent\\nsynapses in BNNs, while features (also known as latent variables or logits) in\\nANNs represent neurotransmitters released by neurons in BNNs. 
Mainstream PEFT\\nmethods aim to adjust feature or parameter values using only a limited number\\nof trainable parameters (usually less than 1% of the total parameters), yet\\nachieve surprisingly good results. Building upon this clue, we delve deeper\\ninto exploring the connections between feature adjustment and parameter\\nadjustment, resulting in our proposed method Synapses & Neurons (SAN) that\\nlearns scaling matrices for features and propagates their effects towards\\nposterior weight matrices. Our approach draws strong inspiration from\\nwell-known neuroscience phenomena - Long-term Potentiation (LTP) and Long-term\\nDepression (LTD), which also reveal the relationship between synapse\\ndevelopment and neurotransmitter release levels. We conducted extensive\\ncomparisons of PEFT on 26 datasets using attention-based networks as well as\\nconvolution-based networks, leading to significant improvements compared to\\nother tuning methods (+8.5% over fully-finetune, +7% over Visual Prompt Tuning,\\nand +3.2% over LoRA). The codes would be released.\",\"PeriodicalId\":501347,\"journal\":{\"name\":\"arXiv - CS - Neural and Evolutionary Computing\",\"volume\":\"45 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Neural and Evolutionary Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.06706\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Neural and Evolutionary Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.06706","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Discovering Long-Term Effects on Parameter Efficient Fine-tuning
Pre-trained Artificial Neural Networks (ANNs) exhibit robust pattern-recognition capabilities and share extensive similarities with Biological Neural Networks (BNNs) such as the human brain. We are particularly intrigued by these models' ability to acquire new knowledge through fine-tuning. In this regard, Parameter-efficient Fine-tuning (PEFT) has been widely adopted as a substitute for full fine-tuning, because limiting the number of trainable parameters during adaptation reduces training cost and mitigates the risk of over-fitting. Since both ANNs and BNNs propagate information
layer-by-layer, a common analogy can be drawn: weights in ANNs represent
synapses in BNNs, while features (also known as latent variables or logits) in
ANNs represent neurotransmitters released by neurons in BNNs. Mainstream PEFT
methods aim to adjust feature or parameter values using only a limited number
of trainable parameters (usually less than 1% of the total parameters), yet
achieve surprisingly good results. Building on this observation, we explore the connection between feature adjustment and parameter adjustment in greater depth and propose Synapses & Neurons (SAN), a method that learns scaling matrices for features and propagates their effects to the posterior (subsequent) weight matrices. Our approach draws strong inspiration from
well-known neuroscience phenomena - Long-term Potentiation (LTP) and Long-term
Depression (LTD), which also reveal the relationship between synapse
development and neurotransmitter release levels. We conducted extensive
comparisons of PEFT methods on 26 datasets using both attention-based and convolution-based networks; SAN delivers significant improvements over other tuning methods (+8.5% over full fine-tuning, +7% over Visual Prompt Tuning, and +3.2% over LoRA). The code will be released.
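
The abstract does not spell out SAN's exact formulation, but the core mechanism it describes, learning a scaling for each feature and propagating its effect into the posterior (following) weight matrix, can be sketched. The PyTorch snippet below is a hypothetical illustration of that idea under the assumption of per-feature scaling vectors; the names ScaledLinear and propagate_scale are invented for this sketch and are not the authors' released implementation.

```python
import torch
import torch.nn as nn


class ScaledLinear(nn.Module):
    """Frozen pretrained linear layer with a learnable per-feature output scale.

    PEFT-style sketch: only the scaling vector is trainable; the pretrained
    weight and bias stay frozen.
    """

    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad = False          # freeze the pretrained weights
        self.scale = nn.Parameter(torch.ones(linear.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x) * self.scale   # scale the output features


@torch.no_grad()
def propagate_scale(prev: ScaledLinear, nxt: nn.Linear) -> None:
    """Fold the learned feature scaling into the posterior weight matrix.

    Since nxt(s * h) = (W_next * s) h + b_next (a column-wise rescale of
    W_next), the feature-level adjustment can be merged into the next
    layer's weights after training.
    """
    nxt.weight.mul_(prev.scale)   # rescale columns of the posterior weights
    prev.scale.fill_(1.0)         # the effect now lives in nxt.weight


# Usage sketch: pretend the scale has already been trained to non-trivial values.
layer1 = ScaledLinear(nn.Linear(256, 256))
layer2 = nn.Linear(256, 10)
with torch.no_grad():
    layer1.scale.uniform_(0.5, 1.5)

x = torch.randn(4, 256)
y_before = layer2(layer1(x))
propagate_scale(layer1, layer2)
y_after = layer2(layer1(x))
assert torch.allclose(y_before, y_after, atol=1e-5)

# Only the 256-dim scale is trainable, well under 1% of this block's parameters.
trainable = sum(p.numel() for p in layer1.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer1.parameters())
print(f"trainable: {trainable} / {total}")
```

Note that this only shows the feature-to-weight equivalence for a pair of linear layers; how SAN actually learns and applies its scaling matrices, and how LTP/LTD inform that design, is described in the paper itself.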