FedFMSL: Federated Learning of Foundation Models With Sparsely Activated LoRA

IEEE Transactions on Mobile Computing | Impact Factor: 7.7 | Zone 2 (Computer Science) | JCR Q1 (Computer Science, Information Systems) | Pub Date: 2024-09-04 | DOI: 10.1109/TMC.2024.3454634

Panlong Wu;Kangshuo Li;Ting Wang;Yanjie Dong;Victor C. M. Leung;Fangxin Wang

IEEE Transactions on Mobile Computing, vol. 23, no. 12, pp. 15167-15181, published 2024-09-04. https://ieeexplore.ieee.org/document/10666083/

Citations: 0

Abstract

Foundation models (FMs) have shown great success in natural language processing, computer vision, and multimodal tasks. FMs have a large number of parameters and therefore require a substantial amount of data for training. Federated learning has revolutionized machine learning by enabling collaborative learning from decentralized data while preserving clients' data privacy. Although foundation models stand to benefit greatly from federated learning, their bulky parameters cause severe communication challenges for modern networks and computation challenges, especially for edge devices. Moreover, data distributions can differ across clients, which induces statistical challenges. In this paper, we propose a novel two-stage federated learning algorithm called FedFMSL. A global expert is trained in the first stage and a local expert is trained in the second stage to provide better personalization. We construct a Mixture of Foundation Models (MoFM) with these two experts and design a gate neural network with an inserted gate adapter that joins the aggregation in every communication round of the second stage. To further adapt to edge computing scenarios with limited computational resources, we design a novel Sparsely Activated LoRA (SAL) algorithm that freezes the pre-trained foundation model parameters, inserts low-rank adaptation matrices into transformer blocks, and activates them progressively during training. Extensive experiments verify the effectiveness of FedFMSL: results show that it outperforms other SOTA baselines by up to 59.19% in default settings while tuning less than 0.3% of the foundation model's parameters.
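The abstract names three mechanisms: frozen pre-trained weights with inserted low-rank matrices, progressive activation of those matrices, and a gate that mixes a global and a local expert. The following is a minimal pure-Python sketch of those ideas, not the authors' implementation. The class and function names, the "one more adapter every few rounds" schedule, the order in which adapters are switched on, and the scalar sigmoid gate are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a low-rank update B @ A (standard LoRA form)."""

    def __init__(self, d_in, d_out, rank, rng):
        # W stands in for a pre-trained weight; it is never updated.
        self.W = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
        # Standard LoRA init: A random, B zero, so the update starts at zero.
        self.A = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.active = False  # an inactive adapter is skipped entirely

    def forward(self, x):
        y = matvec(self.W, x)
        if self.active:
            delta = matvec(self.B, matvec(self.A, x))
            y = [a + b for a, b in zip(y, delta)]
        return y

    def trainable_params(self):
        # Only active adapters contribute parameters to tune and transmit.
        if not self.active:
            return 0
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def activate_progressively(layers, round_idx, rounds_per_step):
    """Hypothetical schedule: one more adapter every `rounds_per_step` rounds."""
    n_active = min(len(layers), 1 + round_idx // rounds_per_step)
    for i, layer in enumerate(layers):
        layer.active = i < n_active
    return n_active


def mix_experts(global_out, local_out, gate_logit):
    """Blend two expert outputs with a sigmoid gate weight in (0, 1)."""
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    return [g * a + (1.0 - g) * b for a, b in zip(global_out, local_out)]
```

Under these assumptions, calling `activate_progressively(layers, round_idx, rounds_per_step)` at the start of each communication round grows the set of trainable adapters over time, so early rounds tune and exchange fewer parameters than later ones. In the paper the gate is a neural network with its own adapter; the scalar `gate_logit` here only illustrates the weighted combination of the two experts.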




Source journal

IEEE Transactions on Mobile Computing (Engineering & Technology: Telecommunications)

CiteScore: 12.90

Self-citation rate: 2.50%

Articles published: 403

Review time: 6.6 months

About the journal: IEEE Transactions on Mobile Computing addresses key technical issues related to various aspects of mobile computing. This includes (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest span a wide range, covering aspects like mobile networks and hosts, mobility management, multimedia, operating system support, power management, online and mobile environments, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advancements in mobile computing research.