Staleness aware semi-asynchronous federated learning

Journal of Parallel and Distributed Computing (IF 4.0; Zone 3, Computer Science; JCR Q1, COMPUTER SCIENCE, THEORY & METHODS) Pub Date: 2024-07-01 DOI: 10.1016/j.jpdc.2024.104950

Miri Yu, Jiheon Choi, Jaehyun Lee, Sangyoon Oh

Journal of Parallel and Distributed Computing, volume 193, Article 104950. Source: https://www.sciencedirect.com/science/article/pii/S074373152400114X

Citations: 0

Abstract

As attempts to distribute deep learning over personal data have increased, so has the importance of federated learning (FL). Synchronous and asynchronous protocols have both been used to address the core challenges of FL, namely statistical and system heterogeneity. However, stragglers reduce training efficiency in each protocol: they increase latency in the synchronous protocol and degrade accuracy in the asynchronous one. A semi-asynchronous protocol that combines the two can be applied to FL to mitigate the straggler problem, but handling the staleness of local models effectively remains difficult. We propose SASAFL to address the training inefficiency caused by staleness in semi-asynchronous FL. SASAFL enables stable training by considering the quality of the global model when synchronising the servers and clients. In addition, it achieves high accuracy and low latency by adjusting the number of participating clients in response to changes in the global loss and by immediately processing clients that did not participate in the previous round. SASAFL was evaluated under various conditions to verify its effectiveness. It achieved 19.69 percentage points higher accuracy than the baseline, 2.32 times higher round-to-accuracy, and 2.24 times higher latency-to-accuracy. Additionally, SASAFL always reached the target accuracy, which the baseline could not reach.
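The abstract does not give SASAFL's actual update rules, so the following is only a minimal illustrative sketch of two ideas that semi-asynchronous FL schemes of this kind commonly rely on: discounting stale client updates during aggregation, and changing how many client updates the server waits for as the global loss changes. Every name here (`Update`, `staleness_weight`, `aggregate`, `next_quorum`), the polynomial discount, the parameter `alpha`, and the direction of the quorum adjustment are assumptions made for illustration, not the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Update:
    """A client's local model (hypothetical structure, not from the paper)."""
    client_id: int
    weights: List[float]
    base_round: int  # global round whose model the client started training from


def staleness_weight(staleness: int, alpha: float = 0.5) -> float:
    # Assumed polynomial discount: a fresh update (staleness 0) gets weight 1,
    # older updates get progressively less influence.
    return (1 + staleness) ** (-alpha)


def aggregate(global_weights: List[float], updates: List[Update],
              current_round: int, alpha: float = 0.5) -> List[float]:
    """Staleness-weighted average of the received client models."""
    if not updates:
        # Nothing arrived this round: keep the current global model.
        return list(global_weights)
    acc = [0.0] * len(global_weights)
    total = 0.0
    for u in updates:
        w = staleness_weight(current_round - u.base_round, alpha)
        total += w
        for i, value in enumerate(u.weights):
            acc[i] += w * value
    return [a / total for a in acc]


def next_quorum(quorum: int, prev_loss: float, curr_loss: float,
                min_k: int, max_k: int) -> int:
    """Assumed rule: wait for one more client when the global loss rises,
    one fewer when it falls, clamped to [min_k, max_k]."""
    if curr_loss > prev_loss:
        return min(max_k, quorum + 1)
    if curr_loss < prev_loss:
        return max(min_k, quorum - 1)
    return quorum


if __name__ == "__main__":
    fresh = Update(client_id=0, weights=[1.0, 1.0], base_round=5)
    stale = Update(client_id=1, weights=[3.0, 3.0], base_round=2)
    print(aggregate([0.0, 0.0], [fresh, stale], current_round=5))
    print(next_quorum(5, prev_loss=1.0, curr_loss=1.2, min_k=2, max_k=10))
```

In this sketch, an update that arrives from a client that missed earlier rounds is still used as soon as it arrives, but with reduced weight. A real system would also need to weight by local dataset size and decide when a stale client must resynchronise with the global model, neither of which is modelled here.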


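Source journal

Journal of Parallel and Distributed Computing (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 10.30

Self-citation rate: 2.60%

Articles published: 172

Review time: 12 months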

Journal description: This international journal is directed to researchers, engineers, educators, managers, programmers, and users of computers who have particular interests in parallel processing and/or distributed computing. The Journal of Parallel and Distributed Computing publishes original research papers and timely review articles on the theory, design, evaluation, and use of parallel and/or distributed computing systems. The journal also features special issues on these topics, again covering the full range from the design to the use of the targeted systems.