{"title":"KAFL: Achieving High Training Efficiency for Fast-K Asynchronous Federated Learning","authors":"Xueyu Wu, Cho-Li Wang","doi":"10.1109/ICDCS54860.2022.00089","DOIUrl":null,"url":null,"abstract":"Federated Averaging (FedAvg) and its variants are prevalent optimization algorithms adopted in Federated Learning (FL) as they show good model convergence. However, such optimization methods are mostly running in a synchronous flavor which is plagued by the straggler problem, especially in the real-world FL scenario. Federated learning involves a massive number of resource-weak edge devices connected to the intermittent networks, exhibiting a vastly heterogeneous training environment. The asynchronous setting is a plausible solution to fulfill the resources utilization. Yet, due to data and device heterogeneity, the training bias and model staleness dramatically downgrade the model performance. This paper presents KAFL, a fast-K Asynchronous Federated Learning framework, to improve the system and statistical efficiency. KAFL allows the global server to iteratively collect and aggregate (1) the parameters uploaded by the fastest K edge clients (K-FedAsync); or (2) the first M updated parameters sent from any clients (Mstep-FedAsync). Compared to the fully asynchronous setting, KAFL helps the server obtain a better direction toward the global optima as it collects the information from at least K clients or M parameters. To further improve the convergence speed of KAFL, we propose a new weighted aggregation method which dynamically adjusts the aggregation weights according to the weight deviation matrix and client contribution frequency. Experimental results show that KAFL achieves a significant time-to-target-accuracy speedup on both IID and Non-IID datasets. To achieve the same model accuracy, KAFL reduces more than 50% training time for five CNN and RNN models, demonstrating the high training efficiency of our proposed framework.","PeriodicalId":225883,"journal":{"name":"2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)","volume":"86 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDCS54860.2022.00089","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Federated Averaging (FedAvg) and its variants are prevalent optimization algorithms in Federated Learning (FL) because they show good model convergence. However, such methods mostly run in a synchronous fashion, which is plagued by the straggler problem, especially in real-world FL scenarios. Federated learning involves a massive number of resource-constrained edge devices connected over intermittent networks, yielding a highly heterogeneous training environment. The asynchronous setting is a plausible way to improve resource utilization. Yet, due to data and device heterogeneity, training bias and model staleness dramatically degrade model performance. This paper presents KAFL, a fast-K Asynchronous Federated Learning framework, to improve both system and statistical efficiency. KAFL lets the global server iteratively collect and aggregate (1) the parameters uploaded by the fastest K edge clients (K-FedAsync), or (2) the first M updated parameters sent from any clients (Mstep-FedAsync). Compared to the fully asynchronous setting, KAFL helps the server obtain a better direction toward the global optimum because each aggregation collects information from at least K clients or M parameter updates. To further improve the convergence speed of KAFL, we propose a new weighted aggregation method that dynamically adjusts the aggregation weights according to the weight deviation matrix and client contribution frequency. Experimental results show that KAFL achieves a significant time-to-target-accuracy speedup on both IID and Non-IID datasets. To reach the same model accuracy, KAFL reduces training time by more than 50% for five CNN and RNN models, demonstrating the high training efficiency of our proposed framework.
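To make the abstract's server-side mechanism concrete, the sketch below illustrates one fast-K aggregation step in Python. It is not the authors' implementation: the function and variable names (`fast_k_aggregate`, `update_buffer`, `contribution_count`) are hypothetical, and the weighting rule (penalizing frequent contributors and large deviations from the global model) is only an assumed stand-in for the paper's deviation-matrix and contribution-frequency scheme.

```python
# Minimal sketch (assumptions, not the authors' code) of a fast-K
# asynchronous aggregation step: wait for the K fastest client updates,
# then take a weighted average of them.
from collections import defaultdict
import numpy as np


def fast_k_aggregate(global_weights, update_buffer, K, contribution_count):
    """Aggregate once the K fastest client updates have arrived.

    global_weights     : np.ndarray, current global model parameters
    update_buffer      : list of (client_id, np.ndarray) in arrival order
    K                  : number of client updates to wait for
    contribution_count : dict client_id -> number of past contributions
    """
    assert len(update_buffer) >= K, "wait until K updates have arrived"
    selected = update_buffer[:K]  # the first K arrivals, i.e. the fastest clients

    # Assumed weighting rule: down-weight clients that contribute very often
    # (fast clients would otherwise dominate) and updates that deviate
    # strongly from the current global model.
    raw_weights = []
    for client_id, local_w in selected:
        contribution_count[client_id] += 1
        freq_penalty = 1.0 / contribution_count[client_id]
        deviation = np.linalg.norm(local_w - global_weights) + 1e-8
        raw_weights.append(freq_penalty / deviation)
    weights = np.asarray(raw_weights) / np.sum(raw_weights)

    # Weighted average of the selected local models becomes the new global model.
    return sum(w * local_w for w, (_, local_w) in zip(weights, selected))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    global_w = rng.normal(size=10)
    buffer = [(cid, global_w + 0.1 * rng.normal(size=10)) for cid in range(5)]
    counts = defaultdict(int)
    print(fast_k_aggregate(global_w, buffer, K=3, contribution_count=counts))
```

The Mstep-FedAsync mode described in the abstract would differ only in the trigger condition: instead of waiting for K distinct clients, the server aggregates as soon as M parameter updates (possibly several from the same client) have arrived.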