{"title":"Self-adaptive asynchronous federated optimizer with adversarial sharpness-aware minimization","authors":"","doi":"10.1016/j.future.2024.07.045","DOIUrl":null,"url":null,"abstract":"<div><p>The past years have witnessed the success of a distributed learning system called Federated Learning (FL). Recently, asynchronous FL (AFL) has demonstrated its potential in concurrency compared to mainstream synchronous FL. However, the inherent systematic and statistical heterogeneity has presented several impediments to AFL: On the client side, the discrepancies in trips and local model drift impede global performance enhancement; On the server side, dynamic communication leads to significant fluctuations in gradient arrival time, while asynchronous arrival gradients with ambiguous value are not fully leveraged. In this paper, we propose an adaptive AFL framework, ARDAGH, which systematically addresses the aforementioned challenges: Firstly, to address the discrepancies in client trips, ARDAGH ensures their convergence by incorporating only 1-bit feedback information into the downlink. Secondly, to counter the drift of clients, ARDAGH generalizes the local models by employing our novel adversarial sharpness-aware minimization, which does not necessitate reliance on additional global variables. Thirdly, in the face of gradient latency issues, ARDAGH employs a communication-aware dropout strategy to adaptively compress gradients to ensure similar transmission times. Finally, to fully unleash the potential of each gradient, we establish a consistent optimal direction by conceptualizing the aggregation as an optimizer with successive momentum. In light of the comprehensive solution provided by ARDAGH, an algorithm named FedAMO is derived, and its superiority is confirmed by experimental results obtained under challenging prototype and simulation settings. Particularly in typical sentiment analysis tasks, FedAMO demonstrates an improvement of up to 5.351% with a 20.056-fold acceleration compared to conventional asynchronous methods.</p></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":null,"pages":null},"PeriodicalIF":6.2000,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Generation Computer Systems-The International Journal of Escience","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167739X24004175","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0
Abstract
The past years have witnessed the success of a distributed learning paradigm called Federated Learning (FL). Recently, asynchronous FL (AFL) has demonstrated greater potential for concurrency than mainstream synchronous FL. However, inherent system-level and statistical heterogeneity presents several impediments to AFL. On the client side, discrepancies in client trips and local model drift impede global performance improvement; on the server side, dynamic communication causes large fluctuations in gradient arrival times, while asynchronously arriving gradients of ambiguous value are not fully leveraged. In this paper, we propose an adaptive AFL framework, ARDAGH, which systematically addresses these challenges. First, to address the discrepancies in client trips, ARDAGH ensures their convergence by incorporating only 1 bit of feedback information into the downlink. Second, to counter client drift, ARDAGH generalizes the local models through our novel adversarial sharpness-aware minimization, which does not rely on additional global variables. Third, to handle gradient latency, ARDAGH employs a communication-aware dropout strategy that adaptively compresses gradients so that transmission times remain similar. Finally, to fully unleash the potential of each gradient, we establish a consistent optimization direction by conceptualizing aggregation as an optimizer with successive momentum. From the comprehensive solution provided by ARDAGH, an algorithm named FedAMO is derived, and its superiority is confirmed by experimental results obtained under challenging prototype and simulation settings. In typical sentiment analysis tasks in particular, FedAMO demonstrates an improvement of up to 5.351% with a 20.056-fold acceleration compared to conventional asynchronous methods.
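Of the four mechanisms the abstract names, two are concrete enough to sketch from general knowledge. The first is the sharpness-aware local update: below is a minimal sketch of a generic SAM-style step of the kind such an update builds on, written in PyTorch for concreteness. The perturbation radius rho, the learning rate, and the model/loss objects are illustrative placeholders; the paper's adversarial variant may compute the perturbation differently.

    import torch

    def sam_local_step(model, loss_fn, batch, rho=0.05, lr=0.01):
        # One sharpness-aware update: ascend to a nearby high-loss point,
        # then descend using the gradient evaluated there.
        x, y = batch
        loss_fn(model(x), y).backward()
        grads = [p.grad.detach().clone() for p in model.parameters()]
        grad_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)).item() + 1e-12
        with torch.no_grad():
            for p, g in zip(model.parameters(), grads):
                p.add_(g, alpha=rho / grad_norm)   # adversarial ascent step
        model.zero_grad()
        loss_fn(model(x), y).backward()            # gradient at perturbed weights
        with torch.no_grad():
            for p, g in zip(model.parameters(), grads):
                p.sub_(g, alpha=rho / grad_norm)   # undo the perturbation
                p.sub_(p.grad, alpha=lr)           # sharpness-aware descent
        model.zero_grad()

The second is the server side: conceptualizing aggregation as an optimizer with successive momentum can be read as keeping a persistent momentum buffer that every arriving client update feeds into. A minimal sketch follows, assuming a simple staleness-based damping factor; that discount rule is our assumption for illustration, not a detail given in the abstract.

    def momentum_aggregate(weights, momentum, update, staleness,
                           beta=0.9, eta=1.0):
        # Treat aggregation as one optimizer step with a momentum buffer
        # shared across asynchronously arriving client updates.
        alpha = 1.0 / (1.0 + staleness)  # damp stale updates (illustrative choice)
        momentum = [beta * m + alpha * u for m, u in zip(momentum, update)]
        weights = [w - eta * m for w, m in zip(weights, momentum)]
        return weights, momentum

In this reading, stale updates still contribute direction through the shared momentum buffer, but their immediate influence is damped, which matches the stated goal of extracting value from asynchronously arriving gradients.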
Journal Introduction
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.