STSyn: Speeding Up Local SGD With Straggler-Tolerant Synchronization
Authors: Feng Zhu; Jingjing Zhang; Xin Wang
Journal: IEEE Transactions on Signal Processing, vol. 72, pp. 4050-4064
Publication date: 2024-08-30 (Journal Article)
DOI: 10.1109/TSP.2024.3452035
URL: https://ieeexplore.ieee.org/document/10659740/
Journal metrics: impact factor 4.6; JCR Q1 (Engineering, Electrical & Electronic)
Synchronous local stochastic gradient descent (local SGD) waits for every worker to complete the same number of local updates, so some workers sit idle and each round suffers random delays caused by slow, straggling workers. To address this issue, this paper proposes a novel local SGD strategy called STSyn. The key idea is to wait only for the $K$ fastest workers in each synchronization round while keeping all workers computing continually, and to make full use of every effective (completed) local update of each worker regardless of stragglers. To evaluate the performance of STSyn, the paper analyzes the average wall-clock time, the average number of local updates, and the average number of uploading workers per round. The convergence of STSyn is also rigorously established, even for nonconvex objective functions, under both homogeneous and heterogeneous data distributions. Experimental results show that STSyn outperforms state-of-the-art schemes, owing to its straggler-tolerant technique and the inclusion of additional effective local updates at each worker. The impact of system parameters is also investigated. By waiting only for the faster workers and allowing heterogeneous synchronization, in which different workers contribute different numbers of local updates, STSyn provides substantial improvements in both time and communication efficiency.
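As a rough illustration of the wall-clock argument in the abstract, the sketch below simulates a single synchronization round and compares waiting for all workers with waiting for only the $K$ fastest. It is not the authors' implementation. The function name `simulate_round`, the parameter `u` (the number of local updates the $K$ fastest workers must finish), the i.i.d. exponential compute-time model, the choice to let early finishers keep computing past `u`, and the rule that only workers with at least one completed update upload are all assumptions made for illustration; the abstract does not specify them.

```python
import random


def simulate_round(num_workers, k, u, rate=1.0, rng=None):
    """Simulate one round under the illustrative assumptions stated above.

    Returns (st_time, sync_time, completed, uploading):
      st_time   - round duration when waiting for the k fastest workers
      sync_time - round duration when waiting for all workers (baseline)
      completed - local updates each worker has finished by st_time
      uploading - number of workers with at least one completed update
    """
    rng = rng or random.Random()

    # Cumulative finish times of each worker's first u local updates.
    # Assumption: each local update takes an i.i.d. exponential time.
    finish = []
    for _ in range(num_workers):
        t, times = 0.0, []
        for _ in range(u):
            t += rng.expovariate(rate)
            times.append(t)
        finish.append(times)

    # Moment at which each worker completes its u-th update, ascending.
    u_done = sorted(times[-1] for times in finish)
    sync_time = u_done[-1]   # fully synchronous: wait for the slowest worker
    st_time = u_done[k - 1]  # straggler-tolerant: wait for the k-th fastest

    completed = []
    for times in finish:
        n = sum(1 for t in times if t <= st_time)
        if n == u:
            # Assumption: a worker that finishes u updates early keeps
            # computing until the round ends instead of idling.
            t = times[-1]
            while True:
                t += rng.expovariate(rate)
                if t > st_time:
                    break
                n += 1
        completed.append(n)

    uploading = sum(1 for n in completed if n > 0)
    return st_time, sync_time, completed, uploading


if __name__ == "__main__":
    rng = random.Random(0)
    rounds = [simulate_round(16, 8, 5, rng=rng) for _ in range(1000)]
    print("mean round time, wait for K fastest:", sum(r[0] for r in rounds) / 1000)
    print("mean round time, wait for all      :", sum(r[1] for r in rounds) / 1000)
    print("mean uploading workers per round   :", sum(r[3] for r in rounds) / 1000)
```

By construction, the straggler-tolerant round never lasts longer than the fully synchronous one, and setting `k` equal to `num_workers` recovers the synchronous baseline. The three printed averages loosely correspond to the quantities the abstract says are analyzed (wall-clock time and uploading workers per round), but only under this toy timing model.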
About the journal:
The IEEE Transactions on Signal Processing covers novel theory, algorithms, performance analyses and applications of techniques for the processing, understanding, learning, retrieval, mining, and extraction of information from signals. The term “signal” includes, among others, audio, video, speech, image, communication, geophysical, sonar, radar, medical and musical signals. Examples of topics of interest include, but are not limited to, information processing and the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals.