{"title":"分布随机梯度下降瞬态时间的一个尖锐估计","authors":"Shi Pu;Alex Olshevsky;Ioannis Ch. Paschalidis","doi":"10.1109/TAC.2021.3126253","DOIUrl":null,"url":null,"abstract":"This article is concerned with minimizing the average of \n<inline-formula><tex-math>$n$</tex-math></inline-formula>\n cost functions over a network, in which agents may communicate and exchange information with each other. We consider the setting where only noisy gradient information is available. To solve the problem, we study the distributed stochastic gradient descent (DSGD) method and perform a nonasymptotic convergence analysis. For strongly convex and smooth objective functions, in expectation, DSGD asymptotically achieves the optimal network-independent convergence rate compared to centralized stochastic gradient descent. Our main contribution is to characterize the transient time needed for DSGD to approach the asymptotic convergence rate. Moreover, we construct a “hard” optimization problem that proves the sharpness of the obtained result. Numerical experiments demonstrate the tightness of the theoretical results.","PeriodicalId":13201,"journal":{"name":"IEEE Transactions on Automatic Control","volume":"67 11","pages":"5900-5915"},"PeriodicalIF":6.2000,"publicationDate":"2021-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9609587","citationCount":"44","resultStr":"{\"title\":\"A Sharp Estimate on the Transient Time of Distributed Stochastic Gradient Descent\",\"authors\":\"Shi Pu;Alex Olshevsky;Ioannis Ch. Paschalidis\",\"doi\":\"10.1109/TAC.2021.3126253\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This article is concerned with minimizing the average of \\n<inline-formula><tex-math>$n$</tex-math></inline-formula>\\n cost functions over a network, in which agents may communicate and exchange information with each other. We consider the setting where only noisy gradient information is available. To solve the problem, we study the distributed stochastic gradient descent (DSGD) method and perform a nonasymptotic convergence analysis. For strongly convex and smooth objective functions, in expectation, DSGD asymptotically achieves the optimal network-independent convergence rate compared to centralized stochastic gradient descent. Our main contribution is to characterize the transient time needed for DSGD to approach the asymptotic convergence rate. Moreover, we construct a “hard” optimization problem that proves the sharpness of the obtained result. 
Numerical experiments demonstrate the tightness of the theoretical results.\",\"PeriodicalId\":13201,\"journal\":{\"name\":\"IEEE Transactions on Automatic Control\",\"volume\":\"67 11\",\"pages\":\"5900-5915\"},\"PeriodicalIF\":6.2000,\"publicationDate\":\"2021-11-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9609587\",\"citationCount\":\"44\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Automatic Control\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/9609587/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Automatic Control","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/9609587/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
A Sharp Estimate on the Transient Time of Distributed Stochastic Gradient Descent
This article is concerned with minimizing the average of $n$ cost functions over a network, in which agents may communicate and exchange information with each other. We consider the setting where only noisy gradient information is available. To solve the problem, we study the distributed stochastic gradient descent (DSGD) method and perform a nonasymptotic convergence analysis. For strongly convex and smooth objective functions, DSGD asymptotically achieves, in expectation, the optimal network-independent convergence rate of centralized stochastic gradient descent (SGD). Our main contribution is to characterize the transient time needed for DSGD to approach this asymptotic rate. Moreover, we construct a “hard” optimization problem that proves the sharpness of the obtained result. Numerical experiments demonstrate the tightness of the theoretical results.
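To make the setting concrete, below is a minimal Python sketch of one common form of the DSGD update: each agent averages its iterate with its neighbors' through a doubly stochastic mixing matrix W, then takes a step along its own noisy local gradient with a diminishing step size. The ring topology, the scalar quadratic objectives, the noise level, and the step-size constants are illustrative assumptions for this sketch, not the paper's experimental setup.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10                       # number of agents
    b = rng.normal(size=n)       # local objectives f_i(x) = 0.5 * (x - b[i])**2
    x_star = b.mean()            # minimizer of the average objective

    # Doubly stochastic mixing matrix for a ring graph (lazy Metropolis weights).
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] = 0.25
        W[i, (i + 1) % n] = 0.25

    x = rng.normal(size=n)       # each agent's local iterate (scalar variable)
    sigma = 0.5                  # gradient noise level (assumed)
    a, K = 2.0, 50               # step size alpha_k = a / (k + K), a 1/k schedule

    for k in range(20000):
        g = (x - b) + sigma * rng.normal(size=n)   # noisy gradient of each f_i
        x = W @ x - (a / (k + K)) * g              # consensus step + local SGD step

    print("mean squared distance to optimum:", np.mean((x - x_star) ** 2))

The update combines a consensus step (multiplication by W) with a local stochastic gradient step; the transient-time question studied in the paper is how many iterations such a scheme needs before the network effects wash out and the centralized SGD rate takes over.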
Journal Introduction:
In the IEEE Transactions on Automatic Control, the IEEE Control Systems Society publishes high-quality papers on the theory, design, and applications of control engineering. Two types of contributions are regularly considered:
1) Papers: Presentation of significant research, development, or application of control concepts.
2) Technical Notes and Correspondence: Brief technical notes, comments on published papers or established control topics, and corrections to papers and notes published in the Transactions.
In addition, special papers (tutorials, surveys, and perspectives on the theory and applications of control systems topics) are solicited.