Local SGD for Near-Quadratic Problems: Improving Convergence under Unconstrained Noise Conditions
Andrey Sadchikov, Savelii Chezhegov, Aleksandr Beznosikov, Alexander Gasnikov
arXiv - MATH - Optimization and Control, published 2024-09-16 (arxiv-2409.10478)
Abstract
Distributed optimization plays an important role in modern large-scale machine learning and data processing systems by making better use of the available computational resources. One of the classical and popular approaches is Local Stochastic Gradient Descent (Local SGD), which performs multiple local updates before averaging and is particularly useful in distributed environments for reducing communication bottlenecks and improving scalability. A typical feature of this method is its dependence on the communication frequency. However, for a quadratic target function with a homogeneous data distribution across all devices, the influence of the communication frequency vanishes. As a natural consequence, subsequent studies adopt the assumption of a Lipschitz Hessian, since it quantifies, to some extent, how close the optimized function is to a quadratic one. In order to extend the completeness of Local SGD theory and unlock its potential, in this paper we abandon the Lipschitz Hessian assumption and instead introduce a new concept of $\textit{approximate quadraticity}$. This assumption offers a new perspective on problems with near-quadratic properties. In addition, existing theoretical analyses of Local SGD often assume bounded variance; we, in turn, consider an unbounded noise condition, which broadens the class of problems studied.
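
Since the abstract centers on the Local SGD template (each worker takes several local stochastic-gradient steps, then the iterates are averaged), a minimal sketch of that loop may help fix the setting. The quadratic least-squares objective, the additive-Gaussian stochastic oracle, and all parameter values below are illustrative assumptions for this sketch, not the paper's analysis or experiments.

```python
# Minimal Local SGD sketch on a synthetic quadratic objective
# f(x) = 0.5 * x^T A x - b^T x (illustrative assumption, not the paper's setup).
import numpy as np

rng = np.random.default_rng(0)

d, M = 20, 4                        # dimension, number of workers
A = rng.standard_normal((d, d))
A = A @ A.T / d + np.eye(d)         # positive-definite quadratic term
b = rng.standard_normal(d)
x_star = np.linalg.solve(A, b)      # exact minimizer, for error reporting


def stochastic_grad(x, noise_scale=0.1):
    """Exact gradient plus additive Gaussian noise (assumed oracle model)."""
    return A @ x - b + noise_scale * rng.standard_normal(d)


def local_sgd(T=200, H=10, lr=0.05):
    """T communication rounds; each worker does H local steps, then average."""
    x = np.zeros(d)                     # common starting point
    for _ in range(T):
        local = np.tile(x, (M, 1))      # broadcast the current average to workers
        for m in range(M):
            for _ in range(H):          # H local SGD steps on worker m
                local[m] -= lr * stochastic_grad(local[m])
        x = local.mean(axis=0)          # communication: average the local iterates
    return np.linalg.norm(x - x_star)


# Varying H changes how often workers communicate per gradient step.
for H in (1, 5, 20):
    print(f"H={H:>2}: final error {local_sgd(H=H):.4f}")
```

The interval `H` between averaging steps is exactly the "frequency of communications" the abstract refers to; the quadratic, homogeneous instance here is the regime in which the abstract states that this dependence vanishes.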