Local SGD for Near-Quadratic Problems: Improving Convergence under Unconstrained Noise Conditions
Andrey Sadchikov, Savelii Chezhegov, Aleksandr Beznosikov, Alexander Gasnikov
arXiv - MATH - Optimization and Control, published 2024-09-16 (arxiv-2409.10478)
Abstract
Distributed optimization plays an important role in modern large-scale machine learning and data processing systems by making better use of the available computational resources. One of the classical and popular approaches is Local Stochastic Gradient Descent (Local SGD), which performs multiple local updates before averaging and is particularly useful in distributed environments for reducing communication bottlenecks and improving scalability. A typical feature of this method is its dependence on the communication frequency. However, for a quadratic target function with a homogeneous data distribution across all devices, the influence of the communication frequency vanishes. As a natural consequence, subsequent studies adopt the assumption of a Lipschitz Hessian, since it quantifies, to some extent, how close the optimized function is to a quadratic one. In order to extend the completeness of Local SGD theory and unlock its potential, in this paper we abandon the Lipschitz Hessian assumption and instead introduce a new concept of $\textit{approximate quadraticity}$. This assumption offers a new perspective on problems with near-quadratic properties. In addition, existing theoretical analyses of Local SGD often assume bounded variance; we, in turn, consider an unbounded noise condition, which broadens the class of problems studied.
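
Since the abstract centers on the Local SGD template (each worker takes several local stochastic-gradient steps, then the iterates are averaged), a minimal sketch of that loop may help fix the setting. The quadratic least-squares objective, the additive-Gaussian stochastic oracle, and all parameter values below are illustrative assumptions for this sketch, not the paper's analysis or experiments.

```python
# Minimal Local SGD sketch on a synthetic quadratic objective
# f(x) = 0.5 * x^T A x - b^T x (illustrative assumption, not the paper's setup).
import numpy as np

rng = np.random.default_rng(0)

d, M = 20, 4                        # dimension, number of workers
A = rng.standard_normal((d, d))
A = A @ A.T / d + np.eye(d)         # positive-definite quadratic term
b = rng.standard_normal(d)
x_star = np.linalg.solve(A, b)      # exact minimizer, for error reporting


def stochastic_grad(x, noise_scale=0.1):
    """Exact gradient plus additive Gaussian noise (assumed oracle model)."""
    return A @ x - b + noise_scale * rng.standard_normal(d)


def local_sgd(T=200, H=10, lr=0.05):
    """T communication rounds; each worker does H local steps, then average."""
    x = np.zeros(d)                     # common starting point
    for _ in range(T):
        local = np.tile(x, (M, 1))      # broadcast the current average to workers
        for m in range(M):
            for _ in range(H):          # H local SGD steps on worker m
                local[m] -= lr * stochastic_grad(local[m])
        x = local.mean(axis=0)          # communication: average the local iterates
    return np.linalg.norm(x - x_star)


# Varying H changes how often workers communicate per gradient step.
for H in (1, 5, 20):
    print(f"H={H:>2}: final error {local_sgd(H=H):.4f}")
```

The interval `H` between averaging steps is exactly the "frequency of communications" the abstract refers to; the quadratic, homogeneous instance here is the regime in which the abstract states that this dependence vanishes.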