Local SGD for Near-Quadratic Problems: Improving Convergence under Unconstrained Noise Conditions

Andrey Sadchikov, Savelii Chezhegov, Aleksandr Beznosikov, Alexander Gasnikov
arXiv - MATH - Optimization and Control, published 2024-09-16. DOI: arxiv-2409.10478 (https://doi.org/arxiv-2409.10478)

Abstract

Distributed optimization plays an important role in modern large-scale machine learning and data processing systems by making efficient use of computational resources. One of the classical and popular approaches is Local Stochastic Gradient Descent (Local SGD), which performs multiple local updates before averaging and is particularly useful in distributed environments for reducing communication bottlenecks and improving scalability. A typical feature of this method is its dependence on the communication frequency. However, for a quadratic objective with homogeneous data distribution across all devices, the influence of the communication frequency vanishes. As a natural consequence, subsequent studies assume a Lipschitz Hessian, since this indicates, to some extent, how close the objective is to a quadratic. To make the theory of Local SGD more complete and unlock its potential, in this paper we abandon the Lipschitz Hessian assumption and introduce a new concept of $\textit{approximate quadraticity}$. This assumption gives a new perspective on problems with near-quadratic properties. In addition, existing theoretical analyses of Local SGD often assume bounded variance. We, in turn, consider an unbounded noise condition, which allows us to broaden the class of studied problems.
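
The observation about quadratic objectives can be illustrated with a minimal sketch. It is not the authors' algorithm or analysis. It assumes a diagonal quadratic $f(x) = \sum_i (\tfrac{1}{2} a_i x_i^2 - b_i x_i)$ shared by all workers (homogeneous data) and simple additive Gaussian gradient noise, which is a bounded-variance model and therefore simpler than the unbounded noise condition studied in the paper. The names `local_sgd`, `average`, `sync_every`, and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def average(workers):
    """Coordinate-wise mean of the workers' iterates."""
    n = len(workers)
    return [sum(x[i] for x in workers) / n for i in range(len(workers[0]))]


def local_sgd(a, b, x0, num_workers, num_steps, sync_every, lr, noise_std, seed=0):
    """Local SGD on f(x) = sum_i (0.5 * a[i] * x[i]**2 - b[i] * x[i]).

    Assumption for illustration: every worker sees the same objective and
    adds independent Gaussian noise to each gradient coordinate.
    Returns the average of the workers' final iterates.
    """
    rng = random.Random(seed)
    dim = len(x0)
    workers = [list(x0) for _ in range(num_workers)]
    for step in range(1, num_steps + 1):
        # Local update: each worker takes one noisy gradient step on its own copy.
        for x in workers:
            for i in range(dim):
                grad_i = a[i] * x[i] - b[i] + rng.gauss(0.0, noise_std)
                x[i] -= lr * grad_i
        # Communication round: replace every local iterate with the average.
        if step % sync_every == 0:
            avg = average(workers)
            workers = [list(avg) for _ in range(num_workers)]
    return average(workers)


if __name__ == "__main__":
    a, b, x0 = [1.0, 2.0, 4.0], [1.0, -1.0, 0.5], [0.0, 0.0, 0.0]
    frequent = local_sgd(a, b, x0, 4, 100, sync_every=1, lr=0.1, noise_std=0.5, seed=1)
    rare = local_sgd(a, b, x0, 4, 100, sync_every=10, lr=0.1, noise_std=0.5, seed=1)
    print(max(abs(u - v) for u, v in zip(frequent, rare)))
```

Because the gradient of a quadratic is linear in $x$, averaging commutes with the gradient step. With the same noise samples, the averaged iterate therefore follows the same recursion whether the workers synchronize every step or every ten steps, and the two runs agree up to floating-point rounding. For a non-quadratic objective this no longer holds in general, which is why assumptions such as a Lipschitz Hessian or the paper's approximate quadraticity are needed to control the gap.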