Accelerated Distributed Stochastic Nonconvex Optimization Over Time-Varying Directed Networks

IF 7.0 · CAS Tier 1 (Computer Science) · JCR Q1 (Automation & Control Systems) · IEEE Transactions on Automatic Control · Pub Date: 2024-10-14 · DOI: 10.1109/TAC.2024.3479888
Yiyue Chen;Abolfazl Hashemi;Haris Vikalo
IEEE Transactions on Automatic Control, vol. 70, no. 4, pp. 2196–2211. Published online: 2024-10-14.
Citations: 0

Abstract

Distributed stochastic nonconvex optimization problems have recently received attention due to the growing interest of the signal processing, computer vision, and natural language processing communities in applications deployed over distributed learning systems (e.g., federated learning). We study the setting where the data is distributed across the nodes of a time-varying directed network, a topology suitable for modeling dynamic networks experiencing communication delays and straggler effects. The network nodes, which can access only their local objectives and query a stochastic first-order oracle to obtain gradient estimates, collaborate to minimize a global objective function by exchanging messages with their neighbors. We propose an algorithm, novel to this setting, that leverages stochastic gradient descent with momentum and gradient tracking to solve distributed nonconvex optimization problems over time-varying networks. To analyze the algorithm, we tackle the challenges that arise when analyzing dynamic network systems that communicate gradient acceleration components. We prove that the algorithm's oracle complexity is $\mathcal{O}(1/\epsilon^{1.5})$, and that under the Polyak-Łojasiewicz (PL) condition the algorithm converges linearly to a steady error state. The proposed scheme is tested on several learning tasks: a nonconvex logistic regression experiment on the MNIST dataset, an image classification task on the CIFAR-10 dataset, and an NLP classification test on the IMDB dataset. We further present numerical simulations with an objective that satisfies the PL condition. The results demonstrate superior performance of the proposed framework compared to existing related methods.
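As a toy illustration of the two ingredients the abstract names — gradient tracking combined with heavy-ball momentum — the following sketch runs the updates on simple local quadratics. It is not the paper's algorithm: the network here is a fixed ring with a doubly stochastic mixing matrix (the paper's time-varying directed setting requires push-sum-type corrections), the objectives are convex quadratics rather than nonconvex losses, and all names and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3                      # number of nodes, problem dimension
B = rng.normal(size=(n, d))      # local targets; f_i(x) = 0.5 * ||x - b_i||^2
x_star = B.mean(axis=0)          # minimizer of the average objective

def oracle(i, x):
    # Stochastic first-order oracle: exact local gradient plus small noise.
    return (x - B[i]) + 0.01 * rng.normal(size=d)

# Fixed doubly stochastic mixing on a ring (a simplification: the paper's
# time-varying *directed* graphs are only column-stochastic in general).
A = np.zeros((n, n))
for i in range(n):
    A[i, i] = 0.5
    A[i, (i + 1) % n] = 0.25
    A[i, (i - 1) % n] = 0.25

alpha, beta, T = 0.1, 0.5, 2000  # step size, momentum, iterations (assumed)
X = np.zeros((n, d))             # local iterates, one row per node
G = np.array([oracle(i, X[i]) for i in range(n)])
Y = G.copy()                     # trackers of the network-average gradient
V = np.zeros((n, d))             # heavy-ball momentum buffers

for _ in range(T):
    V = beta * V + Y             # momentum applied to the tracked gradient
    X = A @ X - alpha * V        # mix with neighbors, then descend
    G_new = np.array([oracle(i, X[i]) for i in range(n)])
    Y = A @ Y + G_new - G        # gradient-tracking correction
    G = G_new

err = np.linalg.norm(X - x_star, axis=1).max()
```

Because the mixing matrix is doubly stochastic, the sum of the trackers `Y` is preserved across iterations, so each node's tracker follows the network-average gradient while the noisy oracle leaves the iterates in a small steady error neighborhood of `x_star` — matching, in miniature, the "linear convergence to a steady error state" behavior the abstract proves under the PL condition.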
Journal
IEEE Transactions on Automatic Control (Engineering: Electrical & Electronic)
CiteScore: 11.30
Self-citation rate: 5.90%
Articles per year: 824
Review time: 9 months
Journal description: In the IEEE Transactions on Automatic Control, the IEEE Control Systems Society publishes high-quality papers on the theory, design, and applications of control engineering. Two types of contributions are regularly considered: 1) Papers: presentation of significant research, development, or application of control concepts. 2) Technical Notes and Correspondence: brief technical notes, comments on published areas or established control topics, and corrections to papers and notes published in the Transactions. In addition, special papers (tutorials, surveys, and perspectives on the theory and applications of control systems topics) are solicited.