Accelerated Distributed Stochastic Nonconvex Optimization Over Time-Varying Directed Networks

IEEE Transactions on Automatic Control (IF 7.0; Zone 1, Computer Science; JCR Q1, Automation & Control Systems). Pub Date: 2024-10-14. DOI: 10.1109/TAC.2024.3479888

Yiyue Chen;Abolfazl Hashemi;Haris Vikalo

IEEE Transactions on Automatic Control, vol. 70, no. 4, pp. 2196-2211. https://ieeexplore.ieee.org/document/10715643/

Citations: 0

Abstract

Distributed stochastic nonconvex optimization problems have recently received attention due to the growing interest of signal processing, computer vision, and natural language processing communities in applications deployed over distributed learning systems (e.g., federated learning). We study the setting where the data is distributed across the nodes of a time-varying directed network, a topology suitable for modeling dynamic networks experiencing communication delays and straggler effects. The network nodes, which can access only their local objectives and query a stochastic first-order oracle to obtain gradient estimates, collaborate to minimize a global objective function by exchanging messages with their neighbors. We propose an algorithm, novel to this setting, that leverages stochastic gradient descent with momentum and gradient tracking to solve distributed nonconvex optimization problems over time-varying networks. To analyze the algorithm, we tackle the challenges that arise when analyzing dynamic network systems that communicate gradient acceleration components. We prove that the algorithm's oracle complexity is $\mathcal{O}(1/\epsilon^{1.5})$, and that under the Polyak-Łojasiewicz (PL) condition the algorithm converges linearly to a steady error state. The proposed scheme is tested on several learning tasks: a nonconvex logistic regression experiment on the MNIST dataset, an image classification task on the CIFAR-10 dataset, and an NLP classification test on the IMDB dataset. We further present numerical simulations with an objective that satisfies the PL condition. The results demonstrate superior performance of the proposed framework compared to the existing related methods.
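To make the ingredients named in the abstract concrete, the following is a minimal sketch of gradient tracking with heavy-ball-style momentum over a time-varying directed network. It is not the authors' algorithm: the update rule is a generic row/column-stochastic ("push-pull" style) tracking scheme chosen for illustration, and every name and value in it (`directed_ring`, `mixing_matrices`, `run`, the step size `alpha`, momentum `beta`, noise level `sigma`, and the toy objective $f_i(x) = \tfrac{1}{2}\|x - b_i\|^2$, which satisfies the PL condition $\tfrac{1}{2}\|\nabla f(x)\|^2 \ge \mu\,(f(x) - f^\star)$) is an assumption.

```python
import numpy as np

def directed_ring(n, shift, extra_edge=None):
    """Adjacency of a directed ring with self-loops; adj[i, j] = 1 means j sends to i."""
    adj = np.eye(n)
    for j in range(n):
        adj[(j + shift) % n, j] = 1.0
    if extra_edge is not None:          # optional chord that makes the degrees irregular
        i, j = extra_edge
        adj[i, j] = 1.0
    return adj

def mixing_matrices(adj):
    """Row-stochastic A (mixes models) and column-stochastic B (mixes gradient trackers)."""
    A = adj / adj.sum(axis=1, keepdims=True)
    B = adj / adj.sum(axis=0, keepdims=True)
    return A, B

def run(num_iters=500, n=5, d=3, alpha=0.05, beta=0.5, sigma=0.01, seed=0):
    rng = np.random.default_rng(seed)
    b = rng.normal(size=(n, d))         # node i holds f_i(x) = 0.5 * ||x - b_i||^2
    x_star = b.mean(axis=0)             # minimizer of the average objective

    # Hypothetical time-varying topology: two directed graphs used in alternation.
    graphs = [mixing_matrices(directed_ring(n, 1, extra_edge=(0, 2))),
              mixing_matrices(directed_ring(n, 2, extra_edge=(3, 0)))]

    def stoch_grad(x):
        # Stochastic first-order oracle: exact local gradient plus Gaussian noise.
        return (x - b) + sigma * rng.normal(size=x.shape)

    x = 3.0 * rng.normal(size=(n, d))   # one model copy per node (rows)
    g = stoch_grad(x)
    y = g.copy()                        # gradient tracker, initialized to local gradients
    m = np.zeros((n, d))                # momentum buffer

    for k in range(num_iters):
        A, B = graphs[k % 2]
        m = beta * m + (1.0 - beta) * y     # momentum applied to the tracked gradient
        x = A @ (x - alpha * m)             # local step, then mix with in-neighbors
        g_new = stoch_grad(x)
        y = B @ y + g_new - g               # tracker update; preserves sum(y) == sum(g)
        g = g_new
    return x, x_star

if __name__ == "__main__":
    x, x_star = run()
    print("max distance to optimum:", np.max(np.linalg.norm(x - x_star, axis=1)))
```

Because the oracle is noisy and the step size is constant, the iterates in this sketch are expected to settle in a small neighborhood of the minimizer rather than reach it exactly, which loosely mirrors the "steady error state" wording in the abstract.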




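Source journal

IEEE Transactions on Automatic Control (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 11.30
Self-citation rate: 5.90%
Articles published: 824
Review time: 9 months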

About the journal: In the IEEE Transactions on Automatic Control, the IEEE Control Systems Society publishes high-quality papers on the theory, design, and applications of control engineering. Two types of contributions are regularly considered: 1) Papers: Presentation of significant research, development, or application of control concepts. 2) Technical Notes and Correspondence: Brief technical notes, comments on published areas or established control topics, corrections to papers and notes published in the Transactions. In addition, special papers (tutorials, surveys, and perspectives on the theory and applications of control systems topics) are solicited.
