The Effectiveness of Local Updates for Decentralized Learning Under Data Heterogeneity

IF 5.8; Zone 2, Engineering & Technology; JCR Q1, Engineering, Electrical & Electronic; IEEE Transactions on Signal Processing; Pub Date: 2025-01-24; DOI: 10.1109/TSP.2025.3533208
Tongle Wu; Zhize Li; Ying Sun
IEEE Transactions on Signal Processing, vol. 73, pp. 751-765. https://ieeexplore.ieee.org/document/10852183/
Citations: 0

Abstract

The Effectiveness of Local Updates for Decentralized Learning Under Data Heterogeneity
We revisit two fundamental decentralized optimization methods, Decentralized Gradient Tracking (DGT) and Decentralized Gradient Descent (DGD), with multiple local updates. We consider two settings and demonstrate that incorporating local update steps can reduce communication complexity. Specifically, for $\mu$-strongly convex and $L$-smooth loss functions, we prove that local DGT achieves communication complexity $\tilde{\mathcal{O}}\Big(\frac{L}{\mu(K+1)}+\frac{\delta+\mu}{\mu(1-\rho)}+\frac{\rho}{(1-\rho)^{2}}\cdot\frac{L+\delta}{\mu}\Big)$, where $K$ is the number of additional local updates, $\rho$ measures the network connectivity, and $\delta$ measures the second-order heterogeneity of the local losses. Our results reveal the tradeoff between communication and computation and show that increasing $K$ can effectively reduce communication costs when the data heterogeneity is low and the network is well-connected. We then consider the over-parameterization regime, where the local losses share the same minima. We prove that employing local updates in DGD, even without gradient correction, achieves exact linear convergence under the Polyak-Łojasiewicz (PL) condition, which can yield an effect similar to that of DGT in reducing communication complexity. We further specialize the result to linear models, obtaining an improved rate expression. Numerical experiments validate our theoretical results.
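To make the communication/computation tradeoff in the local DGT bound concrete, the following minimal Python sketch evaluates the three terms inside the $\tilde{\mathcal{O}}(\cdot)$ expression from the abstract. It is only an illustration under stated assumptions: the hidden constants and logarithmic factors are dropped, so only ratios between values are meaningful. The function names and the two parameter settings are hypothetical choices for illustration and are not taken from the paper.

```python
def local_dgt_comm_bound(L, mu, delta, rho, K):
    """Sum of the three terms inside the O-tilde bound quoted in the abstract.

    Constants and log factors hidden by the O-tilde are ignored (assumption),
    so the returned number is only useful for comparing settings.
    """
    if not (mu > 0 and L >= mu and delta >= 0 and 0 <= rho < 1 and K >= 0):
        raise ValueError("expected 0 < mu <= L, delta >= 0, 0 <= rho < 1, K >= 0")
    optimization = L / (mu * (K + 1))                  # the only term that shrinks with K
    heterogeneity = (delta + mu) / (mu * (1 - rho))    # independent of K
    network = rho / (1 - rho) ** 2 * (L + delta) / mu  # independent of K
    return optimization + heterogeneity + network


def k_independent_floor(L, mu, delta, rho):
    """Value the bound approaches as K grows: the two terms that do not depend on K."""
    return (delta + mu) / (mu * (1 - rho)) + rho / (1 - rho) ** 2 * (L + delta) / mu


# Two hypothetical settings (illustrative numbers, not from the paper).
settings = {
    "low heterogeneity, well-connected": dict(L=100.0, mu=1.0, delta=1.0, rho=0.1),
    "high heterogeneity, poorly connected": dict(L=100.0, mu=1.0, delta=50.0, rho=0.9),
}

for name, params in settings.items():
    without_local = local_dgt_comm_bound(K=0, **params)
    with_local = local_dgt_comm_bound(K=9, **params)
    floor = k_independent_floor(**params)
    print(f"{name}: K=0 -> {without_local:.2f}, K=9 -> {with_local:.2f}, "
          f"floor -> {floor:.2f}, reduction x{without_local / with_local:.3f}")
```

In the first setting the bound drops several-fold when going from $K=0$ to $K=9$, while in the second it changes by less than 1%, because the two $K$-independent terms dominate. This matches the abstract's statement that increasing $K$ helps when heterogeneity is low and the network is well-connected.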
Source journal
IEEE Transactions on Signal Processing (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 11.20
Self-citation rate: 9.30%
Articles published: 310
Review time: 3.0 months
About the journal: The IEEE Transactions on Signal Processing covers novel theory, algorithms, performance analyses and applications of techniques for the processing, understanding, learning, retrieval, mining, and extraction of information from signals. The term “signal” includes, among others, audio, video, speech, image, communication, geophysical, sonar, radar, medical and musical signals. Examples of topics of interest include, but are not limited to, information processing and the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals.
Latest articles in this journal
Subspace Clustering of Subspaces: Unifying Canonical Correlation Analysis and Subspace Clustering
Byzantine-Resilient Decentralized Optimization for Joint Feature Selection in Multi-Task Networks
Neural Collapse based Deep Supervised Federated Learning for Signal Detection in OFDM Systems
Directed Acyclic Graph Convolutional Networks
Filtering Markov Jump Systems with Partially Known Dynamics: A Model-Based Deep Learning Approach