A unified momentum-based paradigm of decentralized SGD for non-convex models and heterogeneous data

Artificial Intelligence. Impact factor: 5.1; CAS Zone 2 (Computer Science); JCR Q1 (Computer Science, Artificial Intelligence). Published: 2024-04-17. DOI: 10.1016/j.artint.2024.104130
Haizhou Du, Chaoqian Cheng, Chengdong Ni
Artificial Intelligence, Volume 332, Article 104130 (2024). https://www.sciencedirect.com/science/article/pii/S0004370224000663

Abstract

Emerging distributed applications have recently boosted the development of decentralized machine learning, especially in the IoT and edge computing fields. In real-world scenarios, the common problems of non-convexity and data heterogeneity lead to inefficiency, performance degradation, and stalled progress. Most studies address only one of these issues, lacking a more general framework with proven optimality. To this end, we propose a unified paradigm called UMP, which comprises two algorithms, D-SUM and GT-DSUM, that combine the momentum technique with decentralized stochastic gradient descent (SGD). The former provides a convergence guarantee for general non-convex objectives, while the latter extends it with gradient tracking, which estimates the global optimization direction to mitigate data heterogeneity (i.e., distribution drift). With different parameter settings, UMP covers most momentum-based variants built on the classical heavy-ball method or Nesterov's acceleration. In theory, we rigorously establish the convergence of both approaches for non-convex objectives; in practice, extensive experiments show an improvement in model accuracy of up to 57.6% over other methods.
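
The abstract combines two ingredients: local momentum (heavy ball or Nesterov) layered on gossip-averaged decentralized SGD, and gradient tracking to counter heterogeneous local data. The sketch below illustrates both on a toy problem. It is not the paper's D-SUM or GT-DSUM: the quadratic losses, the hypothetical ring mixing matrix `ring_mixing`, the update order, and the values of `eta` and `beta` are illustrative assumptions, and gradients are exact rather than stochastic.

```python
# Illustrative sketch only; not the D-SUM / GT-DSUM updates from the paper.
from typing import List


def ring_mixing(n: int) -> List[List[float]]:
    """Hypothetical gossip matrix on a ring (n >= 3): each node keeps 1/2 of its
    own value and takes 1/4 from each neighbour. Rows and columns sum to 1."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 0.5
        W[i][(i - 1) % n] += 0.25
        W[i][(i + 1) % n] += 0.25
    return W


def mix(W: List[List[float]], v: List[float]) -> List[float]:
    """One gossip round: node i replaces its value with sum_j W[i][j] * v[j]."""
    n = len(v)
    return [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]


def run(b: List[float], steps: int = 500, eta: float = 0.1, beta: float = 0.5,
        nesterov: bool = False, tracking: bool = False) -> List[float]:
    """Decentralized momentum descent on toy losses f_i(x) = 0.5 * (x - b[i])**2.

    Node i only sees its own b[i] (heterogeneous data); the global optimum of
    the average loss is mean(b). Gradients are exact, so there is no sampling noise.
    """
    n = len(b)
    W = ring_mixing(n)

    def grad(i: int, xi: float) -> float:
        return xi - b[i]

    x = [0.0] * n
    m = [0.0] * n                                  # per-node momentum buffers
    g = [grad(i, x[i]) for i in range(n)]          # local gradients at current x
    y = list(g)                                    # gradient tracker, starts at g
    for _ in range(steps):
        d = y if tracking else g                   # direction fed into momentum
        m = [beta * m[i] + d[i] for i in range(n)]
        # Heavy ball steps along m; Nesterov looks ahead with d + beta * m.
        step = [d[i] + beta * m[i] for i in range(n)] if nesterov else m
        x_new = [xm - eta * s for xm, s in zip(mix(W, x), step)]
        g_new = [grad(i, x_new[i]) for i in range(n)]
        if tracking:
            # y_i <- sum_j W_ij y_j + g_i(x_new) - g_i(x_old); since W is doubly
            # stochastic, mean(y) stays equal to mean(g), the global gradient.
            y = [yw + gn - go for yw, gn, go in zip(mix(W, y), g_new, g)]
        x, g = x_new, g_new
    return x


if __name__ == "__main__":
    b = [0.0, 1.0, 2.0, 3.0]          # each node's local optimum differs
    target = sum(b) / len(b)          # global optimum: 1.5
    for tracking in (False, True):
        gap = max(abs(xi - target) for xi in run(b, tracking=tracking))
        print(f"tracking={tracking}: max |x_i - {target}| = {gap:.2e}")
```

With these toy settings, the plain momentum run settles with nodes offset from the global optimum by roughly 0.37, a bias that shrinks with smaller `eta` but persists because each node keeps being pulled toward its own `b[i]`. The tracking run instead drives every node to 1.5 up to floating-point error, with either heavy-ball or Nesterov momentum.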

Source journal: Artificial Intelligence
Subject category: Engineering & Technology; Computer Science: Artificial Intelligence
CiteScore: 11.20
Self-citation rate: 1.40%
Articles published: 118
Review time: 8 months
Journal description: The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. Additionally, the journal accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ also accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.