A unified momentum-based paradigm of decentralized SGD for non-convex models and heterogeneous data

IF 5.1 | Zone 2, Computer Science | Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Artificial Intelligence | Pub Date: 2024-04-17 | DOI: 10.1016/j.artint.2024.104130

Haizhou Du, Chaoqian Cheng, Chengdong Ni

Artificial Intelligence, volume 332, Article 104130. Journal Article. https://www.sciencedirect.com/science/article/pii/S0004370224000663

Citations: 0

Abstract

Emerging distributed applications have recently boosted the development of decentralized machine learning, especially in IoT and edge computing. In real-world scenarios, the common problems of non-convexity and data heterogeneity lead to inefficiency, performance degradation, and stalled progress. Most studies address only one of these issues and lack a more general framework that has been proven optimal. To this end, we propose a unified paradigm called UMP, which comprises two algorithms, D-SUM and GT-DSUM, that combine the momentum technique with decentralized stochastic gradient descent (SGD). The former provides a convergence guarantee for general non-convex objectives, while the latter extends it with gradient tracking, which estimates the global optimization direction to mitigate data heterogeneity (i.e., distribution drift). By varying its parameters, UMP covers most momentum-based variants built on the classical heavy ball or Nesterov's acceleration. In theory, we rigorously analyze the convergence of both approaches for non-convex objectives; in practice, extensive experiments demonstrate an improvement in model accuracy of up to 57.6% compared to other methods.
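The two ingredients named in the abstract, a unified momentum step and gradient tracking, can be illustrated with a small sketch. This is not the paper's D-SUM or GT-DSUM: the abstract does not give their update rules, so the code assumes the common three-sequence unified momentum form with an interpolation parameter `s` (`s=0` gives heavy ball, `s=1` gives Nesterov's acceleration) and the textbook gradient-tracking recursion without momentum. The ring topology, the toy objective f_i(x) = 0.5 * ||x - b_i||^2, and all function and parameter names are hypothetical choices made for illustration.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic gossip matrix for a ring: self plus two neighbours, weight 1/3 each."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W


def local_grads(X, B, noise_std, rng):
    """Stochastic gradient of the toy objective f_i(x) = 0.5 * ||x - b_i||^2, one row per node."""
    return (X - B) + noise_std * rng.standard_normal(X.shape)


def decentralized_sum(B, s=0.0, lr=0.1, beta=0.9, steps=500, noise_std=0.0, seed=0):
    """Decentralized SGD with an assumed unified momentum step (s=0: heavy ball, s=1: Nesterov)."""
    rng = np.random.default_rng(seed)
    n, d = B.shape
    W = ring_mixing_matrix(n)
    X = np.zeros((n, d))
    Ys_prev = X.copy()                       # auxiliary sequence, initialised to x_0
    for _ in range(steps):
        G = local_grads(X, B, noise_std, rng)
        Y = X - lr * G                       # plain gradient step
        Ys = X - s * lr * G                  # auxiliary step scaled by s
        X = W @ (Y + beta * (Ys - Ys_prev))  # momentum update, then gossip averaging
        Ys_prev = Ys
    return X


def gradient_tracking(B, lr=0.1, steps=500, noise_std=0.0, seed=0):
    """Decentralized SGD with textbook gradient tracking (no momentum)."""
    rng = np.random.default_rng(seed)
    n, d = B.shape
    W = ring_mixing_matrix(n)
    X = np.zeros((n, d))
    G = local_grads(X, B, noise_std, rng)
    T = G.copy()                             # tracker, initialised to the local gradient
    for _ in range(steps):
        X = W @ (X - lr * T)                 # step along the tracked direction, then gossip
        G_new = local_grads(X, B, noise_std, rng)
        T = W @ T + G_new - G                # mix trackers, add the change in local gradient
        G = G_new
    return X


if __name__ == "__main__":
    B = np.random.default_rng(1).normal(size=(8, 3))  # heterogeneous local targets b_i
    target = B.mean(axis=0)                           # minimiser of the average objective
    for s in (0.0, 1.0):
        X = decentralized_sum(B, s=s)
        print("s =", s, "max node error:", np.abs(X - target).max())
    print("gradient tracking max node error:", np.abs(gradient_tracking(B) - target).max())
```

On this toy problem with noise switched off, the average of the momentum iterates reaches the global minimiser while individual nodes keep a bias caused by their differing `b_i`; the tracked variant drives every node to the same point, which is the effect the abstract attributes to estimating the global optimization direction.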




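Source journal

Artificial Intelligence (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 11.20

Self-citation rate: 1.40%

Articles published: 118

Review time: 8 months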

Journal description: The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. Additionally, the journal accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ also accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.
