Efficient Parameter Synchronization for Peer-to-Peer Distributed Learning With Selective Multicast

IF 5.8 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · IEEE Transactions on Services Computing · Pub Date: 2024-11-25 · DOI: 10.1109/TSC.2024.3506480
Shouxi Luo;Pingzhi Fan;Ke Li;Huanlai Xing;Long Luo;Hongfang Yu
{"title":"Efficient Parameter Synchronization for Peer-to-Peer Distributed Learning With Selective Multicast","authors":"Shouxi Luo;Pingzhi Fan;Ke Li;Huanlai Xing;Long Luo;Hongfang Yu","doi":"10.1109/TSC.2024.3506480","DOIUrl":null,"url":null,"abstract":"Recent advances in distributed machine learning show theoretically and empirically that, for many models, provided that workers will eventually participate in the synchronizations, <inline-formula><tex-math>$i)$</tex-math></inline-formula> the training still converges, even if only <inline-formula><tex-math>$p$</tex-math></inline-formula> workers take part in each round of synchronization, and <inline-formula><tex-math>$ii)$</tex-math></inline-formula> a larger <inline-formula><tex-math>$p$</tex-math></inline-formula> generally leads to a faster rate of convergence. These findings shed light on eliminating the bottleneck effects of parameter synchronization in large-scale data-parallel distributed training and have motivated several optimization designs. In this paper, we focus on optimizing the parameter synchronization for <i>peer-to-peer</i> distributed learning, where workers broadcast or multicast their updated parameters to others for synchronization, and propose <small>SelMcast</small>, a suite of expressive and efficient multicast receiver selection algorithms, to achieve the goal. Compared with the state-of-the-art (SOTA) design, which randomly selects exactly <inline-formula><tex-math>$p$</tex-math></inline-formula> receivers for each worker’s multicast in a bandwidth-agnostic way, <small>SelMcast</small> chooses receivers based on the global view of their available bandwidth and loads, yielding two advantages, i.e., accelerated parameter synchronization for higher utilization of computing resources and enlarged average <inline-formula><tex-math>$p$</tex-math></inline-formula> values for faster convergence. Comprehensive evaluations show that <small>SelMcast</small> is efficient for both peer-to-peer Bulk Synchronous Parallel (BSP) and Stale Synchronous Parallel (SSP) distributed training, outperforming the SOTA solution significantly.","PeriodicalId":13255,"journal":{"name":"IEEE Transactions on Services Computing","volume":"18 1","pages":"156-168"},"PeriodicalIF":5.8000,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Services Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10767301/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Recent advances in distributed machine learning show theoretically and empirically that, for many models, provided that workers will eventually participate in the synchronizations, $i)$ the training still converges, even if only $p$ workers take part in each round of synchronization, and $ii)$ a larger $p$ generally leads to a faster rate of convergence. These findings shed light on eliminating the bottleneck effects of parameter synchronization in large-scale data-parallel distributed training and have motivated several optimization designs. In this paper, we focus on optimizing the parameter synchronization for peer-to-peer distributed learning, where workers broadcast or multicast their updated parameters to others for synchronization, and propose SelMcast, a suite of expressive and efficient multicast receiver selection algorithms, to achieve the goal. Compared with the state-of-the-art (SOTA) design, which randomly selects exactly $p$ receivers for each worker’s multicast in a bandwidth-agnostic way, SelMcast chooses receivers based on the global view of their available bandwidth and loads, yielding two advantages, i.e., accelerated parameter synchronization for higher utilization of computing resources and enlarged average $p$ values for faster convergence. Comprehensive evaluations show that SelMcast is efficient for both peer-to-peer Bulk Synchronous Parallel (BSP) and Stale Synchronous Parallel (SSP) distributed training, outperforming the SOTA solution significantly.
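
The abstract's key contrast — bandwidth-agnostic random selection versus selection driven by a global view of bandwidth and load — can be sketched in a few lines. The snippet below is an illustrative assumption only: the greedy score `bandwidth[w] - load[w]`, the function names, and the toy numbers are ours, not the paper's actual SelMcast algorithms.

```python
import random

def random_receivers(workers, sender, p):
    # SOTA baseline described in the abstract: pick exactly p receivers
    # uniformly at random, with no regard to bandwidth or load.
    candidates = [w for w in workers if w != sender]
    return random.sample(candidates, p)

def bandwidth_aware_receivers(workers, sender, p, bandwidth, load):
    # Hypothetical greedy stand-in for SelMcast-style selection: score
    # each candidate by available bandwidth minus a load penalty, take
    # the top p, then bump each chosen receiver's load so later senders
    # are steered toward less-burdened peers.
    candidates = [w for w in workers if w != sender]
    candidates.sort(key=lambda w: bandwidth[w] - load[w], reverse=True)
    chosen = candidates[:p]
    for w in chosen:
        load[w] += 1  # one more inbound parameter multicast to absorb
    return chosen

# Toy run: 6 workers, each multicast must reach p = 3 receivers.
workers = list(range(6))
bandwidth = {0: 10.0, 1: 4.0, 2: 8.0, 3: 2.0, 4: 9.0, 5: 6.0}
load = {w: 0 for w in workers}
for sender in workers:
    print(sender, "->", bandwidth_aware_receivers(workers, sender, 3, bandwidth, load))
```

With this toy scoring, heavily chosen receivers accumulate load and fall in the ranking, so traffic spreads across peers over successive senders; the actual SelMcast designs presumably balance bandwidth, load, and the average value of p far more carefully than this sketch.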
Source Journal

IEEE Transactions on Services Computing
Categories: COMPUTER SCIENCE, INFORMATION SYSTEMS; COMPUTER SCIENCE, SOFTWARE ENGINEERING
CiteScore: 11.50
Self-citation rate: 6.20%
Articles published: 278
Review time: >12 weeks
Journal Description: IEEE Transactions on Services Computing encompasses the computing and software aspects of the science and technology of services innovation research and development. It emphasizes algorithmic, mathematical, statistical, and computational methods central to services computing. Topics covered include Service Oriented Architecture, Web Services, Business Process Integration, Solution Performance Management, and Services Operations and Management. The transactions address mathematical foundations, security, privacy, agreement, contract, discovery, negotiation, collaboration, and quality of service for web services, as well as composite web service creation, business and scientific applications, standards, utility models, business process modeling, integration, and collaboration in the realm of Services Computing.
Latest Articles From This Journal
NL2Filter: A Robust CNN Design for Visual Services Over Cloud and Device
DALAD: Unsupervised Detection of Global and Local Anomalies in Microservice Systems
A Knee Point-Driven Set-Based Swarm Optimizer for Computing Tasks Allocation Oriented to Marginal Utility in Fog Computing
Kafka-Thor: A Kafka-based In-Edge Data Streaming Platform for Enhanced V2X Services
PeerSync: Accelerating Containerized Model Inference at the Network Edge