FreezePipe: An Efficient Dynamic Pipeline Parallel Approach Based on Freezing Mechanism for Distributed DNN Training

IF 2 | Zone 3, Computer Science | Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Computer Supported Cooperative Work-The Journal of Collaborative Computing | Pub Date: 2023-05-24 | DOI: 10.1109/CSCWD57460.2023.10152643
Caishan Weng, Zhiyang Shu, Zhengjia Xu, Jinghui Zhang, Junzhou Luo, Fang Dong, Peng Wang, Zhengang Wang
Volume: 10 1 | Pages: 303-308 | Publication type: Journal Article | Open access: no
Citations: 0

Abstract

Training Deep Neural Networks (DNNs) at large scale is extremely time-consuming and computationally intensive, so it is commonly accelerated with distributed training. Pipeline parallelism, developed in recent years, partitions the model across several devices (e.g., GPUs) and improves training efficiency by dividing each data batch into micro-batches, so that different stages of the model can process different micro-batches at the same time. Current pipeline-parallel training assumes that pipeline placement and partitioning are static and that parameters are updated in every iteration; it does not account for layer freezing. As a result, computational resources are not fully utilized. In this paper, we propose FreezePipe, a novel method for optimizing deep learning training that combines the freezing mechanism with pipeline-parallel training. FreezePipe employs a lightweight method that determines the freezing strategy from gradient changes. Because resources need to be released as layers are frozen, a lightweight model partitioning algorithm is designed to determine the optimal pipeline partitioning strategy. Experimental results show that FreezePipe can reduce training time by 64.5% compared to Torchgpipe on the CIFAR-10 dataset without compromising model performance.
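The abstract does not spell out the freezing criterion or the partitioning algorithm, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a hypothetical rule (freeze the leading layers whose gradient norm changed by less than a relative `threshold`), a hypothetical cost model (a frozen layer costs only its forward pass), and a plain dynamic program for contiguous stage partitioning. All function names and parameters are illustrative and are not taken from the paper.

```python
from typing import List, Sequence


def frozen_prefix(prev_norms: Sequence[float],
                  curr_norms: Sequence[float],
                  threshold: float = 0.05) -> int:
    """Return how many leading layers look stable enough to freeze.

    Assumed rule (not from the paper): a layer is stable when the relative
    change of its gradient norm between two checkpoints is below `threshold`.
    Only a contiguous prefix is frozen, so backpropagation can stop early.
    """
    count = 0
    for prev, curr in zip(prev_norms, curr_norms):
        rel_change = abs(curr - prev) / max(abs(prev), 1e-12)
        if rel_change >= threshold:
            break
        count += 1
    return count


def layer_costs(fwd: Sequence[float], bwd: Sequence[float],
                n_frozen: int) -> List[float]:
    """Assumed cost model: frozen layers skip the backward pass."""
    return [f if i < n_frozen else f + b
            for i, (f, b) in enumerate(zip(fwd, bwd))]


def partition(costs: Sequence[float], n_stages: int) -> List[int]:
    """Split layers into contiguous stages, minimising the slowest stage.

    Returns the number of layers per stage (requires len(costs) >= n_stages).
    """
    n = len(costs)
    prefix = [0.0]
    for c in costs:
        prefix.append(prefix[-1] + c)
    inf = float("inf")
    # best[k][i]: smallest bottleneck when the first i layers fill k stages
    best = [[inf] * (n + 1) for _ in range(n_stages + 1)]
    cut = [[0] * (n + 1) for _ in range(n_stages + 1)]
    best[0][0] = 0.0
    for k in range(1, n_stages + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):
                bottleneck = max(best[k - 1][j], prefix[i] - prefix[j])
                if bottleneck < best[k][i]:
                    best[k][i] = bottleneck
                    cut[k][i] = j
    # Walk the recorded cut points backwards to recover stage sizes.
    sizes = []
    i = n
    for k in range(n_stages, 0, -1):
        j = cut[k][i]
        sizes.append(i - j)
        i = j
    return sizes[::-1]
```

For example, with four layers of forward cost 1 and backward cost 2, two stages split evenly as `[2, 2]` before freezing. Once the first two layers are frozen, the per-layer costs become `[1, 1, 3, 3]` and the balanced split shifts to `[3, 1]`, which illustrates why a static partition can leave devices underused after freezing.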
Source journal
Computer Supported Cooperative Work-The Journal of Collaborative Computing (COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS)
CiteScore: 6.40
Self-citation rate: 4.20%
Articles published: 31
Review time: >12 weeks
About the journal: Computer Supported Cooperative Work (CSCW): The Journal of Collaborative Computing and Work Practices is devoted to innovative research in computer-supported cooperative work (CSCW). It provides an interdisciplinary and international forum for the debate and exchange of ideas concerning theoretical, practical, technical, and social issues in CSCW. The CSCW Journal arose in response to the growing interest in the design, implementation and use of technical systems (including computing, information, and communications technologies) which support people working cooperatively, and its scope remains to encompass the multifarious aspects of research within CSCW and related areas. The CSCW Journal focuses on research oriented towards the development of collaborative computing technologies on the basis of studies of actual cooperative work practices (where ‘work’ is used in the wider sense). That is, it welcomes in particular submissions that (a) report on findings from ethnographic or similar kinds of in-depth fieldwork of work practices with a view to their technological implications, (b) report on empirical evaluations of the use of extant or novel technical solutions under real-world conditions, and/or (c) develop technical or conceptual frameworks for practice-oriented computing research based on previous fieldwork and evaluations.