PipeDream: generalized pipeline parallelism for DNN training

Deepak Narayanan, Aaron Harlap, Amar Phanishayee, Vivek Seshadri, Nikhil R. Devanur, Gregory R. Ganger, Phillip B. Gibbons, Matei Zaharia
{"title":"PipeDream: generalized pipeline parallelism for DNN training","authors":"D. Narayanan, A. Harlap, Amar Phanishayee, V. Seshadri, Nikhil R. Devanur, G. Ganger, Phillip B. Gibbons, M. Zaharia","doi":"10.1145/3341301.3359646","DOIUrl":null,"url":null,"abstract":"DNN training is extremely time-consuming, necessitating efficient multi-accelerator parallelization. Current approaches to parallelizing training primarily use intra-batch parallelization, where a single iteration of training is split over the available workers, but suffer from diminishing returns at higher worker counts. We present PipeDream, a system that adds inter-batch pipelining to intra-batch parallelism to further improve parallel training throughput, helping to better overlap computation with communication and reduce the amount of communication when possible. Unlike traditional pipelining, DNN training is bi-directional, where a forward pass through the computation graph is followed by a backward pass that uses state and intermediate data computed during the forward pass. Naïve pipelining can thus result in mismatches in state versions used in the forward and backward passes, or excessive pipeline flushes and lower hardware efficiency. To address these challenges, PipeDream versions model parameters for numerically correct gradient computations, and schedules forward and backward passes of different minibatches concurrently on different workers with minimal pipeline stalls. PipeDream also automatically partitions DNN layers among workers to balance work and minimize communication. Extensive experimentation with a range of DNN tasks, models, and hardware configurations shows that PipeDream trains models to high accuracy up to 5.3X faster than commonly used intra-batch parallelism techniques.","PeriodicalId":331561,"journal":{"name":"Proceedings of the 27th ACM Symposium on Operating Systems Principles","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2019-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"520","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 27th ACM Symposium on Operating Systems Principles","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3341301.3359646","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 520

Abstract

DNN training is extremely time-consuming, necessitating efficient multi-accelerator parallelization. Current approaches to parallelizing training primarily use intra-batch parallelization, where a single iteration of training is split over the available workers, but suffer from diminishing returns at higher worker counts. We present PipeDream, a system that adds inter-batch pipelining to intra-batch parallelism to further improve parallel training throughput, helping to better overlap computation with communication and reduce the amount of communication when possible. Unlike traditional pipelining, DNN training is bi-directional, where a forward pass through the computation graph is followed by a backward pass that uses state and intermediate data computed during the forward pass. Naïve pipelining can thus result in mismatches in state versions used in the forward and backward passes, or excessive pipeline flushes and lower hardware efficiency. To address these challenges, PipeDream versions model parameters for numerically correct gradient computations, and schedules forward and backward passes of different minibatches concurrently on different workers with minimal pipeline stalls. PipeDream also automatically partitions DNN layers among workers to balance work and minimize communication. Extensive experimentation with a range of DNN tasks, models, and hardware configurations shows that PipeDream trains models to high accuracy up to 5.3X faster than commonly used intra-batch parallelism techniques.
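The weight versioning ("weight stashing") and concurrent forward/backward scheduling described in the abstract are the core mechanisms of the paper. As a rough illustration only, the Python toy below sketches how a single pipeline stage might stash the parameter version used by each minibatch's forward pass so that the matching backward pass computes gradients against that same version, even after later minibatches have updated the live weights. The class, the scalar one-weight "model," and the driver at the bottom are hypothetical simplifications for exposition, not PipeDream's actual implementation.

```python
# Toy sketch (assumption: one scalar weight per stage, y = w * x) of the
# weight-stashing idea: a backward pass reuses the parameter version its
# forward pass saw, while updates are applied to the live weights.

from collections import deque


class ToyStage:
    """One pipeline stage holding a scalar weight w for y = w * x."""

    def __init__(self, w=1.0, lr=0.1):
        self.w = w
        self.lr = lr
        # (minibatch id, stashed weight version, stashed input) for in-flight work
        self.stash = deque()

    def forward(self, mb_id, x):
        # Record the weight version (and input) this minibatch's forward pass used.
        self.stash.append((mb_id, self.w, x))
        return self.w * x

    def backward(self, mb_id, grad_out):
        # Retire the oldest in-flight minibatch; gradients are computed against
        # its stashed weight version, not the current one.
        stashed_id, w_version, x = self.stash.popleft()
        assert stashed_id == mb_id
        grad_w = grad_out * x          # d(w*x)/dw using the stashed input
        grad_x = grad_out * w_version  # gradient sent to the previous stage
        self.w -= self.lr * grad_w     # update the live weights
        return grad_x


if __name__ == "__main__":
    stage = ToyStage()
    # Two minibatches are in flight before the first backward pass arrives,
    # mimicking a pipelined schedule with overlapping forward and backward work.
    stage.forward(0, x=2.0)
    stage.forward(1, x=3.0)
    stage.backward(0, grad_out=1.0)  # uses the weight version seen by forward(0)
    stage.backward(1, grad_out=1.0)  # uses the version seen by forward(1)
    print("final weight:", stage.w)
```

The FIFO stash mirrors the in-order retirement of in-flight minibatches under a one-forward-one-backward style schedule; a real stage would hold layer tensors on an accelerator and exchange activations and gradients with neighboring workers rather than doing scalar arithmetic.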