MadPipe: Memory Aware Dynamic Programming Algorithm for Pipelined Model Parallelism

Olivier Beaumont, Lionel Eyraud-Dubois, Alena Shilova
{"title":"MadPipe: Memory Aware Dynamic Programming Algorithm for Pipelined Model Parallelism","authors":"Olivier Beaumont, Lionel Eyraud-Dubois, Alena Shilova","doi":"10.1109/IPDPSW55747.2022.00174","DOIUrl":null,"url":null,"abstract":"The training phase in Deep Neural Networks (DNNs) is very computationally intensive and is nowadays often performed on parallel computing platforms, ranging from a few GPUs to several thousand GPUs. The strategy of choice for the parallelization of training is the so-called data parallel approach, based on the parallel training of the different inputs (typically images) and the aggregation of network weights with collective communications (AllReduce operation). The scalability of this approach is limited both by the memory available on each node and the networking capacities for collective operations. Recently, a parallel model approach has been proposed (PipeDream, Gpipe), in which the DNN weights are distributed and images are trained in a pipeline/stream manner over the computational nodes. In this paper, we formalize in detail the optimization problem associated with the placement of DNN layers onto computation resources when using pipelined model parallelism, and we derive a dynamic programming based heuristic, MadPipe. We show through extensive simulations based on realistic networks that MadPipe significantly improves the performance of the pipelined parallel model approach compared to PipeDream.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW55747.2022.00174","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

The training phase of Deep Neural Networks (DNNs) is highly computationally intensive and is nowadays often performed on parallel computing platforms, ranging from a few GPUs to several thousand. The strategy of choice for parallelizing training is the so-called data parallel approach, based on training the different inputs (typically images) in parallel and aggregating the network weights with collective communications (an AllReduce operation). The scalability of this approach is limited both by the memory available on each node and by the networking capacity for collective operations. Recently, model parallel approaches have been proposed (PipeDream, GPipe), in which the DNN weights are distributed and images are trained in a pipelined/streamed manner over the computational nodes. In this paper, we formalize in detail the optimization problem associated with the placement of DNN layers onto computation resources when using pipelined model parallelism, and we derive a dynamic programming based heuristic, MadPipe. We show through extensive simulations based on realistic networks that MadPipe significantly improves the performance of the pipelined model parallel approach compared to PipeDream.
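To make the layer-placement problem concrete, the sketch below shows a textbook dynamic program that splits a chain of layers into contiguous pipeline stages so that the slowest stage is as fast as possible, while rejecting stages whose memory footprint exceeds a per-device budget. This is only an illustrative toy in the spirit of PipeDream-style stage partitioning, not the MadPipe algorithm described in the paper; the function name, layer costs, and memory figures are invented for illustration.

```python
# Illustrative sketch (not MadPipe itself): split a chain of L layers into S
# contiguous pipeline stages, minimizing the bottleneck stage time subject to
# a per-device memory budget. All numbers below are made-up placeholders.
import math

def partition_layers(compute_time, memory_need, num_stages, memory_budget):
    """compute_time[i], memory_need[i]: per-layer time and memory.
    Returns (bottleneck_time, list of (start, end) layer ranges per stage)."""
    L = len(compute_time)
    # Prefix sums for O(1) range queries.
    t_pref = [0.0] * (L + 1)
    m_pref = [0.0] * (L + 1)
    for i in range(L):
        t_pref[i + 1] = t_pref[i] + compute_time[i]
        m_pref[i + 1] = m_pref[i] + memory_need[i]

    def stage_time(i, j):  # layers i..j-1 placed on one device
        return t_pref[j] - t_pref[i]

    def stage_mem(i, j):
        return m_pref[j] - m_pref[i]

    # dp[s][j]: best achievable bottleneck when the first j layers are split
    # into s stages; cut[s][j] remembers the start of the last stage.
    INF = math.inf
    dp = [[INF] * (L + 1) for _ in range(num_stages + 1)]
    cut = [[-1] * (L + 1) for _ in range(num_stages + 1)]
    dp[0][0] = 0.0
    for s in range(1, num_stages + 1):
        for j in range(1, L + 1):
            for i in range(s - 1, j):  # last stage holds layers i..j-1
                if stage_mem(i, j) > memory_budget:
                    continue  # this stage would not fit on one device
                cand = max(dp[s - 1][i], stage_time(i, j))
                if cand < dp[s][j]:
                    dp[s][j] = cand
                    cut[s][j] = i
    if dp[num_stages][L] == INF:
        raise ValueError("no memory-feasible partition")

    # Reconstruct the stage boundaries from the stored cut points.
    stages, j = [], L
    for s in range(num_stages, 0, -1):
        i = cut[s][j]
        stages.append((i, j))
        j = i
    return dp[num_stages][L], list(reversed(stages))


if __name__ == "__main__":
    # Toy example: 8 layers, 4 devices, memory budget of 10 units per device.
    times = [2.0, 3.0, 1.5, 4.0, 2.5, 1.0, 3.5, 2.0]
    mems = [4.0, 5.0, 2.0, 6.0, 3.0, 2.0, 5.0, 3.0]
    bottleneck, stages = partition_layers(times, mems, num_stages=4, memory_budget=10.0)
    print(bottleneck, stages)
```

This toy version runs in O(S·L²) time and treats per-stage memory as a fixed sum over layers. In a real pipelined schedule several micro-batches are in flight at once, so the memory actually needed on each device depends on the schedule as well as on the layers placed there; handling that interaction is precisely what a memory-aware formulation such as the one in this paper has to address, and the sketch above makes no attempt to do so.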