DDLBench: Towards a Scalable Benchmarking Infrastructure for Distributed Deep Learning

Matthijs Jansen, V. Codreanu, A. Varbanescu
{"title":"DDLBench: Towards a Scalable Benchmarking Infrastructure for Distributed Deep Learning","authors":"Matthijs Jansen, V. Codreanu, A. Varbanescu","doi":"10.1109/DLS51937.2020.00009","DOIUrl":null,"url":null,"abstract":"Due to its many applications across various fields of research, engineering, and daily life, deep learning has seen a surge in popularity. Therefore, larger and more expressive models have been proposed, with examples like Turing-NLG using as many as 17 billion parameters. Training these very large models becomes increasingly difficult due to the high computational costs and large memory footprint. Therefore, several approaches for distributed training based on data parallelism (e.g., Horovod) and model/pipeline parallelism (e.g., GPipe, PipeDream) have emerged. In this work, we focus on an in-depth comparison of three different parallelism models that address these needs: data, model and pipeline parallelism. To this end, we provide an analytical comparison of the three, both in terms of computation time and memory usage, and introduce DDLBench, a comprehensive (open-source1, ready-to-use) benchmark suite to quantify these differences in practice. Through in-depth performance analysis and experimentation with various models, datasets, distribution models and hardware systems, we demonstrate that DDLBench can accurately quantify the capability of a given system to perform distributed deep learning (DDL). By comparing our analytical models with the benchmarking results, we show how the performance of real-life implementations diverges from these analytical models, thus requiring benchmarking to capture the in-depth complexity of the frameworks themselves.1https://github.com/sara-nl/DDLBench","PeriodicalId":185533,"journal":{"name":"2020 IEEE/ACM Fourth Workshop on Deep Learning on Supercomputers (DLS)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE/ACM Fourth Workshop on Deep Learning on Supercomputers (DLS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DLS51937.2020.00009","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

Due to its many applications across various fields of research, engineering, and daily life, deep learning has seen a surge in popularity. Therefore, larger and more expressive models have been proposed, with examples like Turing-NLG using as many as 17 billion parameters. Training these very large models becomes increasingly difficult due to the high computational costs and large memory footprint. Therefore, several approaches for distributed training based on data parallelism (e.g., Horovod) and model/pipeline parallelism (e.g., GPipe, PipeDream) have emerged. In this work, we focus on an in-depth comparison of three different parallelism models that address these needs: data, model and pipeline parallelism. To this end, we provide an analytical comparison of the three, both in terms of computation time and memory usage, and introduce DDLBench, a comprehensive (open-source[1], ready-to-use) benchmark suite to quantify these differences in practice. Through in-depth performance analysis and experimentation with various models, datasets, distribution models and hardware systems, we demonstrate that DDLBench can accurately quantify the capability of a given system to perform distributed deep learning (DDL). By comparing our analytical models with the benchmarking results, we show how the performance of real-life implementations diverges from these analytical models, thus requiring benchmarking to capture the in-depth complexity of the frameworks themselves.

[1] https://github.com/sara-nl/DDLBench
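The abstract names concrete systems for each parallelism model: Horovod for data parallelism, and GPipe/PipeDream for pipeline parallelism. As a point of reference for the first of these patterns (this is an illustrative sketch, not code from the paper or from the DDLBench suite), the snippet below shows a minimal Horovod data-parallel training loop in PyTorch. The network, synthetic dataset, and hyperparameters are placeholders chosen for brevity.

```python
# Minimal sketch of data-parallel training with Horovod + PyTorch.
# Illustrative only: the model, data, and hyperparameters are placeholders,
# not DDLBench's actual benchmark configuration.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler
import horovod.torch as hvd

hvd.init()  # one process per GPU/worker
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    torch.cuda.set_device(hvd.local_rank())  # pin this process to its GPU

# Placeholder model and synthetic dataset.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 10, (1024,)))

# Each worker trains on a disjoint shard of the data.
sampler = DistributedSampler(dataset, num_replicas=hvd.size(), rank=hvd.rank())
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

# Scale the learning rate with the worker count (common practice), and wrap
# the optimizer so gradients are allreduce-averaged across workers each step.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
optimizer = hvd.DistributedOptimizer(optimizer,
                                     named_parameters=model.named_parameters())

# Start all replicas from identical weights and optimizer state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

loss_fn = nn.CrossEntropyLoss()
for epoch in range(2):
    sampler.set_epoch(epoch)  # reshuffle shards each epoch
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()  # gradient allreduce is triggered during backward
        optimizer.step()
```

Launched with e.g. `horovodrun -np 4 python train.py`, this runs one process per GPU. Model and pipeline parallelism, the other two approaches the paper compares, instead split the network itself across devices; GPipe-style implementations additionally divide each mini-batch into micro-batches so successive pipeline stages can work concurrently.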