HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array

Linghao Song, Jiachen Mao, Youwei Zhuo, Xuehai Qian, Hai Helen Li, Yiran Chen
{"title":"HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array","authors":"Linghao Song, Jiachen Mao, Youwei Zhuo, Xuehai Qian, Hai Helen Li, Yiran Chen","doi":"10.1109/HPCA.2019.00027","DOIUrl":null,"url":null,"abstract":"With the rise of artificial intelligence in recent years, Deep Neural Networks (DNNs) have been widely used in many domains. To achieve high performance and energy efficiency, hardware acceleration (especially inference) of DNNs is intensively studied both in academia and industry. However, we still face two challenges: large DNN models and datasets, which incur frequent off-chip memory accesses; and the training of DNNs, which is not well-explored in recent accelerator designs. To truly provide high throughput and energy efficient acceleration for the training of deep and large models, we inevitably need to use multiple accelerators to explore the coarse-grain parallelism, compared to the fine-grain parallelism inside a layer considered in most of the existing architectures. It poses the key research question to seek the best organization of computation and dataflow among accelerators. In this paper, inspired by recent work in machine learning systems, we propose a solution HyPar to determine layer-wise parallelism for deep neural network training with an array of DNN accelerators. HyPar partitions the feature map tensors (input and output), the kernel tensors, the gradient tensors, and the error tensors for the DNN accelerators. A partition constitutes the choice of parallelism for weighted layers. The optimization target is to search a partition that minimizes the total communication during training a complete DNN. To solve this problem, we propose a communication model to explain the source and amount of communications. Then, we use a hierarchical layer-wise dynamic programming method to search for the partition for each layer.","PeriodicalId":102050,"journal":{"name":"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"80","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2019.00027","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 80

Abstract

With the rise of artificial intelligence in recent years, Deep Neural Networks (DNNs) have been widely used in many domains. To achieve high performance and energy efficiency, hardware acceleration of DNNs (especially inference) has been intensively studied in both academia and industry. However, two challenges remain: large DNN models and datasets, which incur frequent off-chip memory accesses; and the training of DNNs, which is not well explored in recent accelerator designs. To truly provide high-throughput and energy-efficient acceleration for training deep and large models, we inevitably need multiple accelerators that exploit coarse-grain parallelism, beyond the fine-grain parallelism inside a layer considered in most existing architectures. This poses the key research question of how best to organize computation and dataflow among the accelerators. In this paper, inspired by recent work in machine learning systems, we propose HyPar, a solution that determines layer-wise parallelism for deep neural network training on an array of DNN accelerators. HyPar partitions the feature map tensors (input and output), the kernel tensors, the gradient tensors, and the error tensors across the DNN accelerators. A partition constitutes the choice of parallelism for each weighted layer. The optimization target is to find a partition that minimizes the total communication during the training of a complete DNN. To solve this problem, we propose a communication model that explains the source and amount of communication. We then use a hierarchical layer-wise dynamic programming method to search for the partition of each layer.
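The search the abstract describes, a per-layer choice of parallelism driven by a communication model and resolved with layer-wise dynamic programming, can be sketched compactly. The Python below is an illustrative reconstruction under simplified assumptions, not the authors' implementation: it reduces each weighted layer's options to plain data parallelism versus model parallelism, and the cost formulas in `intra_layer_comm` and `inter_layer_comm`, as well as all names (`Layer`, `search_partition`) and the toy layer sizes, are hypothetical stand-ins for the paper's tensor-partition-based model.

```python
# Minimal sketch of a HyPar-style layer-wise partition search.
# Illustrative reconstruction only: the cost formulas are simplified
# stand-ins for the paper's communication model, not the authors' code.

from dataclasses import dataclass

# The two per-layer parallelism choices considered in this sketch.
DATA_PARALLEL = "data"    # partition feature maps / batch, replicate weights
MODEL_PARALLEL = "model"  # partition weights (kernels), replicate activations

@dataclass
class Layer:
    name: str
    fmap_size: int     # elements in the layer's output feature maps (whole batch)
    kernel_size: int   # elements in the weight (kernel) tensor

def intra_layer_comm(layer: Layer, choice: str, num_groups: int) -> float:
    """Communication incurred inside one layer during one training step.

    Assumed simplification: data parallelism must all-reduce the weight
    gradients across groups; model parallelism must gather partial results
    of the partitioned kernel.  HyPar derives these costs from the partitions
    of the feature-map, kernel, gradient, and error tensors.
    """
    if choice == DATA_PARALLEL:
        return layer.kernel_size * (num_groups - 1)
    return layer.fmap_size * (num_groups - 1)

def inter_layer_comm(prev: Layer, prev_choice: str, cur_choice: str,
                     num_groups: int) -> float:
    """Cost of converting the activation/error layout between adjacent layers.

    Assumed simplification: no conversion cost if both layers use the same
    parallelism; otherwise the feature maps flowing between them must be
    re-partitioned across the groups.
    """
    if prev_choice == cur_choice:
        return 0.0
    return prev.fmap_size * (num_groups - 1) / num_groups

def search_partition(layers: list[Layer], num_groups: int = 2):
    """Layer-wise dynamic programming over parallelism choices.

    best[i][c] is the minimum total communication for layers 0..i when layer
    i uses choice c; the recurrence adds the intra-layer cost plus the
    cheapest transition from layer i-1.
    """
    choices = (DATA_PARALLEL, MODEL_PARALLEL)
    best = [{c: intra_layer_comm(layers[0], c, num_groups) for c in choices}]
    trace = [{c: None for c in choices}]

    for i in range(1, len(layers)):
        best.append({})
        trace.append({})
        for c in choices:
            cand = {
                p: best[i - 1][p]
                   + inter_layer_comm(layers[i - 1], p, c, num_groups)
                   + intra_layer_comm(layers[i], c, num_groups)
                for p in choices
            }
            p_best = min(cand, key=cand.get)
            best[i][c] = cand[p_best]
            trace[i][c] = p_best

    # Backtrack the per-layer choices that minimize total communication.
    last = min(best[-1], key=best[-1].get)
    plan, c = [], last
    for i in range(len(layers) - 1, -1, -1):
        plan.append((layers[i].name, c))
        if trace[i][c] is not None:
            c = trace[i][c]
    return best[-1][last], list(reversed(plan))

if __name__ == "__main__":
    # Toy network: convolutional layers (large feature maps, small kernels)
    # tend toward data parallelism, fully connected layers (huge kernels,
    # small activations) toward model parallelism.
    net = [
        Layer("conv1", fmap_size=802816, kernel_size=1728),
        Layer("conv2", fmap_size=401408, kernel_size=36864),
        Layer("fc1",   fmap_size=4096,   kernel_size=16777216),
        Layer("fc2",   fmap_size=1000,   kernel_size=4096000),
    ]
    total, plan = search_partition(net, num_groups=2)
    print("total communication (elements):", total)
    for name, choice in plan:
        print(f"  {name}: {choice} parallelism")
```

The hierarchical aspect of HyPar applies this kind of layer-wise search recursively as the accelerator array is split level by level into two groups; the sketch above covers only a single two-group split.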