Optimizing DNN training with pipeline model parallelism for enhanced performance in embedded systems

Journal of Parallel and Distributed Computing, Volume 190, Article 104890 | IF 3.4 | Zone 3 (Computer Science) | JCR Q1 (Computer Science, Theory & Methods) | Pub Date: 2024-04-06 | DOI: 10.1016/j.jpdc.2024.104890
Md Al Maruf, Akramul Azim, Nitin Auluck, Mansi Sahi
Citations: 0

Abstract

Optimizing DNN training with pipeline model parallelism for enhanced performance in embedded systems

Deep Neural Networks (DNNs) have gained widespread popularity across application domains due to their dominant performance. Despite the prevalence of massively parallel multi-core processor architectures, adopting large DNN models in embedded systems remains challenging, as most embedded applications are designed with single-core processors in mind. This limits DNN adoption in embedded systems, because model parallelization and workload partitioning are leveraged inefficiently. Prior solutions attempt to address these challenges using data and model parallelism. However, they fall short in finding optimal DNN model partitions and distributing them efficiently to achieve improved performance.

This paper proposes a DNN model parallelism framework to accelerate model training by finding the optimal number of model partitions and resource provisions. The proposed framework combines data and model parallelism techniques to optimize the parallel processing of DNNs for embedded applications. In addition, it implements the pipeline execution of the partitioned models and integrates a task controller to manage the computing resources. The experimental results for image object detection demonstrate the applicability of our proposed framework in estimating the latest execution time and reducing overall model training time by almost 44.87% compared to the baseline AlexNet convolutional neural network (CNN) model.
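The abstract does not spell out how the number of partitions is chosen, so the following is only a minimal sketch of the general trade-off, not the authors' method. It assumes an idealized pipeline in which every micro-batch costs the same at each stage, layers are split into contiguous stages, and each stage boundary adds a fixed transfer cost. All function names, the per-layer costs, and the `comm` parameter are hypothetical.

```python
from itertools import accumulate


def best_contiguous_split(costs, k):
    """Split per-layer costs into k contiguous stages, minimizing the slowest stage.

    Returns the list of per-stage costs (hypothetical helper, illustration only).
    """
    n = len(costs)
    prefix = [0.0] + list(accumulate(costs))
    inf = float("inf")
    # dp[j][i]: smallest achievable bottleneck when the first i layers form j stages
    dp = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for p in range(j - 1, i):
                cand = max(dp[j - 1][p], prefix[i] - prefix[p])
                if cand < dp[j][i]:
                    dp[j][i], cut[j][i] = cand, p
    # Walk the recorded cut points backwards to recover the stage costs.
    stages, i = [], n
    for j in range(k, 0, -1):
        p = cut[j][i]
        stages.append(prefix[i] - prefix[p])
        i = p
    return stages[::-1]


def pipeline_time(stages, micro_batches, comm=0.0):
    """Makespan of identical micro-batches flowing through the stages in series.

    Assumption: every stage except the last pays `comm` to hand its output on.
    """
    stage_times = [s + comm for s in stages[:-1]] + [stages[-1]]
    # Fill the pipeline once, then the slowest stage paces each remaining micro-batch.
    return sum(stage_times) + (micro_batches - 1) * max(stage_times)


def pick_partition_count(costs, micro_batches, max_stages, comm=0.0):
    """Try every stage count up to max_stages; return (time, k) for the fastest."""
    return min(
        (pipeline_time(best_contiguous_split(costs, k), micro_batches, comm), k)
        for k in range(1, min(max_stages, len(costs)) + 1)
    )
```

With made-up layer costs `[4, 2, 6, 3, 5, 4]`, 8 micro-batches, and `comm=1`, calling `pick_partition_count(costs, 8, 6, 1)` selects 5 stages rather than 6: once the slowest stage can no longer shrink, an extra stage only adds transfer cost. For scale, the reported 44.87% reduction in training time corresponds to a speedup of 1 / (1 − 0.4487) ≈ 1.81× over the baseline.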

Source journal: Journal of Parallel and Distributed Computing
Category: Engineering & Technology, Computer Science: Theory & Methods
CiteScore: 10.30
Self-citation rate: 2.60%
Articles published: 172
Review time: 12 months
Journal description: This international journal is directed to researchers, engineers, educators, managers, programmers, and users of computers who have particular interests in parallel processing and/or distributed computing. The Journal of Parallel and Distributed Computing publishes original research papers and timely review articles on the theory, design, evaluation, and use of parallel and/or distributed computing systems. The journal also features special issues on these topics, again covering the full range from the design to the use of the targeted systems.