Multi-Objective Hardware-Mapping Co-Optimisation for Multi-DNN Workloads on Chiplet-Based Accelerators

IF 3.6 · CAS Region 2 (Computer Science) · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · IEEE Transactions on Computers · Pub Date: 2024-04-10 · DOI: 10.1109/TC.2024.3386067
Abhijit Das;Enrico Russo;Maurizio Palesi
{"title":"基于 Chiplet 的加速器上多 DNN 工作负载的多目标硬件映射协同优化","authors":"Abhijit Das;Enrico Russo;Maurizio Palesi","doi":"10.1109/TC.2024.3386067","DOIUrl":null,"url":null,"abstract":"The need to efficiently execute different Deep Neural Networks (DNNs) on the same computing platform, coupled with the requirement for easy scalability, makes Multi-Chip Module (MCM)-based accelerators a preferred design choice. Such an accelerator brings together heterogeneous sub-accelerators in the form of chiplets, interconnected by a Network-on-Package (NoP). This paper addresses the challenge of selecting the most suitable sub-accelerators, configuring them, determining their optimal placement in the NoP, and mapping the layers of a predetermined set of DNNs spatially and temporally. The objective is to minimise execution time and energy consumption during parallel execution while also minimising the overall cost, specifically the silicon area, of the accelerator. This paper presents MOHaM, a framework for multi-objective hardware-mapping co-optimisation for multi-DNN workloads on chiplet-based accelerators. MOHaM exploits a multi-objective evolutionary algorithm that has been specialised for the given problem by incorporating several customised genetic operators. MOHaM is evaluated against state-of-the-art Design Space Exploration (DSE) frameworks on different multi-DNN workload scenarios. The solutions discovered by MOHaM are Pareto optimal compared to those by the state-of-the-art. Specifically, MOHaM-generated accelerator designs can reduce latency by up to \n<inline-formula><tex-math>$96\\%$</tex-math></inline-formula>\n and energy by up to \n<inline-formula><tex-math>$96.12\\%$</tex-math></inline-formula>\n.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 8","pages":"1883-1898"},"PeriodicalIF":3.6000,"publicationDate":"2024-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-Objective Hardware-Mapping Co-Optimisation for Multi-DNN Workloads on Chiplet-Based Accelerators\",\"authors\":\"Abhijit Das;Enrico Russo;Maurizio Palesi\",\"doi\":\"10.1109/TC.2024.3386067\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The need to efficiently execute different Deep Neural Networks (DNNs) on the same computing platform, coupled with the requirement for easy scalability, makes Multi-Chip Module (MCM)-based accelerators a preferred design choice. Such an accelerator brings together heterogeneous sub-accelerators in the form of chiplets, interconnected by a Network-on-Package (NoP). This paper addresses the challenge of selecting the most suitable sub-accelerators, configuring them, determining their optimal placement in the NoP, and mapping the layers of a predetermined set of DNNs spatially and temporally. The objective is to minimise execution time and energy consumption during parallel execution while also minimising the overall cost, specifically the silicon area, of the accelerator. This paper presents MOHaM, a framework for multi-objective hardware-mapping co-optimisation for multi-DNN workloads on chiplet-based accelerators. MOHaM exploits a multi-objective evolutionary algorithm that has been specialised for the given problem by incorporating several customised genetic operators. MOHaM is evaluated against state-of-the-art Design Space Exploration (DSE) frameworks on different multi-DNN workload scenarios. 
The solutions discovered by MOHaM are Pareto optimal compared to those by the state-of-the-art. Specifically, MOHaM-generated accelerator designs can reduce latency by up to \\n<inline-formula><tex-math>$96\\\\%$</tex-math></inline-formula>\\n and energy by up to \\n<inline-formula><tex-math>$96.12\\\\%$</tex-math></inline-formula>\\n.\",\"PeriodicalId\":13087,\"journal\":{\"name\":\"IEEE Transactions on Computers\",\"volume\":\"73 8\",\"pages\":\"1883-1898\"},\"PeriodicalIF\":3.6000,\"publicationDate\":\"2024-04-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Computers\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10496454/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computers","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10496454/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

The need to efficiently execute different Deep Neural Networks (DNNs) on the same computing platform, coupled with the requirement for easy scalability, makes Multi-Chip Module (MCM)-based accelerators a preferred design choice. Such an accelerator brings together heterogeneous sub-accelerators in the form of chiplets, interconnected by a Network-on-Package (NoP). This paper addresses the challenge of selecting the most suitable sub-accelerators, configuring them, determining their optimal placement in the NoP, and mapping the layers of a predetermined set of DNNs spatially and temporally. The objective is to minimise execution time and energy consumption during parallel execution while also minimising the overall cost, specifically the silicon area, of the accelerator. This paper presents MOHaM, a framework for multi-objective hardware-mapping co-optimisation for multi-DNN workloads on chiplet-based accelerators. MOHaM exploits a multi-objective evolutionary algorithm that has been specialised for the given problem by incorporating several customised genetic operators. MOHaM is evaluated against state-of-the-art Design Space Exploration (DSE) frameworks on different multi-DNN workload scenarios. The solutions discovered by MOHaM are Pareto optimal compared to those by the state-of-the-art. Specifically, MOHaM-generated accelerator designs can reduce latency by up to 96% and energy by up to 96.12%.
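The abstract only outlines how the search works. As a rough illustration of the kind of Pareto-based multi-objective selection such a framework relies on, the Python sketch below scores candidate pairs of (chiplet selection, layer-to-chiplet mapping) on latency, energy and area and keeps the non-dominated set. The Candidate structure, the cost model and the random refill are hypothetical placeholders, not MOHaM's actual genetic operators or analytical accelerator model.

```python
# Minimal sketch (not the authors' MOHaM code) of Pareto-based multi-objective
# search over candidate accelerator designs. Each candidate pairs a chiplet
# configuration with a layer mapping and is scored on (latency, energy, area).
# The cost model below is a toy placeholder.
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    chiplets: tuple   # chosen sub-accelerator types, e.g. ("simba", "eyeriss")
    mapping: tuple    # per-layer chiplet index (spatial/temporal assignment)

def evaluate(c: Candidate) -> tuple:
    """Hypothetical cost model returning (latency, energy, area); lower is better."""
    latency = sum((l % len(c.chiplets)) != m for l, m in enumerate(c.mapping)) + 1
    energy = len(set(c.mapping)) * 0.7 + len(c.chiplets) * 0.3
    area = len(c.chiplets) * 1.5
    return (latency, energy, area)

def dominates(a: tuple, b: tuple) -> bool:
    """Pareto dominance: a is no worse in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(pop):
    scored = [(c, evaluate(c)) for c in pop]
    return [c for c, s in scored
            if not any(dominates(t, s) for _, t in scored if t != s)]

def random_candidate(n_layers=8, library=("simba", "eyeriss", "tpu_like")):
    chiplets = tuple(random.sample(library, k=random.randint(1, len(library))))
    mapping = tuple(random.randrange(len(chiplets)) for _ in range(n_layers))
    return Candidate(chiplets, mapping)

if __name__ == "__main__":
    population = [random_candidate() for _ in range(50)]
    for _ in range(20):
        parents = pareto_front(population)          # elitist survival of the front
        # Random refill stands in for MOHaM's customised crossover/mutation operators.
        children = [random_candidate() for _ in range(max(0, 50 - len(parents)))]
        population = parents + children
    for c in pareto_front(population)[:5]:
        print(c.chiplets, evaluate(c))
```

A real co-optimisation framework would replace the toy cost model with analytical latency/energy/area estimates of the chiplet configuration and NoP placement, and evolve solutions with problem-specific operators rather than random refill.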
Source Journal
IEEE Transactions on Computers
Category: Engineering & Technology – Engineering: Electrical & Electronic
CiteScore: 6.60
Self-citation rate: 5.40%
Annual articles: 199
Review turnaround: 6.0 months
Journal profile: The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.
Latest articles in this journal
CUSPX: Efficient GPU Implementations of Post-Quantum Signature SPHINCS+
Chiplet-Gym: Optimizing Chiplet-based AI Accelerator Design with Reinforcement Learning
FLALM: A Flexible Low Area-Latency Montgomery Modular Multiplication on FPGA
Novel Lagrange Multipliers-Driven Adaptive Offloading for Vehicular Edge Computing
Leveraging GPU in Homomorphic Encryption: Framework Design and Analysis of BFV Variants