AMAIX In-Depth: A Generic Analytical Model for Deep Learning Accelerators

International Journal of Parallel Programming. IF 0.9; Zone 4 (Computer Science); JCR Q3 (Computer Science, Theory & Methods). Pub Date: 2022-03-24. DOI: 10.1007/s10766-022-00728-3
Niko Zurstraßen, Lukas Jünger, Tim Kogel, Holger Keding, Rainer Leupers
Citations: 0

Abstract

In recent years, the growing popularity of Convolutional Neural Networks (CNNs) has driven the development of specialized hardware, so-called Deep Learning Accelerators (DLAs). The large market for DLAs and the large number of papers published on DLA design show that there is currently no one-size-fits-all solution. Depending on the given optimization goals, such as power consumption or performance, there may be several optimal solutions for each scenario. A commonly used method for finding these solutions as early as possible in the design cycle is the use of analytical models, which try to describe a design with simple yet insightful and sufficiently accurate formulas. The main contribution of this work is the generic Analytical Model for AI accelerators (AMAIX) for estimating CNN execution time on DLAs. It is based on the popular Roofline model. To show the validity of our approach, AMAIX was applied to the Nvidia Deep Learning Accelerator (NVDLA) as a case study, using the AlexNet and LeNet CNNs as workloads. The resulting performance predictions were verified against an RTL emulation of the NVDLA using a Synopsys ZeBu Server-based hybrid prototype. By refining the model following a divide-and-conquer paradigm, AMAIX predicted the inference time of AlexNet and LeNet on the NVDLA with an accuracy of 98%. Furthermore, this work shows how to use the obtained results for root-cause analysis and as a starting point for design space exploration.
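To make the Roofline idea behind the model more concrete, the following is a minimal sketch of a classic Roofline-style time estimate applied layer by layer. It is not the AMAIX model itself, whose formulas the abstract does not give. The `Layer` type, the function names, and all hardware and workload numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical description of one CNN layer (illustrative only)."""
    name: str
    ops: float          # arithmetic operations the layer performs
    bytes_moved: float  # bytes transferred to and from external memory


def attainable_performance(peak_ops_per_s, bandwidth_bytes_per_s, intensity):
    """Classic Roofline bound: the lower of the compute and memory roofs."""
    return min(peak_ops_per_s, bandwidth_bytes_per_s * intensity)


def layer_time(layer, peak_ops_per_s, bandwidth_bytes_per_s):
    """Lower bound on the execution time of a single layer, in seconds."""
    compute_time = layer.ops / peak_ops_per_s
    memory_time = layer.bytes_moved / bandwidth_bytes_per_s
    return max(compute_time, memory_time)


def bottleneck(layer, peak_ops_per_s, bandwidth_bytes_per_s):
    """Classify a layer by comparing its intensity with the ridge point."""
    intensity = layer.ops / layer.bytes_moved
    ridge_point = peak_ops_per_s / bandwidth_bytes_per_s
    return "compute" if intensity >= ridge_point else "memory"


def network_time(layers, peak_ops_per_s, bandwidth_bytes_per_s):
    """Sum the per-layer bounds, assuming layers run one after another."""
    return sum(
        layer_time(layer, peak_ops_per_s, bandwidth_bytes_per_s)
        for layer in layers
    )


# Hypothetical accelerator and workload, not taken from the paper.
PEAK = 1e9        # operations per second
BANDWIDTH = 1e8   # bytes per second
example_layers = [
    Layer("conv", ops=2e9, bytes_moved=1e8),
    Layer("fc", ops=1e8, bytes_moved=1e8),
]

for layer in example_layers:
    print(layer.name,
          bottleneck(layer, PEAK, BANDWIDTH),
          layer_time(layer, PEAK, BANDWIDTH))
print("total", network_time(example_layers, PEAK, BANDWIDTH))
```

Estimating each layer separately and then summing mirrors, in a very simplified way, the divide-and-conquer refinement mentioned in the abstract. The per-layer bottleneck label is the kind of information that could feed a root-cause analysis.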

Source journal
International Journal of Parallel Programming (Engineering & Technology: Computer Science, Theory & Methods)
CiteScore: 4.40
Self-citation rate: 0.00%
Articles published: 15
Review time: >12 weeks
Journal description: International Journal of Parallel Programming is a forum for the publication of peer-reviewed, high-quality original papers in the computer and information sciences, focusing specifically on programming aspects of parallel computing systems. Such systems are characterized by the coexistence over time of multiple coordinated activities. The journal publishes both original research and survey papers. Fields of interest include: linguistic foundations, conceptual frameworks, high-level languages, evaluation methods, implementation techniques, programming support systems, pragmatic considerations, architectural characteristics, software engineering aspects, advances in parallel algorithms, performance studies, and application studies.