AMAIX In-Depth: A Generic Analytical Model for Deep Learning Accelerators

IF 0.9 4区计算机科学 Q3 COMPUTER SCIENCE, THEORY & METHODS International Journal of Parallel Programming Pub Date : 2022-03-24 DOI:10.1007/s10766-022-00728-3

Niko Zurstraßen, Lukas Jünger, Tim Kogel, Holger Keding, Rainer Leupers

{"title":"AMAIX In-Depth: A Generic Analytical Model for Deep Learning Accelerators","authors":"Niko Zurstraßen, Lukas Jünger, Tim Kogel, Holger Keding, Rainer Leupers","doi":"10.1007/s10766-022-00728-3","DOIUrl":null,"url":null,"abstract":"<p>In recent years the growing popularity of Convolutional Neural Network(CNNs) has driven the development of specialized hardware, so called Deep Learning Accelerator (DLAs). The large market for DLAs and the huge amount of papers published on DLA design show that there is currently no one-size-fits-all solution. Depending on the given optimization goals such as power consumption or performance, there may be several optimal solutions for each scenario. A commonly used method for finding these solutions as early as possible in the design cycle, is the employment of analytical models which try to describe a design by simple yet insightful and sufficiently accurate formulas. The main contribution of this work is the generic Analytical Model for AI accelerators (AMAIX) for the estimation of CNN execution time on DLAs. It is based on the popular Roofline model. To show the validity of our approach, AMAIX was applied to the Nvidia Deep Learning Accelerator (NVDLA) as a case study using the AlexNet and LeNet CNNs as workloads. The resulting performance predictions were verified against an RTL emulation of the NVDLA using a Synopsys ZeBu Server-based hybrid prototype. By refining the model following a divide-and-conquer paradigm, AMAIX predicted the inference time of AlexNet and LeNet on the NVDLA with an accuracy 98%. Furthermore, this work shows how to use the obtained results for root-cause analysis and as a starting point for design space exploration.</p>","PeriodicalId":14313,"journal":{"name":"International Journal of Parallel Programming","volume":"8 5","pages":""},"PeriodicalIF":0.9000,"publicationDate":"2022-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Parallel Programming","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10766-022-00728-3","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}

引用次数: 0

Abstract

In recent years the growing popularity of Convolutional Neural Network(CNNs) has driven the development of specialized hardware, so called Deep Learning Accelerator (DLAs). The large market for DLAs and the huge amount of papers published on DLA design show that there is currently no one-size-fits-all solution. Depending on the given optimization goals such as power consumption or performance, there may be several optimal solutions for each scenario. A commonly used method for finding these solutions as early as possible in the design cycle, is the employment of analytical models which try to describe a design by simple yet insightful and sufficiently accurate formulas. The main contribution of this work is the generic Analytical Model for AI accelerators (AMAIX) for the estimation of CNN execution time on DLAs. It is based on the popular Roofline model. To show the validity of our approach, AMAIX was applied to the Nvidia Deep Learning Accelerator (NVDLA) as a case study using the AlexNet and LeNet CNNs as workloads. The resulting performance predictions were verified against an RTL emulation of the NVDLA using a Synopsys ZeBu Server-based hybrid prototype. By refining the model following a divide-and-conquer paradigm, AMAIX predicted the inference time of AlexNet and LeNet on the NVDLA with an accuracy 98%. Furthermore, this work shows how to use the obtained results for root-cause analysis and as a starting point for design space exploration.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

DLA的巨大市场和大量关于DLA设计的论文表明，目前没有放之四海而皆准的解决方案。根据给定的优化目标(如功耗或性能)，每个场景可能有几个最优解决方案。在设计周期中尽早找到这些解决方案的一种常用方法是使用分析模型，这种模型试图通过简单而深刻且足够准确的公式来描述设计。这项工作的主要贡献是用于估计dla上CNN执行时间的AI加速器通用分析模型(AMAIX)。它是基于流行的屋顶线模型。使用基于Synopsys ZeBu服务器的混合原型，通过NVDLA的RTL仿真验证了结果的性能预测。通过按照分而治之的模式对模型进行改进，AMAIX预测了AlexNet和LeNet在NVDLA上的推理时间，准确率达到98%。此外，这项工作展示了如何使用获得的结果进行根本原因分析，并作为设计空间探索的起点。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

International Journal of Parallel Programming 工程技术-计算机：理论方法

CiteScore

4.40

自引率

0.00%

发文量

审稿时长

>12 weeks

期刊介绍： International Journal of Parallel Programming is a forum for the publication of peer-reviewed, high-quality original papers in the computer and information sciences, focusing specifically on programming aspects of parallel computing systems. Such systems are characterized by the coexistence over time of multiple coordinated activities. The journal publishes both original research and survey papers. Fields of interest include: linguistic foundations, conceptual frameworks, high-level languages, evaluation methods, implementation techniques, programming support systems, pragmatic considerations, architectural characteristics, software engineering aspects, advances in parallel algorithms, performance studies, and application studies.