节能加速器上CNN映射快速评估的自底向上方法

IF 1.6 Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Journal of Low Power Electronics and Applications Pub Date : 2023-01-05 DOI:10.3390/jlpea13010005

Guillaume Devic, G. Sassatelli, A. Gamatie

{"title":"节能加速器上CNN映射快速评估的自底向上方法","authors":"Guillaume Devic, G. Sassatelli, A. Gamatie","doi":"10.3390/jlpea13010005","DOIUrl":null,"url":null,"abstract":"The execution of machine learning (ML) algorithms on resource-constrained embedded systems is very challenging in edge computing. To address this issue, ML accelerators are among the most efficient solutions. They are the result of aggressive architecture customization. Finding energy-efficient mappings of ML workloads on accelerators, however, is a very challenging task. In this paper, we propose a design methodology by combining different abstraction levels to quickly address the mapping of convolutional neural networks on ML accelerators. Starting from an open-source core adopting the RISC-V instruction set architecture, we define in RTL a more flexible and powerful multiply-and-accumulate (MAC) unit, compared to the native MAC unit. Our proposal contributes to improving the energy efficiency of the RISC-V cores of PULPino. To effectively evaluate its benefits at system level, while considering CNN execution, we build a corresponding analytical model in the Timeloop/Accelergy simulation and evaluation environment. This enables us to quickly explore CNN mappings on a typical RISC-V system-on-chip model, manufactured under the name of GAP8. The modeling flexibility offered by Timeloop makes it possible to easily evaluate our novel MAC unit in further CNN accelerator architectures such as Eyeriss and DianNao. Overall, the resulting bottom-up methodology assists designers in the efficient implementation of CNNs on ML accelerators by leveraging the accuracy and speed of the combined abstraction levels.","PeriodicalId":38100,"journal":{"name":"Journal of Low Power Electronics and Applications","volume":" ","pages":""},"PeriodicalIF":1.6000,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Bottom-Up Methodology for the Fast Assessment of CNN Mappings on Energy-Efficient Accelerators\",\"authors\":\"Guillaume Devic, G. Sassatelli, A. Gamatie\",\"doi\":\"10.3390/jlpea13010005\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The execution of machine learning (ML) algorithms on resource-constrained embedded systems is very challenging in edge computing. To address this issue, ML accelerators are among the most efficient solutions. They are the result of aggressive architecture customization. Finding energy-efficient mappings of ML workloads on accelerators, however, is a very challenging task. In this paper, we propose a design methodology by combining different abstraction levels to quickly address the mapping of convolutional neural networks on ML accelerators. Starting from an open-source core adopting the RISC-V instruction set architecture, we define in RTL a more flexible and powerful multiply-and-accumulate (MAC) unit, compared to the native MAC unit. Our proposal contributes to improving the energy efficiency of the RISC-V cores of PULPino. To effectively evaluate its benefits at system level, while considering CNN execution, we build a corresponding analytical model in the Timeloop/Accelergy simulation and evaluation environment. This enables us to quickly explore CNN mappings on a typical RISC-V system-on-chip model, manufactured under the name of GAP8. The modeling flexibility offered by Timeloop makes it possible to easily evaluate our novel MAC unit in further CNN accelerator architectures such as Eyeriss and DianNao. Overall, the resulting bottom-up methodology assists designers in the efficient implementation of CNNs on ML accelerators by leveraging the accuracy and speed of the combined abstraction levels.\",\"PeriodicalId\":38100,\"journal\":{\"name\":\"Journal of Low Power Electronics and Applications\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2023-01-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Low Power Electronics and Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/jlpea13010005\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Low Power Electronics and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/jlpea13010005","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}

引用次数: 0

摘要

在边缘计算中，在资源受限的嵌入式系统上执行机器学习算法是非常具有挑战性的。它们是积极的体系结构定制的结果。然而，在加速器上找到高效的ML工作负载映射是一项非常具有挑战性的任务。在本文中，我们提出了一种设计方法，通过结合不同的抽象层次来快速解决卷积神经网络在ML加速器上的映射。从采用RISC-V指令集架构的开源内核开始，我们在RTL中定义了一个比本地MAC单元更灵活、更强大的乘法和累加(MAC)单元。我们的建议有助于提高PULPino的RISC-V内核的能源效率。为了在系统层面有效地评估其效益，在考虑CNN执行的同时，我们在timelloop /Accelergy仿真和评估环境中建立了相应的分析模型。这使我们能够在典型的RISC-V片上系统模型上快速探索CNN映射，该模型以GAP8的名义制造。timelloop提供的建模灵活性使得在进一步的CNN加速器架构(如Eyeriss和DianNao)中轻松评估我们的新MAC单元成为可能。总的来说，由此产生的自下而上的方法通过利用组合抽象级别的准确性和速度，帮助设计人员在ML加速器上有效地实现cnn。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A Bottom-Up Methodology for the Fast Assessment of CNN Mappings on Energy-Efficient Accelerators

The execution of machine learning (ML) algorithms on resource-constrained embedded systems is very challenging in edge computing. To address this issue, ML accelerators are among the most efficient solutions. They are the result of aggressive architecture customization. Finding energy-efficient mappings of ML workloads on accelerators, however, is a very challenging task. In this paper, we propose a design methodology by combining different abstraction levels to quickly address the mapping of convolutional neural networks on ML accelerators. Starting from an open-source core adopting the RISC-V instruction set architecture, we define in RTL a more flexible and powerful multiply-and-accumulate (MAC) unit, compared to the native MAC unit. Our proposal contributes to improving the energy efficiency of the RISC-V cores of PULPino. To effectively evaluate its benefits at system level, while considering CNN execution, we build a corresponding analytical model in the Timeloop/Accelergy simulation and evaluation environment. This enables us to quickly explore CNN mappings on a typical RISC-V system-on-chip model, manufactured under the name of GAP8. The modeling flexibility offered by Timeloop makes it possible to easily evaluate our novel MAC unit in further CNN accelerator architectures such as Eyeriss and DianNao. Overall, the resulting bottom-up methodology assists designers in the efficient implementation of CNNs on ML accelerators by leveraging the accuracy and speed of the combined abstraction levels.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊