Hardware Computation Graph for DNN Accelerator Design Automation without Inter-PU Templates

Jun Yu Li, Wei Wang, Wufeng Li
DOI: 10.1145/3508352.3549342 (https://doi.org/10.1145/3508352.3549342)
Venue: 2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)
Published: 2022-10-29
Citations: 0

Abstract

Existing deep neural network (DNN) accelerator design automation (ADA) methods adopt architecture templates that predetermine some of the design choices, and then explore only the remaining choices that the templates leave open. These templates can be classified into intra-PU templates and inter-PU templates according to the level of the architecture hierarchy they fix. Because templates limit the flexibility of ADA, designing effective ADA methods without templates has become an important research topic. Although some works have increased the flexibility of ADA by removing intra-PU templates, to the best of our knowledge no existing work has studied ADA methods without inter-PU templates. ADA with predetermined inter-PU templates is typically inefficient in terms of resource utilization, especially for DNNs with complex topology. In this paper, we propose a novel method, called hardware computation graph (HCG), for ADA without inter-PU templates. Experiments show that the HCG method achieves competitive latency while using 1.4x to 5x less on-chip memory than existing state-of-the-art ADA methods.
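The phrase "1.4x to 5x less on-chip memory" can be read in more than one way. The short Python sketch below works through one reading, which is an assumption and is not stated in the abstract: an N-x reduction means the design uses baseline / N memory. The function name memory_fraction is hypothetical and chosen only for illustration.

```python
def memory_fraction(reduction_factor: float) -> float:
    """Fraction of baseline on-chip memory still used after an N-x reduction.

    Assumption (not from the paper): "N x less" means baseline / N.
    """
    if reduction_factor <= 0:
        raise ValueError("reduction factor must be positive")
    return 1.0 / reduction_factor


# The two endpoints quoted in the abstract.
for factor in (1.4, 5.0):
    frac = memory_fraction(factor)
    print(f"{factor}x less memory: {frac:.1%} of baseline, {1 - frac:.1%} saved")
```

Under this reading, the reported range corresponds to using roughly 71% of the baseline memory at the low end and 20% at the high end.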