Heterogeneous Distributed SRAM Configuration for Energy-Efficient Deep CNN Accelerators

Mehdi Ahmadi, S. Vakili, J. Langlois
2020 18th IEEE International New Circuits and Systems Conference (NEWCAS), publication date 2020-06-01
DOI: 10.1109/newcas49341.2020.9159814 (https://doi.org/10.1109/newcas49341.2020.9159814)
Citations: 0

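Abstract

Convolutional Neural Networks (CNNs) are often the first choice for visual recognition systems because of their high, even superhuman, recognition accuracy. The memory configuration of a CNN accelerator strongly affects its area and energy efficiency, and on-chip memories such as SRAMs are unavoidable: by storing a large amount of data locally, SRAMs reduce the number of energy-hungry DRAM accesses. In this paper, we propose a new on-chip memory configuration for a certain class of CNN accelerators that divides the memories into two groups. The first group consists of shallow but wide SRAMs into which parallel computational units accumulate intermediate results. The second group consists of narrow but deep SRAMs, shared between adjacent computational units, that store final results and then transfer them to the external DRAM without interrupting the computation. Implementation results show that, compared to designs that use an ordinary ping-pong structure for SRAM-DRAM data transfer, the proposed configuration reduces area by 21% and improves energy efficiency by 18%.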
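The abstract describes the data flow only at a high level. The Python sketch below gives a rough behavioral illustration of that flow. It is not the authors' RTL, and it does not reproduce their sizing. It models two adjacent compute units. Each unit has a private shallow-but-wide accumulation buffer, and the two units share one narrow-but-deep output buffer. That output buffer is modeled as a FIFO that drains to DRAM in small bursts while the next tile is still being accumulated. Several details are hypothetical and chosen only for illustration: the class and function names (`ComputeUnit`, `SharedOutputBuffer`, `run_layer`), the width, depth, and burst parameters, and the one-burst-per-accumulation-step drain schedule.

```python
from collections import deque


class ComputeUnit:
    """Hypothetical compute unit with a private shallow-but-wide accumulation
    SRAM, modeled as a single row of `width` partial sums."""

    def __init__(self, width):
        self.width = width
        self.acc = [0] * width

    def accumulate(self, products):
        # Add one row of products element-wise into the wide accumulation row.
        if len(products) != self.width:
            raise ValueError("row width mismatch")
        for i, p in enumerate(products):
            self.acc[i] += p

    def flush(self):
        # Return the final results and clear the row for the next tile.
        done, self.acc = self.acc, [0] * self.width
        return done


class SharedOutputBuffer:
    """Hypothetical narrow-but-deep SRAM shared by two adjacent units, modeled
    as a FIFO of single words that drains to DRAM independently of accumulation."""

    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque()

    def push(self, values):
        if len(self.fifo) + len(values) > self.depth:
            raise OverflowError("shared output buffer is full")
        self.fifo.extend(values)

    def drain(self, burst):
        # Model one DRAM write burst of at most `burst` words.
        return [self.fifo.popleft() for _ in range(min(burst, len(self.fifo)))]


def run_layer(tiles, width=4, depth=16, burst=2):
    """Each tile is a pair (rows for unit 0, rows for unit 1).
    Returns the sequence of words written to DRAM."""
    units = [ComputeUnit(width), ComputeUnit(width)]
    out = SharedOutputBuffer(depth)
    dram = []
    for tile in tiles:
        for unit, rows in zip(units, tile):
            for row in rows:
                unit.accumulate(row)
                # Overlap: one DRAM burst per accumulation step, so the previous
                # tile's results leave while this tile is being computed.
                dram.extend(out.drain(burst))
        for unit in units:
            out.push(unit.flush())
    while out.fifo:  # drain whatever remains after the last tile
        dram.extend(out.drain(burst))
    return dram


if __name__ == "__main__":
    tiles = [
        ([[1, 1, 1, 1], [1, 1, 1, 1]], [[2, 2, 2, 2]]),
        ([[3, 3, 3, 3]], [[4, 4, 4, 4], [1, 0, 0, 0]]),
    ]
    print(run_layer(tiles))  # [2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 5, 4, 4, 4]
```

In this sketch, six of the eight first-tile results reach DRAM while the second tile is still accumulating, and the compute units never wait on a buffer swap. The FIFO model is meant to capture only that overlap property, which the abstract contrasts with ping-pong buffering. It says nothing about the 21% area and 18% energy figures reported above.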