节能深度CNN加速器的异构分布式SRAM配置

2020 18th IEEE International New Circuits and Systems Conference (NEWCAS) Pub Date : 2020-06-01 DOI:10.1109/newcas49341.2020.9159814

Mehdi Ahmadi, S. Vakili, J. Langlois

{"title":"节能深度CNN加速器的异构分布式SRAM配置","authors":"Mehdi Ahmadi, S. Vakili, J. Langlois","doi":"10.1109/newcas49341.2020.9159814","DOIUrl":null,"url":null,"abstract":"Convolutional Neural Networks (CNNs) are often the first choice for visual recognition systems due to their high, even superhuman, recognition accuracy. The memory configuration of CNN accelerators highly impacts their area and energy efficiency, and employing on-chip memories such as SRAMs is unavoidable. SRAMs can reduce the number of energy-hungry DRAM accesses by storing a large amount of data locally. In this paper, we propose a new on-chip memory configuration, for a certain class of CNN accelerators that divides the memories into two groups. The first group consists of shallow but wide SRAMs into which parallel computational units accumulate intermediate results. The second group includes narrow but deep SRAMs shared between adjacent computational units to store then transfer final results to the external DRAM without interrupting the computation process. Implementation results show that the proposed configuration reduces the area by 21 % and improves the energy efficiency by 18% compared to designs which use an ordinary ping-pong structure for SRAM-DRAM data transfer.","PeriodicalId":135163,"journal":{"name":"2020 18th IEEE International New Circuits and Systems Conference (NEWCAS)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Heterogeneous Distributed SRAM Configuration for Energy-Efficient Deep CNN Accelerators\",\"authors\":\"Mehdi Ahmadi, S. Vakili, J. Langlois\",\"doi\":\"10.1109/newcas49341.2020.9159814\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Convolutional Neural Networks (CNNs) are often the first choice for visual recognition systems due to their high, even superhuman, recognition accuracy. The memory configuration of CNN accelerators highly impacts their area and energy efficiency, and employing on-chip memories such as SRAMs is unavoidable. SRAMs can reduce the number of energy-hungry DRAM accesses by storing a large amount of data locally. In this paper, we propose a new on-chip memory configuration, for a certain class of CNN accelerators that divides the memories into two groups. The first group consists of shallow but wide SRAMs into which parallel computational units accumulate intermediate results. The second group includes narrow but deep SRAMs shared between adjacent computational units to store then transfer final results to the external DRAM without interrupting the computation process. Implementation results show that the proposed configuration reduces the area by 21 % and improves the energy efficiency by 18% compared to designs which use an ordinary ping-pong structure for SRAM-DRAM data transfer.\",\"PeriodicalId\":135163,\"journal\":{\"name\":\"2020 18th IEEE International New Circuits and Systems Conference (NEWCAS)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 18th IEEE International New Circuits and Systems Conference (NEWCAS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/newcas49341.2020.9159814\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 18th IEEE International New Circuits and Systems Conference (NEWCAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/newcas49341.2020.9159814","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

卷积神经网络(cnn)通常是视觉识别系统的首选，因为它具有很高的，甚至是超人的识别精度。CNN加速器的内存配置对其面积和能量效率影响很大，因此采用sram等片上存储器是不可避免的。sram可以通过在本地存储大量数据来减少耗能的DRAM访问次数。在本文中，我们提出了一种新的片上存储器配置，用于将某一类CNN加速器的存储器分为两组。第一组由浅而宽的ram组成，并行计算单元在其中积累中间结果。第二组包括在相邻计算单元之间共享的窄而深的ram，用于存储然后在不中断计算过程的情况下将最终结果传输到外部DRAM。实施结果表明，与使用普通乒乓结构进行SRAM-DRAM数据传输的设计相比，所提出的配置减少了21%的面积，提高了18%的能源效率。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Heterogeneous Distributed SRAM Configuration for Energy-Efficient Deep CNN Accelerators

Convolutional Neural Networks (CNNs) are often the first choice for visual recognition systems due to their high, even superhuman, recognition accuracy. The memory configuration of CNN accelerators highly impacts their area and energy efficiency, and employing on-chip memories such as SRAMs is unavoidable. SRAMs can reduce the number of energy-hungry DRAM accesses by storing a large amount of data locally. In this paper, we propose a new on-chip memory configuration, for a certain class of CNN accelerators that divides the memories into two groups. The first group consists of shallow but wide SRAMs into which parallel computational units accumulate intermediate results. The second group includes narrow but deep SRAMs shared between adjacent computational units to store then transfer final results to the external DRAM without interrupting the computation process. Implementation results show that the proposed configuration reduces the area by 21 % and improves the energy efficiency by 18% compared to designs which use an ordinary ping-pong structure for SRAM-DRAM data transfer.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 18th IEEE International New Circuits and Systems Conference (NEWCAS)

自引率

0.00%

发文量