{"title":"Heterogeneous Distributed SRAM Configuration for Energy-Efficient Deep CNN Accelerators","authors":"Mehdi Ahmadi, S. Vakili, J. Langlois","doi":"10.1109/newcas49341.2020.9159814","DOIUrl":null,"url":null,"abstract":"Convolutional Neural Networks (CNNs) are often the first choice for visual recognition systems due to their high, even superhuman, recognition accuracy. The memory configuration of CNN accelerators highly impacts their area and energy efficiency, and employing on-chip memories such as SRAMs is unavoidable. SRAMs can reduce the number of energy-hungry DRAM accesses by storing a large amount of data locally. In this paper, we propose a new on-chip memory configuration, for a certain class of CNN accelerators that divides the memories into two groups. The first group consists of shallow but wide SRAMs into which parallel computational units accumulate intermediate results. The second group includes narrow but deep SRAMs shared between adjacent computational units to store then transfer final results to the external DRAM without interrupting the computation process. 
Implementation results show that the proposed configuration reduces the area by 21 % and improves the energy efficiency by 18% compared to designs which use an ordinary ping-pong structure for SRAM-DRAM data transfer.","PeriodicalId":135163,"journal":{"name":"2020 18th IEEE International New Circuits and Systems Conference (NEWCAS)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 18th IEEE International New Circuits and Systems Conference (NEWCAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/newcas49341.2020.9159814","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Convolutional Neural Networks (CNNs) are often the first choice for visual recognition systems due to their high, even superhuman, recognition accuracy. The memory configuration of a CNN accelerator strongly impacts its area and energy efficiency, and employing on-chip memories such as SRAMs is unavoidable. SRAMs can reduce the number of energy-hungry DRAM accesses by storing a large amount of data locally. In this paper, we propose a new on-chip memory configuration for a certain class of CNN accelerators that divides the memories into two groups. The first group consists of shallow but wide SRAMs into which parallel computational units accumulate intermediate results. The second group consists of narrow but deep SRAMs shared between adjacent computational units, which store final results and then transfer them to the external DRAM without interrupting the computation process. Implementation results show that the proposed configuration reduces the area by 21% and improves the energy efficiency by 18% compared to designs that use an ordinary ping-pong structure for SRAM-DRAM data transfer.
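The dataflow sketched in the abstract can be illustrated with a toy behavioral model. This is an assumption-laden sketch, not the paper's implementation: the class names, buffer sizes, and scheduling below are all invented for illustration. It models each compute unit accumulating partial sums into its own shallow-but-wide buffer, then handing finished results to a narrow-but-deep FIFO shared by adjacent units, which drains to DRAM while the next accumulation pass can proceed.

```python
# Toy behavioral model (hypothetical, not from the paper) of the two
# SRAM groups described in the abstract.
from collections import deque

class ComputeUnit:
    """Owns a shallow-but-wide accumulation buffer: one word per output."""
    def __init__(self, width):
        self.acc = [0] * width

    def accumulate(self, partials):
        # Parallel MAC results are accumulated in place.
        for i, p in enumerate(partials):
            self.acc[i] += p

    def flush(self, shared):
        # Hand final results to the shared deep buffer and reset,
        # so the next accumulation pass can start immediately.
        done, self.acc = self.acc, [0] * len(self.acc)
        shared.push(done)

class SharedDeepBuffer:
    """Narrow-but-deep SRAM shared by adjacent units; drains to DRAM
    one word per cycle while computation continues."""
    def __init__(self):
        self.fifo = deque()

    def push(self, words):
        self.fifo.extend(words)

    def drain_one(self, dram):
        if self.fifo:
            dram.append(self.fifo.popleft())

# Two adjacent units sharing one deep buffer.
units = [ComputeUnit(4), ComputeUnit(4)]
shared = SharedDeepBuffer()
dram = []

for _ in range(3):                 # three accumulation passes
    for u in units:
        u.accumulate([1, 2, 3, 4])
    shared.drain_one(dram)         # DRAM transfer overlaps with compute

for u in units:
    u.flush(shared)
while shared.fifo:                 # drain remaining final results
    shared.drain_one(dram)

print(len(dram))                   # all 8 final words reach DRAM
```

The point of the shared deep buffer is that a unit's flush is a single hand-off rather than a blocking DRAM transaction, which is what removes the need for a full ping-pong (double-buffered) copy of each unit's results.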