eDRAM-OESP: A novel performance efficient in-embedded-DRAM-compute design for on-edge signal processing application
Mayank Kabra, C. PrashanthH., Kedar Deshpande, M. Rao
2023 24th International Symposium on Quality Electronic Design (ISQED), published 2023-04-05
DOI: 10.1109/ISQED57927.2023.10129307
Citations: 0
Abstract
In-Memory-Computing (IMC) architectures place arithmetic and logical functionality around the memory arrays to use the memory bandwidth effectively and avoid frequent data movement to the processor. The IMC architecture thus delivers high throughput and significant energy savings, primarily because less data is moved from memory to the computing core. Embedded DRAM (eDRAM), composed of a 1-transistor, 1-capacitor (1T1C) bit cell with a logic block, enables computing with benefits in power savings and performance, making it favorable for embedded computing engines. This work proposes a novel in-eDRAM-compute design employing a 1T1C eDRAM cell with bit-serial computation that targets 3x throughput efficiency by arranging the operand bits in an interleaved manner. The interleaved eDRAM architecture enables reading the corresponding bits of multiple operands from the memory cells simultaneously, and also allows the results to be written back within the same activate window, thereby saving multiple precharge and activate cycles. Additionally, the interleaved architecture allows the continuously arriving digitized signal to be pipelined and processed. The computing block, a 1-bit adder with a multiplexer unit, is optimized for different hardware metrics such as delay, power, and power-delay product (PDP) so that the design can be adopted per the specifications. The eDRAM-based computing design is evaluated for a 1-bit adder and further characterized for 8-bit and 16-bit adders, multipliers, and 1-D convolution of varying filter sizes. The proposed design improved computing time by 31% for 16-bit addition and 30.6% for 8-bit addition over the existing state-of-the-art work. The bit-serial in-eDRAM-compute design achieved its best performance of 2.5 ms of computing time and 120 nJ of energy for a 1-D convolution operation.
The in-eDRAM-compute design is a step towards designing embedded memory with convolutional neural network (CNN) compute capability for customized real-time edge inferencing applications.
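The bit-serial scheme described in the abstract can be illustrated with a small software model: operand bits are stored interleaved so that corresponding bit positions of both operands sit in adjacent cells of one row, and a 1-bit full adder consumes one bit pair per cycle while holding the carry between cycles. This is a hypothetical, simplified sketch of the general technique, not the paper's circuit or timing model; all function names here are illustrative.

```python
# Software model of bit-serial addition over interleaved operand bits.
# Illustrative only: it mimics the dataflow (LSB-first bit pairs feeding a
# 1-bit full adder), not the eDRAM array, sense amplifiers, or activate/
# precharge timing of the actual design.

def to_bits(value, width):
    """Little-endian bit list (LSB first), matching bit-serial read order."""
    return [(value >> i) & 1 for i in range(width)]

def interleave(a_bits, b_bits):
    """Interleave two operands' bits so corresponding bit positions are
    adjacent and can be fetched together in one row access."""
    row = []
    for a, b in zip(a_bits, b_bits):
        row.extend([a, b])
    return row

def bit_serial_add(a, b, width):
    """Add two width-bit operands, one bit position per 'cycle'."""
    row = interleave(to_bits(a, width), to_bits(b, width))
    carry = 0
    result = 0
    for i in range(width):
        ai, bi = row[2 * i], row[2 * i + 1]   # paired bits from one access
        s = ai ^ bi ^ carry                   # 1-bit full-adder sum
        carry = (ai & bi) | (carry & (ai ^ bi))
        result |= s << i
    return result | (carry << width)          # final carry-out as the MSB

print(bit_serial_add(200, 100, 8))
```

Because each loop iteration touches only one interleaved bit pair, reading both operands' bits in the same row access (rather than two separate accesses) is what saves the extra precharge/activate cycles the abstract refers to.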