eDRAM-OESP: A novel performance efficient in-embedded-DRAM-compute design for on-edge signal processing application
Mayank Kabra, C. PrashanthH., Kedar Deshpande, M. Rao
2023 24th International Symposium on Quality Electronic Design (ISQED), published 2023-04-05
DOI: 10.1109/ISQED57927.2023.10129307
Citations: 0
Abstract
In-Memory-Computing (IMC) architectures place arithmetic and logical functionality around the memory arrays to use the memory bandwidth effectively and avoid frequent data movement to the processor. The IMC architecture thus delivers high throughput and significant energy savings, primarily because less data is moved from memory to the computing core. Embedded DRAM (eDRAM), composed of a 1-transistor, 1-capacitor (1T1C) bit cell with a logic block, enables computing with benefits in power savings and performance, making it favorable for embedded computing engines. This work proposes a novel in-eDRAM-compute design employing a 1T1C eDRAM cell with bit-serial computation that targets 3x throughput efficiency by arranging the operand bits in an interleaved manner. The interleaved eDRAM architecture enables reading the corresponding bits of multiple operands from the memory cells simultaneously, and also allows the results to be written back within the same activate window, thereby saving multiple precharge and activate cycles. Additionally, the interleaved architecture allows the continuously arriving digitized signal to be pipelined and processed. The computing block, a 1-bit adder with a multiplexer unit, is optimized for different hardware metrics such as delay, power, and power-delay product (PDP) so that the design can be adopted per the specifications. The eDRAM-based computing design is evaluated for a 1-bit adder and further characterized for 8-bit and 16-bit adders, multipliers, and 1-D convolution of varying filter sizes. The proposed design improved computing time by 31% for 16-bit addition and 30.6% for 8-bit addition over the existing state-of-the-art work. The bit-serial in-eDRAM-compute design achieved its best performance of 2.5 ms of computing time and 120 nJ of energy for a 1-D convolution operation.
The in-eDRAM-compute design is a step towards designing embedded memory with convolutional neural network (CNN) compute capability for customized real-time edge inferencing applications.
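The bit-serial scheme described in the abstract can be illustrated with a small software model: operand bits are stored interleaved so that corresponding bit positions of both operands sit in adjacent cells of one row, and a 1-bit full adder consumes one bit pair per cycle while holding the carry between cycles. This is a hypothetical, simplified sketch of the general technique, not the paper's circuit or timing model; all function names here are illustrative.

```python
# Software model of bit-serial addition over interleaved operand bits.
# Illustrative only: it mimics the dataflow (LSB-first bit pairs feeding a
# 1-bit full adder), not the eDRAM array, sense amplifiers, or activate/
# precharge timing of the actual design.

def to_bits(value, width):
    """Little-endian bit list (LSB first), matching bit-serial read order."""
    return [(value >> i) & 1 for i in range(width)]

def interleave(a_bits, b_bits):
    """Interleave two operands' bits so corresponding bit positions are
    adjacent and can be fetched together in one row access."""
    row = []
    for a, b in zip(a_bits, b_bits):
        row.extend([a, b])
    return row

def bit_serial_add(a, b, width):
    """Add two width-bit operands, one bit position per 'cycle'."""
    row = interleave(to_bits(a, width), to_bits(b, width))
    carry = 0
    result = 0
    for i in range(width):
        ai, bi = row[2 * i], row[2 * i + 1]   # paired bits from one access
        s = ai ^ bi ^ carry                   # 1-bit full-adder sum
        carry = (ai & bi) | (carry & (ai ^ bi))
        result |= s << i
    return result | (carry << width)          # final carry-out as the MSB

print(bit_serial_add(200, 100, 8))
```

Because each loop iteration touches only one interleaved bit pair, reading both operands' bits in the same row access (rather than two separate accesses) is what saves the extra precharge/activate cycles the abstract refers to.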