eDRAM-OESP: A novel performance efficient in-embedded-DRAM-compute design for on-edge signal processing application

Mayank Kabra, C. Prashanth H., Kedar Deshpande, M. Rao
Published in: 2023 24th International Symposium on Quality Electronic Design (ISQED), 2023-04-05
DOI: 10.1109/ISQED57927.2023.10129307 (https://doi.org/10.1109/ISQED57927.2023.10129307)

Abstract

In-Memory-Computing (IMC) architectures place arithmetic and logic functions around the memory arrays to use the memory bandwidth effectively and avoid frequent data movement to the processor. As a result, IMC architectures deliver high throughput and significant energy savings, primarily because less data is moved from memory to the computing core. Embedded DRAM (eDRAM), built from a 1-transistor, 1-capacitor (1T1C) bit cell and paired with a logic block, enables computing with power savings and high performance, which makes it favorable for embedded computing engines. This work proposes a novel in-eDRAM-compute design that employs a 1T1C eDRAM cell with bit-serial computation and targets 3x throughput efficiency by arranging the operand bits in an interleaved manner. The interleaved eDRAM architecture reads the corresponding bits of multiple operands from the memory cells at the same time and writes the computed result back in the same activate window, thereby saving multiple precharge and activate cycles. Additionally, the interleaved architecture allows a continuously arriving digitized signal to be pipelined and processed. The computing block, a 1-bit adder with a multiplexer unit, is optimized for different hardware metrics such as delay, power, and power-delay product (PDP), so that the design can be adopted per the specifications. The eDRAM-based computing design is evaluated for a 1-bit adder and further characterized for 8-bit and 16-bit adders, multipliers, and 1-D convolution with varying filter sizes. The proposed design improved computing time by 31% for 16-bit addition and 30.6% for 8-bit addition over the existing state-of-the-art work. The bit-serial in-eDRAM-compute design achieved its best performance of 2.5 ms of computing time and 120 nJ of energy for a 1-D convolution operation. The in-eDRAM-compute design is a step towards embedded memory with convolutional neural network (CNN) compute capability for customized real-time edge inferencing applications.
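
The bit-serial, interleaved scheme described in the abstract can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the authors' circuit. The row layout `[a0, b0, s0, a1, b1, s1, ...]`, the function names, the shift-and-add multiplier, and the unsigned sliding dot product used for the 1-D convolution are all hypothetical choices made for illustration. The multiplexer unit and all timing and energy behaviour are not modelled.

```python
from typing import List, Tuple


def interleave(a: int, b: int, width: int) -> List[int]:
    """Assumed row layout: [a0, b0, s0, a1, b1, s1, ...], LSB first.

    The s slots are reserved for the sum bits and start at 0.
    """
    row: List[int] = []
    for i in range(width):
        row += [(a >> i) & 1, (b >> i) & 1, 0]
    return row


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def bit_serial_add(row: List[int], width: int) -> int:
    """Process one bit position per step; returns the final carry.

    Each step models a single access that reads both operand bits and
    writes the sum bit back to its neighbouring slot.
    """
    carry = 0
    for i in range(width):
        a_bit, b_bit = row[3 * i], row[3 * i + 1]
        row[3 * i + 2], carry = full_adder(a_bit, b_bit, carry)
    return carry


def add(a: int, b: int, width: int) -> int:
    """Unsigned addition through the interleaved row model."""
    row = interleave(a, b, width)
    carry = bit_serial_add(row, width)
    total = sum(row[3 * i + 2] << i for i in range(width))
    return total | (carry << width)


def multiply(a: int, b: int, width: int) -> int:
    """Unsigned shift-and-add multiply built from bit-serial additions."""
    acc = 0
    for i in range(width):
        if (b >> i) & 1:
            acc = add(acc, a << i, 2 * width)
    return acc


def conv1d(signal: List[int], kernel: List[int], width: int) -> List[int]:
    """Sliding dot product (no kernel flip) over unsigned samples."""
    acc_width = 2 * width + 8  # assumed headroom for the accumulator
    out: List[int] = []
    for i in range(len(signal) - len(kernel) + 1):
        acc = 0
        for j, k in enumerate(kernel):
            acc = add(acc, multiply(signal[i + j], k, width), acc_width)
        out.append(acc)
    return out
```

For example, `add(200, 100, 8)` walks eight bit positions and returns the 9-bit result including the final carry, and `conv1d` reuses the same adder for every multiply-accumulate step. Placing the two operand bits and the result bit next to each other is what lets one access serve the read and the write-back in this model.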