HIE-DRAM: High Performance Efficient In-DRAM Computing Architecture for SIMD

Authors: Mayank Kabra, C. Prashanth H., Kedar Deshpande, M. Rao
Venue: 2023 24th International Symposium on Quality Electronic Design (ISQED)
Publication date: 2023-04-05
DOI: 10.1109/ISQED57927.2023.10129370
URL: https://doi.org/10.1109/ISQED57927.2023.10129370

Abstract

In-memory and near-memory computing place processing elements inside the memory blocks or around their periphery. Performing the computation as soon as the data becomes available in the memory sub-blocks avoids waiting for the processor to manage the data movement. This paper presents a new 11-transistor (11T) computing design and a novel operation that deliver energy savings and performance improvements over the current state-of-the-art (SOTA) single-instruction-multiple-data in-DRAM (SIMDRAM) computing. The novel 11T pass-transistor design is structured to offer logical AND, OR, XNOR, and its complement operations. These are sequenced to generate the desired operational output with a minuscule change of 4 rows of circuitry, which corresponds to a footprint expense of 0.05% compared to the existing DRAM architecture. Based on these logical operations, 13 scalar instructions covering arithmetic, predication, reduction, and relational function types are characterized. These scalar operations are a mix of logarithmic, quadratic, and linear functions applied to either a single operand or multiple operands. For the single-instruction-multiple-data (SIMD) topology, vector operations comprising addition, multiplication, sparse multiplication, selection, unique, reduction, and prefix summation are also realized. All these operations were compared with the current SOTA SIMDRAM architectural design to showcase substantial computing time benefits and energy savings. The proposed 11T in-DRAM compute design offers a 5.18% to 50.57% improvement in computing latency and energy across 10 scalar operations over the SIMDRAM architecture. The novel high-performance and efficient in-DRAM computing (HIE-DRAM) implementation is a step towards real-time in-memory vector data processing for autonomous applications.
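To make the idea of sequencing bulk logical operations into a SIMD arithmetic operation concrete, the following is a minimal software sketch of bit-serial vector addition built only from AND, OR, XNOR, and complement. It is not the paper's 11T circuit or its actual operation sequence. The vertical bit-plane layout, the lane count, and all function names (`to_bit_planes`, `bit_serial_add`, and so on) are assumptions made for illustration.

```python
# Illustrative model only: each Python int stands in for one memory row,
# with one bit per SIMD lane. LANES is a hypothetical lane count.
LANES = 8
MASK = (1 << LANES) - 1


def AND(a, b):
    return a & b


def OR(a, b):
    return a | b


def XNOR(a, b):
    return ~(a ^ b) & MASK


def NOT(a):
    # Complement of a row, restricted to the modeled lanes.
    return ~a & MASK


def to_bit_planes(values, width):
    """Transpose lane values into `width` rows; row i holds bit i of every lane."""
    planes = []
    for i in range(width):
        row = 0
        for lane, v in enumerate(values):
            row |= ((v >> i) & 1) << lane
        planes.append(row)
    return planes


def from_bit_planes(planes, lanes):
    """Inverse of to_bit_planes: rebuild one integer per lane."""
    return [
        sum(((row >> lane) & 1) << i for i, row in enumerate(planes))
        for lane in range(lanes)
    ]


def bit_serial_add(a_planes, b_planes):
    """Add all lanes at once, one bit position per step (ripple carry)."""
    carry = 0
    out = []
    for a, b in zip(a_planes, b_planes):
        a_xor_b = NOT(XNOR(a, b))              # XOR as the complement of XNOR
        out.append(NOT(XNOR(a_xor_b, carry)))  # sum bit = a XOR b XOR carry
        carry = OR(AND(a, b), AND(carry, a_xor_b))
    out.append(carry)                          # final carry-out row
    return out


a = [3, 5, 250, 0, 255, 17, 128, 64]
b = [4, 5, 10, 0, 255, 1, 128, 63]
result = from_bit_planes(bit_serial_add(to_bit_planes(a, 8), to_bit_planes(b, 8)), LANES)
print(result)
```

In this model every logical step acts on a whole row, so the number of steps grows with the operand bit width rather than with the number of lanes, which is the property that makes row-wide logic attractive for SIMD workloads.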