A smart memory architecture for the efficient support of artificial neural nets

K. Großpietsch, J. Büddefeld
Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing, published 2001-02-07
DOI: 10.1109/EMPDP.2001.905074 (https://doi.org/10.1109/EMPDP.2001.905074)
Citations: 1

Abstract

A "smart memory" approach is presented: the new architecture is obtained by extending the functionality of a conventional RAM structure. It adds two features. First, every word cell of w bits is associated with a small ALU that is q bits wide. Second, an extended memory decoder allows multiple access to certain sets of word cells within the memory, together with activation of their ALUs. Based on these features, the standard numerical problem of adding up the m components of a vector of dimension m can be carried out in O(√m) time in the new architecture. The execution of artificial neural nets, especially the on-line recognition of patterns, depends mainly on the time-efficient execution of weighted sums. It is shown that our architecture computes these weighted sums efficiently, with a computation time far below that of sequential von Neumann machines. In addition, we show that, if required, the training mode of a neural net can also be sped up significantly. This is achieved by means of a simple crossbar switch that can be added modularly to the array of memory chips.
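The abstract does not spell out the schedule behind the O(√m) bound, so the following Python sketch only illustrates one way such a bound can arise. It assumes the m words are laid out as a roughly √m × √m grid, that all rows can be activated at once so each row accumulates its own elements in parallel, and that the row results are then combined one per step. The function names, the grid layout, and the step counting are illustrative assumptions, not the authors' design; in particular, the elementwise multiplication in `weighted_sum` is modeled as a single parallel step, whereas a narrow q-bit ALU might need several.

```python
import math


def two_phase_sum(values):
    """Sum `values` with a row-parallel, two-phase schedule.

    Returns (total, steps), where `steps` counts simulated parallel steps.
    Hypothetical model: not taken from the paper.
    """
    values = list(values)
    m = len(values)
    if m == 0:
        return 0, 0

    side = math.isqrt(m - 1) + 1  # ceil(sqrt(m)) for m >= 1
    # Lay the vector out as a side x side grid, padding with zeros.
    padded = values + [0] * (side * side - m)
    rows = [padded[r * side:(r + 1) * side] for r in range(side)]

    steps = 0
    # Phase 1: every row is active at once; in each step each row's
    # accumulator adds one more element of that row.
    acc = [0] * side
    for c in range(side):
        acc = [acc[r] + rows[r][c] for r in range(side)]  # one parallel step
        steps += 1

    # Phase 2: the row accumulators are combined sequentially, one per step.
    total = 0
    for r in range(side):
        total += acc[r]
        steps += 1

    return total, steps


def weighted_sum(weights, inputs):
    """Weighted sum of `inputs`, as needed for one neuron's activation.

    Assumption: each word's ALU multiplies its stored weight by the
    matching input in one parallel step before the two-phase summation.
    """
    products = [w * x for w, x in zip(weights, inputs)]
    total, steps = two_phase_sum(products)
    return total, steps + 1


if __name__ == "__main__":
    print(two_phase_sum(range(1, 17)))                 # (136, 8)
    print(weighted_sum([1, 2, 3, 4], [5, 6, 7, 8]))    # (70, 5)
```

Under these assumptions the step count is about 2·√m, compared with m − 1 additions on a purely sequential machine; for m = 16 that is 8 simulated steps instead of 15.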