一个基于二进制可扩展,8.80 TOPS/W的CNN加速器,具有多核内存处理架构,具有896K突触/mm2

S. Okumura, M. Yabuuchi, K. Hijioka, Koichi Nose
{"title":"一个基于二进制可扩展,8.80 TOPS/W的CNN加速器,具有多核内存处理架构,具有896K突触/mm2","authors":"S. Okumura, M. Yabuuchi, K. Hijioka, Koichi Nose","doi":"10.23919/VLSIT.2019.8776544","DOIUrl":null,"url":null,"abstract":"A Processing-In-Memory (PIM) accelerator with ternary SRAM is proposed for low-power, large-scale deep neural network (DNN) processing. The accelerator consists of Ternary Neural Arithmetic Memory (TNAM) which is capable of bit-scalable MAC (multiply and accumulation) operation in accordance with target accuracy and power limit. An ADC less readout circuits to reduce analog-digital conversion power and a system-level variation avoidance technique utilizing features of TNAM are also proposed. A test chip with large-scale PIM is fabricated and successfully operate convolutional neural networks (CNNs) with 8.8TOPS/W and highest accuracy and area density among recent SRAM-type PIMs are obtained.","PeriodicalId":6752,"journal":{"name":"2019 Symposium on VLSI Technology","volume":"336 1","pages":"C248-C249"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"34","resultStr":"{\"title\":\"A Ternary Based Bit Scalable, 8.80 TOPS/W CNN accelerator with Many-core Processing-in-memory Architecture with 896K synapses/mm2\",\"authors\":\"S. Okumura, M. Yabuuchi, K. Hijioka, Koichi Nose\",\"doi\":\"10.23919/VLSIT.2019.8776544\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A Processing-In-Memory (PIM) accelerator with ternary SRAM is proposed for low-power, large-scale deep neural network (DNN) processing. The accelerator consists of Ternary Neural Arithmetic Memory (TNAM) which is capable of bit-scalable MAC (multiply and accumulation) operation in accordance with target accuracy and power limit. An ADC less readout circuits to reduce analog-digital conversion power and a system-level variation avoidance technique utilizing features of TNAM are also proposed. A test chip with large-scale PIM is fabricated and successfully operate convolutional neural networks (CNNs) with 8.8TOPS/W and highest accuracy and area density among recent SRAM-type PIMs are obtained.\",\"PeriodicalId\":6752,\"journal\":{\"name\":\"2019 Symposium on VLSI Technology\",\"volume\":\"336 1\",\"pages\":\"C248-C249\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-06-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"34\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 Symposium on VLSI Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/VLSIT.2019.8776544\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 Symposium on VLSI Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/VLSIT.2019.8776544","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 34

摘要

提出了一种基于三元SRAM的内存处理(PIM)加速器,用于低功耗、大规模深度神经网络(DNN)处理。该加速器由三元神经算术存储器(TNAM)组成,能够根据目标精度和功率限制进行位扩展的MAC(乘法和累加)操作。本文还提出了一种减少模数转换功率的无ADC读出电路和一种利用TNAM特性的系统级变差避免技术。制作了一个大规模PIM测试芯片,成功运行了卷积神经网络(cnn),其精度为8.8TOPS/W,是近年来sram型PIM中精度和面积密度最高的。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
A Ternary Based Bit Scalable, 8.80 TOPS/W CNN accelerator with Many-core Processing-in-memory Architecture with 896K synapses/mm2
A Processing-In-Memory (PIM) accelerator with ternary SRAM is proposed for low-power, large-scale deep neural network (DNN) processing. The accelerator consists of Ternary Neural Arithmetic Memory (TNAM) which is capable of bit-scalable MAC (multiply and accumulation) operation in accordance with target accuracy and power limit. An ADC less readout circuits to reduce analog-digital conversion power and a system-level variation avoidance technique utilizing features of TNAM are also proposed. A test chip with large-scale PIM is fabricated and successfully operate convolutional neural networks (CNNs) with 8.8TOPS/W and highest accuracy and area density among recent SRAM-type PIMs are obtained.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Economics of semiconductor scaling - a cost analysis for advanced technology node Transient Negative Capacitance as Cause of Reverse Drain-induced Barrier Lowering and Negative Differential Resistance in Ferroelectric FETs Confined PCM-based Analog Synaptic Devices offering Low Resistance-drift and 1000 Programmable States for Deep Learning High Performance Heterogeneous Integration on Fan-out RDL Interposer Technology challenges and enablers to extend Cu metallization to beyond 7 nm node
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1