Achieving High In Situ Training Accuracy and Energy Efficiency with Analog Non-Volatile Synaptic Devices

Shanshi Huang, Xiaoyu Sun, Xiaochen Peng, Hongwu Jiang, Shimeng Yu
{"title":"利用模拟非易失性突触器件实现高原位训练精度和能量效率","authors":"Shanshi Huang, Xiaoyu Sun, Xiaochen Peng, Hongwu Jiang, Shimeng Yu","doi":"10.1145/3500929","DOIUrl":null,"url":null,"abstract":"On-device embedded artificial intelligence prefers the adaptive learning capability when deployed in the field, and thus in situ training is required. The compute-in-memory approach, which exploits the analog computation within the memory array, is a promising solution for deep neural network (DNN) on-chip acceleration. Emerging non-volatile memories are of great interest, serving as analog synapses due to their multilevel programmability. However, the asymmetry and nonlinearity in the conductance tuning remain grand challenges for achieving high in situ training accuracy. In addition, analog-to-digital converters at the edge of the memory array introduce quantization errors. In this work, we present an algorithm-hardware co-optimization to overcome these challenges. We incorporate the device/circuit non-ideal effects into the DNN propagation and weight update steps. By introducing the adaptive “momentum” in the weight update rule, in situ training accuracy on CIFAR-10 could approach its software baseline even under severe asymmetry/nonlinearity and analog-to-digital converter quantization error. The hardware performance of the on-chip training architecture and the overhead for adding “momentum” are also evaluated. By optimizing the backpropagation dataflow, 23.59 TOPS/W training energy efficiency (12× improvement compared to naïve dataflow) is achieved. The circuits that handle “momentum” introduce only 4.2% energy overhead. Our results show great potential and more relaxed requirements that enable emerging non-volatile memories for DNN acceleration on the embedded artificial intelligence platforms.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"121 1","pages":"37:1-37:19"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Achieving High In Situ Training Accuracy and Energy Efficiency with Analog Non-Volatile Synaptic Devices\",\"authors\":\"Shanshi Huang, Xiaoyu Sun, Xiaochen Peng, Hongwu Jiang, Shimeng Yu\",\"doi\":\"10.1145/3500929\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"On-device embedded artificial intelligence prefers the adaptive learning capability when deployed in the field, and thus in situ training is required. The compute-in-memory approach, which exploits the analog computation within the memory array, is a promising solution for deep neural network (DNN) on-chip acceleration. Emerging non-volatile memories are of great interest, serving as analog synapses due to their multilevel programmability. However, the asymmetry and nonlinearity in the conductance tuning remain grand challenges for achieving high in situ training accuracy. In addition, analog-to-digital converters at the edge of the memory array introduce quantization errors. In this work, we present an algorithm-hardware co-optimization to overcome these challenges. We incorporate the device/circuit non-ideal effects into the DNN propagation and weight update steps. By introducing the adaptive “momentum” in the weight update rule, in situ training accuracy on CIFAR-10 could approach its software baseline even under severe asymmetry/nonlinearity and analog-to-digital converter quantization error. 
The hardware performance of the on-chip training architecture and the overhead for adding “momentum” are also evaluated. By optimizing the backpropagation dataflow, 23.59 TOPS/W training energy efficiency (12× improvement compared to naïve dataflow) is achieved. The circuits that handle “momentum” introduce only 4.2% energy overhead. Our results show great potential and more relaxed requirements that enable emerging non-volatile memories for DNN acceleration on the embedded artificial intelligence platforms.\",\"PeriodicalId\":7063,\"journal\":{\"name\":\"ACM Trans. Design Autom. Electr. Syst.\",\"volume\":\"121 1\",\"pages\":\"37:1-37:19\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-07-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Trans. Design Autom. Electr. Syst.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3500929\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Trans. Design Autom. Electr. Syst.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3500929","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

On-device embedded artificial intelligence calls for adaptive learning capability when deployed in the field, and thus requires in situ training. The compute-in-memory approach, which exploits analog computation within the memory array, is a promising solution for deep neural network (DNN) on-chip acceleration. Emerging non-volatile memories are of great interest as analog synapses due to their multilevel programmability. However, the asymmetry and nonlinearity of their conductance tuning remain major challenges for achieving high in situ training accuracy. In addition, the analog-to-digital converters at the edge of the memory array introduce quantization errors. In this work, we present an algorithm-hardware co-optimization to overcome these challenges. We incorporate the device/circuit non-ideal effects into the DNN propagation and weight update steps. By introducing an adaptive "momentum" into the weight update rule, in situ training accuracy on CIFAR-10 can approach the software baseline even under severe asymmetry/nonlinearity and analog-to-digital converter quantization error. The hardware performance of the on-chip training architecture and the overhead of adding "momentum" are also evaluated. By optimizing the backpropagation dataflow, a training energy efficiency of 23.59 TOPS/W is achieved, a 12× improvement over the naïve dataflow. The circuits that handle "momentum" introduce only 4.2% energy overhead. Our results show great potential for emerging non-volatile memories in DNN acceleration on embedded artificial intelligence platforms, along with more relaxed device requirements.
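The abstract combines two ideas: (i) an analog synapse's conductance update step depends nonlinearly, and asymmetrically between potentiation and depression, on its current state, and (ii) an adaptive "momentum" term in the weight update rule mitigates this non-ideality. The abstract itself gives no equations, so the following is a minimal Python sketch of both ingredients under stated assumptions: the exponential conductance model is a commonly used abstraction for such devices (not necessarily the paper's exact model), and all names and constants (G_MIN, NL_P, BETA, THRESH, etc.) are illustrative, not taken from the paper.

```python
import numpy as np

# --- Hypothetical device model: asymmetric, nonlinear conductance tuning ---
G_MIN, G_MAX = 0.0, 1.0   # illustrative conductance range
NL_P, NL_D = 3.0, 5.0     # nonlinearity factors; NL_P != NL_D gives asymmetry
N_LEVELS = 64             # assumed number of programming levels

def pulse_update(g, direction):
    """Apply one programming pulse; the step size depends on the state g."""
    frac = (g - G_MIN) / (G_MAX - G_MIN)
    if direction > 0:
        # Potentiation: step shrinks as g approaches G_MAX.
        dg = np.exp(-NL_P * frac) * (G_MAX - G_MIN) / N_LEVELS
        return min(g + dg, G_MAX)
    else:
        # Depression: step shrinks as g approaches G_MIN.
        dg = np.exp(-NL_D * (1.0 - frac)) * (G_MAX - G_MIN) / N_LEVELS
        return max(g - dg, G_MIN)

# --- Momentum-filtered weight update (one plausible reading of the idea) ---
# Accumulate gradients in a digital "momentum" buffer and program the device
# only when the buffer crosses a threshold, so small noisy updates are
# averaged out before any asymmetric device pulse is fired.
BETA, THRESH = 0.9, 0.05  # illustrative hyperparameters

def train_step(g, m, grad):
    m = BETA * m + (1.0 - BETA) * grad      # update the momentum buffer
    if abs(m) >= THRESH:                    # program only on threshold crossing
        g = pulse_update(g, -np.sign(m))    # descend: move g against the gradient
        m = 0.0                             # reset the buffer after programming
    return g, m

# Toy usage: one synapse driven by a noisy gradient stream.
g, m = 0.5, 0.0
for grad in np.random.randn(200) * 0.02:
    g, m = train_step(g, m, grad)
```

With NL_P ≠ NL_D the model captures the asymmetry between potentiation and depression that the abstract identifies as the main obstacle; buffering gradients digitally and touching the device only on threshold crossings is a sketch of how a "momentum" circuit could relax the device requirements, consistent with the reported 4.2% energy overhead for the momentum-handling circuits.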