40nm CMOS 141mW 898GOPS稀疏神经形态处理器

Phil C. Knag, Chester Liu, Zhengya Zhang
{"title":"40nm CMOS 141mW 898GOPS稀疏神经形态处理器","authors":"Phil C. Knag, Chester Liu, Zhengya Zhang","doi":"10.1109/VLSIC.2016.7573526","DOIUrl":null,"url":null,"abstract":"Sparsity is a brain-inspired property that enables a significant reduction in workload and power dissipation of deep learning. This work presents a 1.40mm2 40nm CMOS sparse neuromorphic processor that implements a two-layer convolutional restricted Boltzmann machine (CRBM) for inference and a support vector machine (SVM) classifier. The processor incorporates sparse convolvers to realize sparsity-proportional workload reduction. The architecture is parallelized along a non-sparse dimension to minimize stalling. At 0.9V and 240MHz, the processor achieves an effective 898.2GOPS performance, dissipating 140.9mW. Using sparsity, we reduce the workload, datapath power consumption and area by 3.4×, 3.3× and 1.74×, respectively. The design uses latch-based memory to reduce area and dynamic clock gating to save power.","PeriodicalId":6512,"journal":{"name":"2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits)","volume":"16 1","pages":"1-2"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"A 1.40mm2 141mW 898GOPS sparse neuromorphic processor in 40nm CMOS\",\"authors\":\"Phil C. Knag, Chester Liu, Zhengya Zhang\",\"doi\":\"10.1109/VLSIC.2016.7573526\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sparsity is a brain-inspired property that enables a significant reduction in workload and power dissipation of deep learning. This work presents a 1.40mm2 40nm CMOS sparse neuromorphic processor that implements a two-layer convolutional restricted Boltzmann machine (CRBM) for inference and a support vector machine (SVM) classifier. The processor incorporates sparse convolvers to realize sparsity-proportional workload reduction. The architecture is parallelized along a non-sparse dimension to minimize stalling. At 0.9V and 240MHz, the processor achieves an effective 898.2GOPS performance, dissipating 140.9mW. Using sparsity, we reduce the workload, datapath power consumption and area by 3.4×, 3.3× and 1.74×, respectively. The design uses latch-based memory to reduce area and dynamic clock gating to save power.\",\"PeriodicalId\":6512,\"journal\":{\"name\":\"2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits)\",\"volume\":\"16 1\",\"pages\":\"1-2\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-06-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/VLSIC.2016.7573526\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/VLSIC.2016.7573526","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

摘要

稀疏性是一种受大脑启发的特性,它可以显著减少深度学习的工作量和功耗。这项工作提出了一个1.40mm2 40nm的CMOS稀疏神经形态处理器,该处理器实现了用于推理的两层卷积受限玻尔兹曼机(CRBM)和支持向量机(SVM)分类器。该处理器采用稀疏卷积来实现稀疏比例的工作量减少。该架构沿着非稀疏维度并行化,以最小化延迟。在0.9V和240MHz下,处理器实现了898.2GOPS的有效性能,功耗为140.9mW。通过使用稀疏性,我们将工作负载、数据路径功耗和面积分别降低了3.4倍、3.3倍和1.74倍。该设计采用基于锁存的存储器来减小内存面积,并采用动态时钟门控来节省功耗。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
A 1.40mm2 141mW 898GOPS sparse neuromorphic processor in 40nm CMOS
Sparsity is a brain-inspired property that enables a significant reduction in workload and power dissipation of deep learning. This work presents a 1.40mm2 40nm CMOS sparse neuromorphic processor that implements a two-layer convolutional restricted Boltzmann machine (CRBM) for inference and a support vector machine (SVM) classifier. The processor incorporates sparse convolvers to realize sparsity-proportional workload reduction. The architecture is parallelized along a non-sparse dimension to minimize stalling. At 0.9V and 240MHz, the processor achieves an effective 898.2GOPS performance, dissipating 140.9mW. Using sparsity, we reduce the workload, datapath power consumption and area by 3.4×, 3.3× and 1.74×, respectively. The design uses latch-based memory to reduce area and dynamic clock gating to save power.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A chopping switched-capacitor RF receiver with integrated blocker detection, +31dBm OB-IIP3, and +15dBm OB-B1dB A wireless power transfer system with enhanced response and efficiency by fully-integrated fast-tracking wireless constant-idle-time control for implants Adaptive clocking with dynamic power gating for mitigating energy efficiency & performance impacts of fast voltage droop in a 22nm graphics execution core A high-density CMOS multi-modality joint sensor/stimulator array with 1024 pixels for holistic real-time cellular characterization A microelectrode array with 8,640 electrodes enabling simultaneous full-frame readout at 6.5 kfps and 112-channel switch-matrix readout at 20 kS/s
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1