7.1 An 11.5TOPS/W 1024-MAC Butterfly Structure Dual-Core Sparsity-Aware Neural Processing Unit in 8nm Flagship Mobile SoC

Jinook Song, Yun-Jin Cho, Jun-Seok Park, Jun-Woo Jang, Sehwan Lee, Joonho Song, Jae-Gon Lee, Inyup Kang
DOI: 10.1109/ISSCC.2019.8662476
Published in: 2019 IEEE International Solid-State Circuits Conference (ISSCC), February 2019
Cited by: 81

Abstract

Deep learning has been widely applied to image and speech recognition. Response time, connectivity, privacy and security are driving these applications toward mobile platforms rather than the cloud. For mobile systems-on-a-chip (SoCs), energy-efficient neural processing units (NPUs) have been studied for executing the convolutional layers (CLs) and fully-connected layers (FCLs) of deep neural networks [2–5]. Moreover, as neural networks grow deeper, an NPU needs to integrate 1K or more multiply/accumulate (MAC) units. For energy efficiency, neural-network compression has been studied: pruning neural connections and quantizing weights and features to 8b or even lower fixed-point precision without accuracy loss [1]. One hardware accelerator exploited network sparsity to achieve high utilization of its MAC units [3]. However, since it is difficult to predict where pruning is possible, that accelerator needed complex circuitry to select the array of features corresponding to each array of non-zero weights. To reduce the power of MAC operations, bit-serial multipliers have been applied [5]. In general, extremely low- or variable-bit-precision neural networks must be trained carefully.
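The sparsity point above can be made concrete with a small software model. This is a generic illustration, not the butterfly datapath described in this paper: it assumes the weights are already pruned and quantized to signed 8-bit integers, stores only the non-zero weights together with their positions, and uses each stored position to pick the matching feature before multiplying. The helper names (`check_int8`, `compress_weights`, `sparse_mac`, `dense_mac`) are hypothetical.

```python
from typing import List, Tuple

INT8_MIN, INT8_MAX = -128, 127


def check_int8(values: List[int]) -> None:
    """Raise ValueError if any operand falls outside the signed 8-bit range."""
    for v in values:
        if not INT8_MIN <= v <= INT8_MAX:
            raise ValueError(f"value {v} is not representable in 8 bits")


def compress_weights(weights: List[int]) -> List[Tuple[int, int]]:
    """Keep only non-zero weights as (index, value) pairs, as a pruned kernel might be stored."""
    check_int8(weights)
    return [(i, w) for i, w in enumerate(weights) if w != 0]


def sparse_mac(compressed: List[Tuple[int, int]], features: List[int]) -> Tuple[int, int]:
    """Accumulate w * x only for the non-zero weights.

    The stored index selects the matching feature: this selection is the step
    that needs extra circuitry in a sparsity-aware MAC array.
    Returns (accumulated sum, number of multiplies performed).
    Python ints do not overflow; hardware would use a wider (e.g. 32-bit) accumulator.
    """
    check_int8(features)
    acc = 0
    macs = 0
    for idx, w in compressed:
        acc += w * features[idx]
        macs += 1
    return acc, macs


def dense_mac(weights: List[int], features: List[int]) -> int:
    """Reference dot product that multiplies every weight, including zeros."""
    return sum(w * x for w, x in zip(weights, features))


if __name__ == "__main__":
    weights = [0, 3, 0, 0, -2, 0, 5, 0]
    features = [7, 1, 4, 9, 6, 2, 8, 3]
    acc, macs = sparse_mac(compress_weights(weights), features)
    print(acc, macs)                       # prints "31 3"
    print(dense_mac(weights, features))    # prints "31"
```

For this kernel the multiply count drops from 8 to 3 while the result matches the dense dot product. The cost moves into the index-based feature lookup, which is why the abstract describes such sparsity-exploiting accelerators as needing complex selection circuitry.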