Efficient Fixed/Floating-Point Merged Mixed-Precision Multiply-Accumulate Unit for Deep Learning Processors

H. Zhang, Hyuk-Jae Lee, S. Ko
{"title":"Efficient Fixed/Floating-Point Merged Mixed-Precision Multiply-Accumulate Unit for Deep Learning Processors","authors":"H. Zhang, Hyuk-Jae Lee, S. Ko","doi":"10.1109/ISCAS.2018.8351354","DOIUrl":null,"url":null,"abstract":"Deep learning is getting more and more attentions in recent years. Many hardware architectures have been proposed for efficient implementation of deep neural network. The arithmetic unit, as a core processing part of the hardware architecture, can determine the functionality of the whole architecture. In this paper, an efficient fixed/floating-point merged multiply-accumulate unit for deep learning processor is proposed. The proposed architecture supports 16-bit half-precision floating-point multiplication with 32-bit single-precision accumulation for training operations of deep learning algorithm. In addition, within the same hardware, the proposed architecture also supports two parallel 8-bit fixed-point multiplications and accumulating the products to 32-bit fixed-point number. This will enable higher throughput for inference operations of deep learning algorithms. Compared to a half-precision multiply-accumulate unit (accumulating to single-precision), the proposed architecture has only 4.6% area overhead. With the proposed multiply-accumulate unit, the deep learning processor can support both training and high-throughput inference.","PeriodicalId":6569,"journal":{"name":"2018 IEEE International Symposium on Circuits and Systems (ISCAS)","volume":"62 1","pages":"1-5"},"PeriodicalIF":0.0000,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Symposium on Circuits and Systems (ISCAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISCAS.2018.8351354","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 18

Abstract

Deep learning has attracted increasing attention in recent years, and many hardware architectures have been proposed for the efficient implementation of deep neural networks. The arithmetic unit, as the core processing part of such an architecture, determines the functionality of the whole design. In this paper, an efficient fixed/floating-point merged multiply-accumulate unit for deep learning processors is proposed. The proposed architecture supports 16-bit half-precision floating-point multiplication with 32-bit single-precision accumulation for the training operations of deep learning algorithms. In addition, within the same hardware, the architecture also supports two parallel 8-bit fixed-point multiplications whose products are accumulated into a 32-bit fixed-point number, enabling higher throughput for the inference operations of deep learning algorithms. Compared to a half-precision multiply-accumulate unit (accumulating to single precision), the proposed architecture incurs only 4.6% area overhead. With the proposed multiply-accumulate unit, a deep learning processor can support both training and high-throughput inference.
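The abstract describes two operating modes for the merged unit: half-precision multiply with single-precision accumulate for training, and two parallel 8-bit fixed-point multiplies feeding a 32-bit accumulator for inference. The sketch below is a purely behavioral model of those two modes, not the paper's hardware datapath; the function names (mac_fp16_fp32, mac_dual_int8) are illustrative assumptions used only to show the numeric behavior.

import numpy as np

def mac_fp16_fp32(a, b, acc):
    # Training mode: FP16 x FP16 multiply with FP32 accumulation.
    # Casting the FP16 operands to FP32 before multiplying is exact,
    # since the product of two 11-bit significands fits in FP32.
    prod = np.float32(np.float16(a)) * np.float32(np.float16(b))
    return np.float32(acc) + prod

def mac_dual_int8(a0, b0, a1, b1, acc):
    # Inference mode: two parallel INT8 multiplications whose products
    # are both added to a 32-bit fixed-point (integer) accumulator.
    p0 = np.int32(np.int8(a0)) * np.int32(np.int8(b0))
    p1 = np.int32(np.int8(a1)) * np.int32(np.int8(b1))
    return np.int32(acc) + p0 + p1

# Example usage of both modes:
print(mac_fp16_fp32(1.5, 2.25, 10.0))   # 13.375
print(mac_dual_int8(3, -4, 7, 2, 100))  # 100 + (-12) + 14 = 102

In the real merged unit the two modes share one multiplier array and accumulator datapath, which is what keeps the reported area overhead to 4.6% over a plain half-precision MAC; the model above only illustrates the input/output behavior of each mode.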