Mixing Low-Precision Formats in Multiply-Accumulate Units for DNN Training

Mariko Tatsumi, Silviu-Ioan Filip, Caroline White, O. Sentieys, G. Lemieux
{"title":"Mixing Low-Precision Formats in Multiply-Accumulate Units for DNN Training","authors":"Mariko Tatsumi, Silviu-Ioan Filip, Caroline White, O. Sentieys, G. Lemieux","doi":"10.1109/ICFPT56656.2022.9974324","DOIUrl":null,"url":null,"abstract":"The most compute-intensive stage of deep neural network (DNN) training is matrix multiplication where the multiply-accumulate (MAC) operator is key. To reduce training costs, we consider using low-precision arithmetic for MAC operations. While low-precision training has been investigated in prior work, the focus has been on reducing the number of bits in weights or activations without compromising accuracy. In contrast, the focus in this paper is on implementation details beyond weight or activation width that affect area and accuracy. In particular, we investigate the impact of fixed- versus floating-point representations, multiplier rounding, and floating-point exceptional value support. Results suggest that (1) low-precision floating-point is more area-effective than fixed-point for multiplication, (2) standard IEEE-754 rules for subnormals, NaNs, and intermediate rounding serve little to no value in terms of accuracy but contribute significantly to area, (3) low-precision MACs require an adaptive loss-scaling step during training to compensate for limited representation range, and (4) fixed-point is more area-effective for accumulation, but the cost of format conversion and downstream logic can swamp the savings. Finally, we note that future work should investigate accumulation structures beyond the MAC level to achieve further gains.","PeriodicalId":239314,"journal":{"name":"2022 International Conference on Field-Programmable Technology (ICFPT)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Field-Programmable Technology (ICFPT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICFPT56656.2022.9974324","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The most compute-intensive stage of deep neural network (DNN) training is matrix multiplication, where the multiply-accumulate (MAC) operator is key. To reduce training costs, we consider using low-precision arithmetic for MAC operations. While low-precision training has been investigated in prior work, the focus has been on reducing the number of bits in weights or activations without compromising accuracy. In contrast, this paper focuses on implementation details beyond weight or activation width that affect area and accuracy. In particular, we investigate the impact of fixed- versus floating-point representations, multiplier rounding, and floating-point exceptional-value support. Results suggest that (1) low-precision floating-point is more area-effective than fixed-point for multiplication, (2) standard IEEE-754 rules for subnormals, NaNs, and intermediate rounding provide little to no accuracy benefit but contribute significantly to area, (3) low-precision MACs require an adaptive loss-scaling step during training to compensate for the limited representation range, and (4) fixed-point is more area-effective for accumulation, but the cost of format conversion and downstream logic can swamp the savings. Finally, we note that future work should investigate accumulation structures beyond the MAC level to achieve further gains.
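To make the abstract's two software-visible ideas concrete, below is a minimal sketch (not the authors' hardware design) of a MAC that quantizes its multiplier inputs to a narrow floating-point format while accumulating in wider precision, together with a dynamic loss scaler of the kind finding (3) calls for. The format parameters (an e5m2-style FP8), the saturate-on-overflow and flush-to-zero choices (which mirror finding (2) that subnormal/NaN support buys little accuracy), and the `quantize`/`LossScaler` names and policy constants are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

EXP_BITS, MAN_BITS = 5, 2                 # assumed FP8-like multiplier input format
EMAX = 2 ** (EXP_BITS - 1) - 1            # largest unbiased exponent (15 for e5m2)
MAX_VAL = (2.0 - 2.0 ** -MAN_BITS) * 2.0 ** EMAX
MIN_NORMAL = 2.0 ** (1 - EMAX)

def quantize(x: np.ndarray) -> np.ndarray:
    """Emulate rounding FP32 values to the narrow format (round-to-nearest-even)."""
    x = np.asarray(x, dtype=np.float32)
    mant, exp = np.frexp(x)                           # x = mant * 2**exp, mant in [0.5, 1)
    mant = np.round(mant * 2.0 ** (MAN_BITS + 1)) / 2.0 ** (MAN_BITS + 1)
    y = np.ldexp(mant, exp).astype(np.float32)
    y = np.clip(y, -MAX_VAL, MAX_VAL)                 # saturate: no Inf/NaN support
    return np.where(np.abs(y) < MIN_NORMAL, 0.0, y)  # flush subnormals to zero

def mac(a_row: np.ndarray, b_col: np.ndarray) -> np.float32:
    """Dot product: narrow-format multiplies feeding a wider (FP32) accumulator."""
    return np.float32(np.sum(quantize(a_row) * quantize(b_col)))

class LossScaler:
    """Dynamic loss scaling: halve on gradient overflow, double after a clean streak."""
    def __init__(self, scale: float = 2.0 ** 15, growth_interval: int = 2000):
        self.scale = scale
        self.growth_interval = growth_interval
        self.clean_steps = 0

    def unscale(self, grads):
        unscaled = [g / self.scale for g in grads]    # grads came from scale * loss
        if any(not np.all(np.isfinite(g)) for g in unscaled):
            self.scale /= 2.0                         # overflow: shrink, skip update
            self.clean_steps = 0
            return None
        self.clean_steps += 1
        if self.clean_steps >= self.growth_interval:
            self.scale *= 2.0                         # stable: probe a larger scale
            self.clean_steps = 0
        return unscaled
```

In a training loop, the loss would be multiplied by `scaler.scale` before backpropagation; a `None` return from `unscale` signals an overflow step whose parameter update should be skipped, which is how the scale adapts to the narrow format's limited range.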