A Multiple-Precision Multiply and Accumulation Design with Multiply-Add Merged Strategy for AI Accelerating

Song Zhang, Jiangyuan Gu, S. Yin, Leibo Liu, Shaojun Wei
{"title":"A Multiple-Precision Multiply and Accumulation Design with Multiply-Add Merged Strategy for AI Accelerating","authors":"Song Zhang, Jiangyuan Gu, S. Yin, Leibo Liu, Shaojun Wei","doi":"10.1145/3394885.3431531","DOIUrl":null,"url":null,"abstract":"Multiply and accumulations(MAC) are fundamental operations for domain-specific accelerator with AI applications ranging from filtering to convolutional neural networks(CNN). This paper proposes an energy-efficient MAC design, supporting a wide range of bit- width, for both signed and unsigned operands. Firstly, based on the classic Booth algorithm, we propose the Booth algorithm to propose a multiply-add merged strategy. The design can not only support both signed and unsigned operations but also eliminate the delay, area and power overheads from the adder of traditional MAC units. Then a multiply-add merged design method for flexible bit-width adjustment is proposed using the fusion strategy. In addition, treating the addend as a partial product makes the operation easy to pipeline and balanced. The comprehensive improvement in delay, area and power can meet various requirements from different applications and hardware design. By using the proposed method, we have synthesized MAC units for several operation modes using a SMIC 40-nm library. Comparison with other MAC designs shows that the proposed design method can achieve up to 24.1% and 28.2% PDP and ADP improvement for bit-width fixed MAC designs, and 28.43% ~ 38.16% for bit-width adjustable ones. When pipelined, the design has decreased the latency by more than 13%. The improvement in power and area is up to 8.0% and 8.1% respectively.","PeriodicalId":186307,"journal":{"name":"2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3394885.3431531","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Multiply and accumulations(MAC) are fundamental operations for domain-specific accelerator with AI applications ranging from filtering to convolutional neural networks(CNN). This paper proposes an energy-efficient MAC design, supporting a wide range of bit- width, for both signed and unsigned operands. Firstly, based on the classic Booth algorithm, we propose the Booth algorithm to propose a multiply-add merged strategy. The design can not only support both signed and unsigned operations but also eliminate the delay, area and power overheads from the adder of traditional MAC units. Then a multiply-add merged design method for flexible bit-width adjustment is proposed using the fusion strategy. In addition, treating the addend as a partial product makes the operation easy to pipeline and balanced. The comprehensive improvement in delay, area and power can meet various requirements from different applications and hardware design. By using the proposed method, we have synthesized MAC units for several operation modes using a SMIC 40-nm library. Comparison with other MAC designs shows that the proposed design method can achieve up to 24.1% and 28.2% PDP and ADP improvement for bit-width fixed MAC designs, and 28.43% ~ 38.16% for bit-width adjustable ones. When pipelined, the design has decreased the latency by more than 13%. The improvement in power and area is up to 8.0% and 8.1% respectively.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于乘加合并策略的人工智能加速多精度乘累加设计
乘法和累积(MAC)是特定领域加速器的基本操作,具有从过滤到卷积神经网络(CNN)的人工智能应用。本文提出了一种节能的MAC设计,支持大范围的位宽,适用于有符号和无符号操作数。首先,在经典Booth算法的基础上,提出Booth算法,提出一种乘加合并策略。该设计不仅支持有符号和无符号操作,而且消除了传统MAC单元加法器的延迟、面积和功耗开销。然后利用融合策略提出了一种灵活位宽调整的乘加合并设计方法。此外,将加数作为部分乘积处理,使操作易于流水线化和平衡。在时延、面积、功耗等方面的全面提升,可以满足不同应用和硬件设计的各种需求。利用所提出的方法,我们利用SMIC 40纳米库合成了几种工作模式的MAC单元。与其他MAC设计的比较表明,该设计方法对位宽固定MAC设计的PDP和ADP提高了24.1%和28.2%,对位宽可调MAC设计的PDP和ADP提高了28.43% ~ 38.16%。当采用流水线时,该设计将延迟降低了13%以上。功率和面积分别提高了8.0%和8.1%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Hardware-Aware NAS Framework with Layer Adaptive Scheduling on Embedded System Value-Aware Error Detection and Correction for SRAM Buffers in Low-Bitwidth, Floating-Point CNN Accelerators A Unified Printed Circuit Board Routing Algorithm With Complicated Constraints and Differential Pairs Uncertainty Modeling of Emerging Device based Computing-in-Memory Neural Accelerators with Application to Neural Architecture Search A DSM-based Polar Transmitter with 23.8% System Efficiency
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1