Efficient LUT-based FPGA Accelerator Design for Universal Quantized CNN Inference

Yanpeng Cao, Changjun Song, Yongming Tang
{"title":"Efficient LUT-based FPGA Accelerator Design for Universal Quantized CNN Inference","authors":"Yanpeng Cao, Changjun Song, Yongming Tang","doi":"10.1145/3456126.3456140","DOIUrl":null,"url":null,"abstract":"Deep learning has achieved remarkable success in a variety of tasks in real life, such as speech and vision. However, the vast computational complexity of convolution neural networks (CNN) has limited the speed of the network running in hardware. In recent years, network quantization technology has made it possible to quantize network into the 16-bit fixed point, 8-bit integer, and even binary, maintaining the original performance, while the computational complexity of the network inference is still considerable. Therefore, exploring high-performance and efficient hardware architecture designed for quantized neural networks (QNN) is necessary to eliminate the bottleneck of high-density computing requirements. FPGA is a highly parallelized hardware computing platform. The outstanding advantage is that it contains a large number of primary configurable logic resources. We explore the possibility of implementation for convolution calculations based on LUTs, introduce the integer multipliers and addition trees based on FPGAs, and propose an efficient computing architecture for QNN. With the optimization of Winograd convolution algorithm for QNN, we demonstrate that our scheme could significantly reduce the number of multipliers without using DSP resources, saving the usage of LUT resources by 2.25× at least. In the end, our LUT-based architecture for QNN will shorten the latency up to 19.3× and represent more effective performance compared other methods.","PeriodicalId":431685,"journal":{"name":"2021 2nd Asia Service Sciences and Software Engineering Conference","volume":"56 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 2nd Asia Service Sciences and Software Engineering Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3456126.3456140","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Deep learning has achieved remarkable success in a variety of real-world tasks, such as speech and vision. However, the vast computational complexity of convolutional neural networks (CNNs) limits how fast these networks can run in hardware. In recent years, network quantization techniques have made it possible to quantize networks to 16-bit fixed point, 8-bit integer, and even binary representations while maintaining the original accuracy, yet the computational complexity of network inference remains considerable. Exploring high-performance, efficient hardware architectures designed for quantized neural networks (QNNs) is therefore necessary to eliminate the bottleneck of high-density computing requirements. The FPGA is a highly parallel hardware computing platform whose outstanding advantage is its large number of primary configurable logic resources. We explore the feasibility of implementing convolution calculations with LUTs, introduce FPGA-based integer multipliers and addition trees, and propose an efficient computing architecture for QNNs. By optimizing the Winograd convolution algorithm for QNNs, we demonstrate that our scheme significantly reduces the number of multipliers without using DSP resources, saving LUT usage by at least 2.25×. Our LUT-based QNN architecture shortens latency by up to 19.3× and delivers more effective performance than other methods.
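To make the multiplier savings concrete, below is a minimal sketch of a 1-D Winograd F(2,3) tile in the standard formulation popularized by Lavin and Gray, not the paper's exact hardware mapping: it produces two convolution outputs from four multiplications where direct computation needs six, and it defers the transform's 1/2 factors so that every intermediate stays an integer, as in a fixed-point QNN datapath. Function names and test values are illustrative assumptions, not taken from the paper.

```python
# Sketch: 1-D Winograd F(2,3) versus direct convolution.
# Integer-only variant: the usual 1/2 factors in the filter transform are
# deferred to the output transform, where (m2 +/- m3) is provably even,
# so a right shift is exact. In a real accelerator the filter transform
# would be precomputed offline; here it is inlined for clarity.

def conv1d_direct(d, g):
    """Direct 3-tap convolution over 4 inputs: 2 outputs, 6 multiplications."""
    return [d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
            d[1]*g[0] + d[2]*g[1] + d[3]*g[2]]

def conv1d_winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs from only 4 multiplications."""
    # The only 4 products; in a LUT-based design each would map to one
    # small integer multiplier built from logic instead of a DSP slice.
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2])   # 2x the usual Winograd term
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2])   # 2x the usual Winograd term
    m4 = (d[1] - d[3]) * g[2]
    # Output transform: additions and one exact shift (m2 +/- m3 is even).
    return [m1 + ((m2 + m3) >> 1),
            ((m2 - m3) >> 1) - m4]

if __name__ == "__main__":
    d = [13, -7, 42, 5]   # e.g. 8-bit quantized activations
    g = [3, -1, 2]        # e.g. 8-bit quantized weights
    assert conv1d_winograd_f23(d, g) == conv1d_direct(d, g)
    print(conv1d_winograd_f23(d, g))  # [130, -53]
```

Nested in 2-D, the F(2×2, 3×3) tile computes a 2×2 output block with 16 multiplications instead of 36, which is consistent with the at-least-2.25× LUT saving the abstract reports; the input, filter, and output transforms cost only additions and shifts, which map naturally onto FPGA logic fabric and addition trees.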