A neural network accelerated optimization method for FPGA

IF 0.9 | Zone 4 (Mathematics) | JCR Q4, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Journal of Combinatorial Optimization | Pub Date: 2024-06-25 | DOI: 10.1007/s10878-024-01117-x

Zhengwei Hu, Sijie Zhu, Leilei Wang, Wangbin Cao, Zhiyuan Xie


Citations: 0

Abstract




A neural network acceleration and optimization method for FPGA hardware platforms is proposed. The method optimizes the deployment of neural network algorithms on FPGA hardware platforms in three respects: computational speed, portability, and development method. Replacing multiplication with a computation based on the Mitchell algorithm not only breaks through the speed bottleneck that long multiplication cycles impose on neural network hardware acceleration, but also frees the parallel acceleration of neural network algorithms from its dependence on the number of hardware multipliers in the FPGA, so that the FPGA's parallelism can be fully exploited to maximize computing speed. With a configurable-parameter strategy, the number of network layers and the number of nodes within each layer can be adjusted to match the logic resources of different FPGAs, which makes the neural network easier to port. Adopting the HLS development method overcomes the shortcomings of the RTL method in designing complex neural network algorithms, such as high development difficulty and long development cycles. Using the Cyclone V SE 5CSEBA6U23I7 FPGA as the target device, a parameter-configurable BP neural network was designed based on the proposed method. Its usage of ALUT, flip-flop, RAM, and DSP logic resources was 39.6%, 40%, 56.9%, and 18.3% of the pre-optimization usage, respectively. The feasibility of the proposed method was verified using MNIST digit recognition and facial recognition as application scenarios. Compared with the pre-optimization design, the test time for MNIST digit recognition was reduced to 67.58% and the success rate dropped by 0.195%. The test time for the facial recognition application was reduced to 69.571%, and the loss in success rate when combined with the LDA algorithm was within 4%.
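The abstract does not describe how the Mitchell-based replacement is implemented. As a rough illustration only, the following sketch shows the textbook form of Mitchell's logarithmic approximation for unsigned integers, in which a product is estimated with leading-one detection, shifts, and additions instead of a multiplier. The function name `mitchell_multiply` is hypothetical, and the handling of signs and fixed-point formats (which a real BP network design would need) is omitted; none of this should be read as the authors' hardware design.

```python
def mitchell_multiply(a: int, b: int) -> int:
    """Approximate a * b for non-negative integers with Mitchell's method.

    Write a = 2**k1 * (1 + x1) and b = 2**k2 * (1 + x2) with 0 <= x1, x2 < 1.
    Mitchell approximates log2(1 + x) by x, so
        log2(a * b) ~= (k1 + k2) + (x1 + x2),
    and the antilogarithm is taken with the same linear approximation.
    Only bit-length detection, shifts, and additions are used.
    """
    if a < 0 or b < 0:
        raise ValueError("this sketch handles non-negative integers only")
    if a == 0 or b == 0:
        return 0

    k1 = a.bit_length() - 1          # position of the leading one in a
    k2 = b.bit_length() - 1          # position of the leading one in b
    r1 = a - (1 << k1)               # a = 2**k1 + r1, so x1 = r1 / 2**k1
    r2 = b - (1 << k2)               # b = 2**k2 + r2, so x2 = r2 / 2**k2

    # (x1 + x2) scaled by 2**(k1 + k2), computed with shifts only.
    frac_sum = (r1 << k2) + (r2 << k1)
    one = 1 << (k1 + k2)             # the value 1.0 at the same scale

    if frac_sum < one:
        # x1 + x2 < 1: result is 2**(k1 + k2) * (1 + x1 + x2)
        return one + frac_sum
    # x1 + x2 >= 1: the sum carries into the exponent,
    # result is 2**(k1 + k2 + 1) * (x1 + x2)
    return frac_sum << 1


if __name__ == "__main__":
    for a, b in [(8, 4), (5, 6), (3, 3), (100, 77)]:
        approx = mitchell_multiply(a, b)
        exact = a * b
        print(f"{a} * {b}: exact={exact}, mitchell={approx}, "
              f"relative error={(exact - approx) / exact:.4f}")
```

In this basic form the estimate never exceeds the exact product, and its relative error is at most 1/9 (about 11.1%), reached for example at 3 * 3. That trade of accuracy for multiplier-free arithmetic is consistent with the small loss in recognition success rate reported above, although the abstract does not say whether any error correction is applied.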


Source journal

Journal of Combinatorial Optimization (Mathematics; Computer Science, Interdisciplinary Applications)

CiteScore

2.00

Self-citation rate

10.00%

Articles published

Review time

6 months

Journal description: The objective of Journal of Combinatorial Optimization is to advance and promote the theory and applications of combinatorial optimization, which is an area of research at the intersection of applied mathematics, computer science, and operations research and which overlaps with many other areas such as computation complexity, computational biology, VLSI design, communication networks, and management science. It includes complexity analysis and algorithm design for combinatorial optimization problems, numerical experiments and problem discovery with applications in science and engineering. The Journal of Combinatorial Optimization publishes refereed papers dealing with all theoretical, computational and applied aspects of combinatorial optimization. It also publishes reviews of appropriate books and special issues of journals.