Low Precision Networks for Efficient Inference on FPGAs
R. Abra, Dmitry Denisenko, Richard Allen, Tim Vanderhoek, Sarah Wolstencroft, Peter M. Gibson
2021 International Conference on Field-Programmable Technology (ICFPT), 6 December 2021. DOI: 10.1109/ICFPT52863.2021.9609837
Abstract: Block Floating Point (BFP) is a type of quantization that combines high dynamic range with low-cost inference. BFP can be implemented efficiently on FPGA hardware and, at low precision, halves the logic footprint versus blocked FP16 while maintaining accuracy. Moving to very low precision halves the logic footprint again, and retraining allows the recovery of any accuracy lost in the transition. This paper describes our approach to achieving target accuracy and FPGA resource usage in a low-precision end-to-end AI solution. We go on to investigate the effects of retraining with our software model, which replicates the low-level implementation of BFP on the FPGA. Our solution allows the efficacy of quantization to be tested on custom networks and provides accuracy indications and resource usage for the final application. Using our solution, we were able to quantize ResNet-50, SSD300 and UNet to int5/4bfp precision without losing accuracy, while reducing FPGA resources and improving performance.
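
To make the BFP idea concrete, the following is a minimal NumPy sketch of block floating-point quantization: values are grouped into fixed-size blocks, each block shares a single exponent taken from its largest-magnitude element, and every element keeps only a narrow signed integer mantissa. The block size, mantissa width, and rounding/clipping choices here are illustrative assumptions and do not reflect the authors' FPGA implementation or the exact int5/4bfp format used in the paper.

```python
import numpy as np

def bfp_quantize(x, block_size=32, mantissa_bits=5):
    """Quantize an array to block floating point and return the dequantized result.

    Illustrative sketch only: each block of `block_size` values shares one
    exponent (chosen so the largest-magnitude element fits), and every element
    keeps a signed mantissa of `mantissa_bits` bits.
    """
    x = np.asarray(x, dtype=np.float64)
    flat = x.ravel()
    pad = (-flat.size) % block_size
    blocks = np.pad(flat, (0, pad)).reshape(-1, block_size)

    # Shared exponent per block: the smallest e with max|block| < 2**e.
    max_abs = np.max(np.abs(blocks), axis=1, keepdims=True)
    _, exp = np.frexp(max_abs)

    # Scale so mantissas land in the signed range of `mantissa_bits` bits.
    scale = np.ldexp(1.0, exp - (mantissa_bits - 1))
    qmax = 2 ** (mantissa_bits - 1) - 1
    mant = np.clip(np.round(blocks / scale), -qmax - 1, qmax)

    # Dequantize so the quantization error can be inspected directly.
    return (mant * scale).ravel()[: flat.size].reshape(x.shape)


# Example: quantize a weight tensor and check the per-element rounding error,
# the kind of loss the paper's retraining step is meant to recover.
w = np.random.randn(4, 64).astype(np.float32)
w_q = bfp_quantize(w, block_size=32, mantissa_bits=5)
print("max abs error:", np.max(np.abs(w - w_q)))
```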