Low Precision Networks for Efficient Inference on FPGAs
R. Abra, Dmitry Denisenko, Richard Allen, Tim Vanderhoek, Sarah Wolstencroft, Peter M. Gibson
2021 International Conference on Field-Programmable Technology (ICFPT), 6 December 2021. DOI: 10.1109/ICFPT52863.2021.9609837
Abstract: Block Floating Point (BFP) is a type of quantization that combines high dynamic range with low-cost inference. BFP can be implemented efficiently on FPGA hardware and, at low precision, halves the logic footprint versus blocked FP16 while maintaining accuracy. Moving to very low precision halves the logic footprint again, and retraining allows the recovery of any accuracy lost in the transition. This paper describes our approach to achieving target accuracy and FPGA resource usage in a low-precision end-to-end AI solution. We go on to investigate the effects of retraining with our software model, which replicates the low-level implementation of BFP on the FPGA. Our solution allows the efficacy of quantization to be tested on custom networks and provides accuracy indications and resource usage for the final application. Using our solution, we were able to quantize ResNet-50, SSD300 and UNet to int5/4bfp precision without losing accuracy, while reducing FPGA resources and improving performance.
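
To make the BFP idea concrete, the following is a minimal NumPy sketch of block floating-point quantization: values are grouped into fixed-size blocks, each block shares a single exponent taken from its largest-magnitude element, and every element keeps only a narrow signed integer mantissa. The block size, mantissa width, and rounding/clipping choices here are illustrative assumptions and do not reflect the authors' FPGA implementation or the exact int5/4bfp format used in the paper.

```python
import numpy as np

def bfp_quantize(x, block_size=32, mantissa_bits=5):
    """Quantize an array to block floating point and return the dequantized result.

    Illustrative sketch only: each block of `block_size` values shares one
    exponent (chosen so the largest-magnitude element fits), and every element
    keeps a signed mantissa of `mantissa_bits` bits.
    """
    x = np.asarray(x, dtype=np.float64)
    flat = x.ravel()
    pad = (-flat.size) % block_size
    blocks = np.pad(flat, (0, pad)).reshape(-1, block_size)

    # Shared exponent per block: the smallest e with max|block| < 2**e.
    max_abs = np.max(np.abs(blocks), axis=1, keepdims=True)
    _, exp = np.frexp(max_abs)

    # Scale so mantissas land in the signed range of `mantissa_bits` bits.
    scale = np.ldexp(1.0, exp - (mantissa_bits - 1))
    qmax = 2 ** (mantissa_bits - 1) - 1
    mant = np.clip(np.round(blocks / scale), -qmax - 1, qmax)

    # Dequantize so the quantization error can be inspected directly.
    return (mant * scale).ravel()[: flat.size].reshape(x.shape)


# Example: quantize a weight tensor and check the per-element rounding error,
# the kind of loss the paper's retraining step is meant to recover.
w = np.random.randn(4, 64).astype(np.float32)
w_q = bfp_quantize(w, block_size=32, mantissa_bits=5)
print("max abs error:", np.max(np.abs(w - w_q)))
```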