APIR-DSP: An approximate PIR-DSP architecture for error-tolerant applications
Yuan Dai, Simin Liu, Yao Lu, Hao Zhou, Seyedramin Rasoulinezhad, Philip H. W. Leong, Lingli Wang
2021 International Conference on Field-Programmable Technology (ICFPT), published 2021-12-06
DOI: 10.1109/ICFPT52863.2021.9609927
Abstract
In error-tolerant applications such as low-precision DNNs and digital filters, approximate arithmetic circuits can significantly reduce hardware resource utilization. In this work we propose an embedded block for field-programmable gate arrays, called APIR-DSP, which incorporates an approximate 9×9 hard multiplier based on the PIR-DSP architecture to improve speed and reduce area. In addition, we develop a DSP unit evaluation platform based on Yosys and VPR that packs multiply-accumulate operations into DSP blocks. Using this tool, we synthesize designs from Verilog implementations of matrix multiplication in DeepBench and of the DoReFaNet low-precision neural network, and show that APIR-DSP significantly reduces DSP resource usage and improves hardware utilization and performance compared with the Xilinx DSP48E2 embedded block. Compared with exact multiplication, the accuracy loss is small: the SNR of an FIR filter is reduced by 1.03 dB. For DNNs, the accuracy loss for AlexNet on the CIFAR10 dataset is 0.31%, and no accuracy loss is observed for LeNet on the MNIST dataset. Synthesis results show that, compared with PIR-DSP, APIR-DSP reduces area by 21.60%, critical path delay by 4.85% and power consumption by 2.80%.
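As a rough illustration of how the error of an approximate multiplier can be measured in an FIR filter, the sketch below compares a filter built on exact products with one built on a stand-in approximate multiplier and reports the SNR of the approximate output relative to the exact one. Everything in it is an assumption made for illustration: `approx_mul9` simply zeroes the low bits of the product and is not the APIR-DSP multiplier, and the taps and input samples are arbitrary. It is not expected to reproduce the 1.03 dB figure above.

```python
import math
import random


def approx_mul9(a: int, b: int, dropped_bits: int = 4) -> int:
    """Hypothetical approximate signed 9x9 multiply.

    Zeroes the `dropped_bits` least significant bits of the exact product.
    This is a stand-in for illustration, not the multiplier from the paper.
    """
    assert -256 <= a <= 255 and -256 <= b <= 255, "operands must fit in 9 signed bits"
    product = a * b
    return (product >> dropped_bits) << dropped_bits


def fir(samples, taps, mul):
    """Direct-form FIR filter using `mul` for every tap product."""
    out = []
    for n in range(len(taps) - 1, len(samples)):
        acc = 0
        for k, h in enumerate(taps):
            acc += mul(samples[n - k], h)
        out.append(acc)
    return out


def snr_db(reference, measured):
    """SNR in dB, treating (reference - measured) as noise."""
    signal = sum(r * r for r in reference)
    noise = sum((r - m) ** 2 for r, m in zip(reference, measured))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


random.seed(0)
samples = [random.randint(-256, 255) for _ in range(1024)]  # arbitrary 9-bit input
taps = [3, -12, 40, 96, 40, -12, 3]                         # arbitrary low-pass-like taps

exact = fir(samples, taps, lambda a, b: a * b)
approx = fir(samples, taps, approx_mul9)
print(f"SNR of approximate output vs exact: {snr_db(exact, approx):.2f} dB")
```

Swapping `approx_mul9` for a bit-accurate model of a real approximate multiplier would leave the rest of the measurement unchanged, which is the reason the multiplier is passed in as a function.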