An efficient FPGA-based accelerator design for convolution
Pengfei Song, Jeng-Shyang Pan, Chun-Sheng Yang, Chiou-Yng Lee
2017 IEEE 8th International Conference on Awareness Science and Technology (iCAST), November 2017
DOI: https://doi.org/10.1109/ICAWST.2017.8256507
Abstract
The number theoretic transform (NTT), built on modular arithmetic operations, can perform convolution efficiently in a ring without round-off errors. This paper proposes a new efficient architecture for the transform that supports various operand sizes. To balance area against latency, the design uses a variant of the constant-geometry architecture in which the forward and backward sub-stages share the same computation pattern. In addition, an XOR-based multi-ported RAM is adopted to accelerate memory access by allowing multiple simultaneous reads and writes. As a result, the developed accelerator achieves a lower area-latency cost on FPGA than other designs.
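To make the first claim of the abstract concrete, the following is a minimal software sketch of convolution through a number theoretic transform. It is not the paper's hardware architecture: the modulus `P = 17`, the length `N = 8`, the function names, and the naive O(N^2) transform are all assumptions chosen for readability, and no constant-geometry scheduling or multi-ported RAM is modelled. The sketch only shows that the transform, pointwise product, and inverse transform reproduce a cyclic convolution exactly, using integer arithmetic throughout.

```python
# Illustrative parameters (assumptions, not taken from the paper).
# P is prime and N divides P - 1, so a primitive N-th root of unity exists mod P.
P = 17
N = 8


def find_root_of_unity(n, p):
    """Find a primitive n-th root of unity modulo prime p (n a power of two)."""
    for g in range(2, p):
        w = pow(g, (p - 1) // n, p)
        # For n a power of two, w is primitive exactly when w^(n/2) == -1 (mod p).
        if pow(w, n // 2, p) == p - 1:
            return w
    raise ValueError("no primitive root of unity found")


def ntt(a, w, p):
    """Naive O(n^2) number theoretic transform of a, using root w modulo p."""
    n = len(a)
    return [sum(a[j] * pow(w, i * j, p) for j in range(n)) % p for i in range(n)]


def intt(values, w, p):
    """Inverse transform: use w^-1 and scale by n^-1, both taken modulo p."""
    n = len(values)
    w_inv = pow(w, p - 2, p)  # modular inverse via Fermat's little theorem
    n_inv = pow(n, p - 2, p)
    return [(n_inv * x) % p for x in ntt(values, w_inv, p)]


def cyclic_convolution(a, b, p=P):
    """Cyclic convolution mod p: transform, multiply pointwise, transform back."""
    w = find_root_of_unity(len(a), p)
    fa, fb = ntt(a, w, p), ntt(b, w, p)
    return intt([(x * y) % p for x, y in zip(fa, fb)], w, p)


def direct_cyclic_convolution(a, b, p=P):
    """Reference result computed straight from the definition."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            out[(i + j) % n] = (out[(i + j) % n] + a[i] * b[j]) % p
    return out


a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [1, 1, 0, 0, 0, 0, 0, 0]
result = cyclic_convolution(a, b)
print(result)
```

Because every step is integer arithmetic modulo `P`, the result matches the direct computation exactly, with none of the rounding that a floating-point FFT would introduce. A practical design would use a much larger modulus and a fast butterfly-based transform in place of the naive one used here.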