{"title":"RxRE: Throughput Optimization for High-Level Synthesis using Resource-Aware Regularity Extraction (Abstract Only)","authors":"A. Lotfi, Rajesh K. Gupta","doi":"10.1145/3020078.3021797","DOIUrl":null,"url":null,"abstract":"Despite the considerable improvements in the quality of HLS tools, they still require the designer's manual optimizations and tweaks to generate efficient results, which negates the HLS design productivity gains. Majority of designer interventions lead to optimizations that are often global in nature, for instance, finding patterns in functions that better fit a custom designed solution. We introduce a high-level resource-aware regularity extraction workflow, called RxRE that detects a class of patterns in an input program, and enhances resource sharing to balance resource usage against increased throughput. RxRE automatically detects structural patterns, or repeated sequence of floating-point operations, from sequential loops, selects suitable resources for them, and shares resources for all instances of the selected patterns. RxRE reduces required hardware area for synthesizing an instance of the program. Hence, more number of program replicas can be fitted in the fixed area budget of an FPGA. RxRE contributes to a pre-synthesis workflow that exploits the inherent regularity of applications to achieve higher computational throughput using off-the-shelf HLS tools without any changes to the HLS flow. It uses a string-based pattern detection approach to find linear patterns across loops within the same function. It deploys a simple but effective model to estimate resource utilization and latency of each candidate design, to avoid synthesizing every possible design alternative. We have implemented and evaluated RxRE using a set of C benchmarks. The synthesis results on a Xilinx Virtex FPGA show that the reduced area of the transformed programs improves the number of mapped kernels by a factor of 1.54X on average (maximum 2.8X) which yields on average 1.59X (maximum 2.4X) higher throughput over Xilinx Vivado HLS tool solution. Current implementation has several limitations and only extracts a special case of regularity that is subject of current optimization and study.","PeriodicalId":252039,"journal":{"name":"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3020078.3021797","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Despite the considerable improvements in the quality of HLS tools, they still require the designer's manual optimizations and tweaks to generate efficient results, which negates the HLS design productivity gains. Majority of designer interventions lead to optimizations that are often global in nature, for instance, finding patterns in functions that better fit a custom designed solution. We introduce a high-level resource-aware regularity extraction workflow, called RxRE that detects a class of patterns in an input program, and enhances resource sharing to balance resource usage against increased throughput. RxRE automatically detects structural patterns, or repeated sequence of floating-point operations, from sequential loops, selects suitable resources for them, and shares resources for all instances of the selected patterns. RxRE reduces required hardware area for synthesizing an instance of the program. Hence, more number of program replicas can be fitted in the fixed area budget of an FPGA. RxRE contributes to a pre-synthesis workflow that exploits the inherent regularity of applications to achieve higher computational throughput using off-the-shelf HLS tools without any changes to the HLS flow. It uses a string-based pattern detection approach to find linear patterns across loops within the same function. It deploys a simple but effective model to estimate resource utilization and latency of each candidate design, to avoid synthesizing every possible design alternative. We have implemented and evaluated RxRE using a set of C benchmarks. The synthesis results on a Xilinx Virtex FPGA show that the reduced area of the transformed programs improves the number of mapped kernels by a factor of 1.54X on average (maximum 2.8X) which yields on average 1.59X (maximum 2.4X) higher throughput over Xilinx Vivado HLS tool solution. Current implementation has several limitations and only extracts a special case of regularity that is subject of current optimization and study.