{"title":"基于OpenCL编程模型的fpga迭代模板算法综合框架(仅摘要)","authors":"Shuo Wang, Yun Liang","doi":"10.1145/3020078.3021761","DOIUrl":null,"url":null,"abstract":"Iterative stencil algorithms find applications in a wide range of domains. FPGAs have long been adopted for computation acceleration due to its advantages of dedicated hardware design. Hence, FPGAs are a compelling alternative for executing iterative stencil algorithms. However, efficient implementation of iterative stencil algorithms on FPGAs is very challenging due to the data dependencies between iterations and elements in the stencil algorithms, programming hurdle of FPGAs, and large design space. In this paper, we present a comprehensive framework that synthesizes iterative stencil algorithms on FPGAs efficiently. We leverage the OpenCL-to-FPGA tool chain to generate accelerator automatically and perform design space exploration at high level. We propose to bridge the neighboring tiles through pipe and enable data sharing among them to improve computation efficiency. We first propose a homogeneous design with equal tile size. Then, we extend to a heterogeneous design with different tile size to balance the computation among different tiles. Our designs exhibit a large design space in terms of tile structure. We also develop analytical performance models to explore the complex design space. Experiments using a wide range of stencil applications demonstrate that on average our homogeneous and heterogeneous implementations achieve 1.49X and 1.65X performance speedup respectively but with less hardware resource compared to the state-of-the-art.","PeriodicalId":252039,"journal":{"name":"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"85 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"A Framework for Iterative Stencil Algorithm Synthesis on FPGAs from OpenCL Programming Model (Abstract Only)\",\"authors\":\"Shuo Wang, Yun Liang\",\"doi\":\"10.1145/3020078.3021761\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Iterative stencil algorithms find applications in a wide range of domains. FPGAs have long been adopted for computation acceleration due to its advantages of dedicated hardware design. Hence, FPGAs are a compelling alternative for executing iterative stencil algorithms. However, efficient implementation of iterative stencil algorithms on FPGAs is very challenging due to the data dependencies between iterations and elements in the stencil algorithms, programming hurdle of FPGAs, and large design space. In this paper, we present a comprehensive framework that synthesizes iterative stencil algorithms on FPGAs efficiently. We leverage the OpenCL-to-FPGA tool chain to generate accelerator automatically and perform design space exploration at high level. We propose to bridge the neighboring tiles through pipe and enable data sharing among them to improve computation efficiency. We first propose a homogeneous design with equal tile size. Then, we extend to a heterogeneous design with different tile size to balance the computation among different tiles. Our designs exhibit a large design space in terms of tile structure. We also develop analytical performance models to explore the complex design space. Experiments using a wide range of stencil applications demonstrate that on average our homogeneous and heterogeneous implementations achieve 1.49X and 1.65X performance speedup respectively but with less hardware resource compared to the state-of-the-art.\",\"PeriodicalId\":252039,\"journal\":{\"name\":\"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays\",\"volume\":\"85 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-02-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3020078.3021761\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3020078.3021761","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Framework for Iterative Stencil Algorithm Synthesis on FPGAs from OpenCL Programming Model (Abstract Only)
Iterative stencil algorithms find applications in a wide range of domains. FPGAs have long been adopted for computation acceleration due to its advantages of dedicated hardware design. Hence, FPGAs are a compelling alternative for executing iterative stencil algorithms. However, efficient implementation of iterative stencil algorithms on FPGAs is very challenging due to the data dependencies between iterations and elements in the stencil algorithms, programming hurdle of FPGAs, and large design space. In this paper, we present a comprehensive framework that synthesizes iterative stencil algorithms on FPGAs efficiently. We leverage the OpenCL-to-FPGA tool chain to generate accelerator automatically and perform design space exploration at high level. We propose to bridge the neighboring tiles through pipe and enable data sharing among them to improve computation efficiency. We first propose a homogeneous design with equal tile size. Then, we extend to a heterogeneous design with different tile size to balance the computation among different tiles. Our designs exhibit a large design space in terms of tile structure. We also develop analytical performance models to explore the complex design space. Experiments using a wide range of stencil applications demonstrate that on average our homogeneous and heterogeneous implementations achieve 1.49X and 1.65X performance speedup respectively but with less hardware resource compared to the state-of-the-art.