Performance Evaluation of Compiler Optimizations in FPGA Accelerators
Gustavo Leite, A. Baldassin, G. Araújo, J. N. Amaral
Anais do Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD), published 2019-11-08
DOI: https://doi.org/10.5753/wscad.2019.8681
Abstract
As the power wall increasingly constrains microprocessor design, engineers have shifted their attention to heterogeneous architectures, in which several classes of devices are used for computation. Among them are FPGAs, which offer performance comparable to CPUs while consuming only a fraction of the energy. Despite the growing interest in these devices, programmability and performance engineering on FPGAs remain hard. This work presents an evaluation of the most prominent code transformations targeting FPGAs. More specifically, it studies the performance effect of unrolling loops, replicating compute units, and transferring data using DMA in a matrix multiplication OpenCL kernel running on an Intel® FPGA. The results indicate that these optimizations can achieve speedups of up to 3.78× for the matrix multiplication application and up to 412.5× in data transfer.
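To make the loop-unrolling transformation concrete, the following is a minimal sketch in plain C, not the authors' OpenCL kernel. The function names and the unroll factor `UNROLL` are hypothetical and chosen only for illustration. In an OpenCL kernel compiled for an Intel FPGA, the same effect would typically be requested with a `#pragma unroll` directive on the inner loop rather than written out by hand.

```c
#include <assert.h>
#include <stddef.h>

#define UNROLL 4 /* hypothetical unroll factor, not taken from the paper */

/* Reference version: C = A * B for square n x n row-major matrices. */
void matmul_naive(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < n; k++)
                acc += a[i * n + k] * b[k * n + j];
            c[i * n + j] = acc;
        }
    }
}

/* Same computation with the inner (k) loop manually unrolled by UNROLL.
 * Each iteration now performs four multiply-accumulates, which is the
 * work an FPGA compiler could lay out as replicated hardware.
 * Assumes n is a multiple of UNROLL to keep the sketch short. */
void matmul_unrolled(const float *a, const float *b, float *c, size_t n) {
    assert(n % UNROLL == 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < n; k += UNROLL) {
                acc += a[i * n + k]     * b[k * n + j];
                acc += a[i * n + k + 1] * b[(k + 1) * n + j];
                acc += a[i * n + k + 2] * b[(k + 2) * n + j];
                acc += a[i * n + k + 3] * b[(k + 3) * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

Both functions accumulate the products in the same order, so they compute the same result. On a CPU the unrolled form mainly reduces loop overhead, whereas on an FPGA unrolling trades additional logic resources for more operations per clock cycle.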