Benchmarking a Proof-of-Concept Performance Portable SYCL-based Fast Fourier Transformation Library
V. Pascuzzi, M. Goli
International Workshop on OpenCL, 2022-03-17
DOI: 10.1145/3529538.3529996 (https://doi.org/10.1145/3529538.3529996)
Citation count: 1
Abstract
In this paper, we present an early version of a SYCL-based FFT library capable of running on all major vendor hardware, including CPUs and GPUs from AMD, ARM, Intel and NVIDIA. The library currently has two limitations: it supports only single-dimension FFTs of length up to 2^11, and only base-2 input sequences. Although preliminary, this work aims to seed further development of a rich feature set for calculating FFTs. The library has an advantage over existing portable FFT libraries in that it is single-source, which removes the complexity that arises from heavy use of pre-processor macros and auto-generated kernels to target different architectures. We exercise two SYCL-enabled compilers, Codeplay ComputeCpp and Intel's open-source LLVM project, to evaluate the performance portability of our SYCL-based FFT on various heterogeneous architectures. We provide studies comparing our portable library with highly optimized vendor-specific FFT libraries, and discuss potential sources of performance loss.
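To make the stated scope concrete (single-dimension, base-2 lengths up to 2^11), the following is a minimal host-side sketch of an in-place radix-2 Cooley-Tukey FFT in plain C++. It is not the authors' SYCL implementation and does not use any SYCL API; the names `fft_radix2`, `is_power_of_two`, and `kMaxLength` are hypothetical and chosen only for illustration. The choice of the iterative decimation-in-time variant is likewise an assumption, since the abstract does not say which algorithm the library uses.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical limit mirroring the maximum length of 2^11 stated in the abstract.
constexpr std::size_t kMaxLength = std::size_t{1} << 11;

// True when n is a power of two (n = 2^k with k >= 0).
bool is_power_of_two(std::size_t n) { return n != 0 && (n & (n - 1)) == 0; }

// In-place iterative radix-2 decimation-in-time forward FFT (illustrative only).
void fft_radix2(std::vector<cplx>& x) {
    const std::size_t n = x.size();
    if (!is_power_of_two(n) || n > kMaxLength) {
        throw std::invalid_argument("length must be a power of two, at most 2^11");
    }

    // Bit-reversal permutation: reorder inputs so butterflies can run in place.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(x[i], x[j]);
    }

    const double pi = std::acos(-1.0);
    // Each stage combines pairs of length-(len/2) transforms into length-len ones.
    for (std::size_t len = 2; len <= n; len <<= 1) {
        assert(n % len == 0);  // both are powers of two, so len divides n
        const double angle = -2.0 * pi / static_cast<double>(len);
        const cplx w_len(std::cos(angle), std::sin(angle));
        for (std::size_t start = 0; start < n; start += len) {
            cplx w(1.0, 0.0);  // running twiddle factor for this block
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cplx even = x[start + k];
                const cplx odd = x[start + k + len / 2] * w;
                x[start + k] = even + odd;            // butterfly, upper output
                x[start + k + len / 2] = even - odd;  // butterfly, lower output
                w *= w_len;
            }
        }
    }
}
```

Calling `fft_radix2` on a vector whose length is not a power of two, or exceeds 2^11, throws `std::invalid_argument`. In a SYCL version, the butterfly loops of each stage would be the natural candidates for a device kernel, though how the library actually maps them to hardware is not described here.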