P. Klages, K. Bandura, N. Denman, A. Recnik, J. Sievers, K. Vanderlinde
2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP), pp. 164-165. Published 2015-03-20. DOI: 10.1109/ASAP.2015.7245729 (https://doi.org/10.1109/ASAP.2015.7245729)
GPU kernels for high-speed 4-bit astrophysical data processing
Interferometric radio telescopes often rely on computationally expensive O(N^2) correlation calculations; fortunately, these computations map well to massively parallel accelerators such as low-cost GPUs. This paper describes the OpenCL kernels developed for the GPU-based X-engine of a new hybrid FX correlator. Channelized data from the F-engine is supplied to the GPUs as 4-bit, offset-encoded real and imaginary integers. Because of the low bit depth of the data, two values may be packed into a 32-bit register, allowing more than one value to be multiplied and accumulated with a single fused multiply-add instruction. With these kernels, as many as 5.6 effective tera-operations per second (TOPS) can be executed on a 4.3 TOPS GPU. By design, these kernels allow correlations to scale to large numbers of input elements, limited only by the maximum buffer sizes on the GPU. This code is currently working on-sky with the CHIME Pathfinder Correlator in BC, Canada.
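The packing idea can be illustrated with a minimal sketch in Python, emulating 32-bit integer arithmetic by masking. The abstract does not give the lane layout, the offset value, or the kernel structure, so everything here is an assumption made for illustration: an offset of 8 (stored code = value + 8), two 16-bit lanes per 32-bit word, and the hypothetical helper names `pack`, `mad32`, `remove_offset`, `correlate_packed` and `correlate_direct`. It is not the authors' OpenCL code.

```python
OFFSET = 8                  # assumed encoding: stored code u = x + 8, x in [-8, 7]
LANE_BITS = 16              # assumed layout: two 16-bit lanes per 32-bit word
LANE_MASK = (1 << LANE_BITS) - 1
WORD_MASK = 0xFFFFFFFF
# Each multiply-add adds at most 15 * 15 = 225 to a lane, so a 16-bit lane
# can absorb this many terms before it would carry into its neighbour.
MAX_TERMS = LANE_MASK // (15 * 15)


def pack(re_code, im_code):
    """Put two unsigned 4-bit codes into the high and low lanes of one word."""
    return (re_code << LANE_BITS) | im_code


def mad32(x, y, acc):
    """Emulate one 32-bit integer multiply-add; it updates both lanes at once."""
    return (x * y + acc) & WORD_MASK


def remove_offset(sum_uv, sum_u, sum_v, n):
    """Recover sum(x*y) from sums over offset codes u = x + 8, v = y + 8."""
    return sum_uv - OFFSET * (sum_u + sum_v) + OFFSET * OFFSET * n


def correlate_packed(a, b):
    """Return (re, im) of sum(a * conj(b)) for lists of signed (re, im) pairs."""
    n = len(a)
    assert n == len(b) and n <= MAX_TERMS
    acc_br = 0  # high lane: sum(ar*br), low lane: sum(ai*br), in offset codes
    acc_bi = 0  # high lane: sum(ar*bi), low lane: sum(ai*bi), in offset codes
    s_ar = s_ai = s_br = s_bi = 0
    for (ar, ai), (br, bi) in zip(a, b):
        uar, uai = ar + OFFSET, ai + OFFSET
        ubr, ubi = br + OFFSET, bi + OFFSET
        pa = pack(uar, uai)
        acc_br = mad32(pa, ubr, acc_br)  # two products per multiply-add
        acc_bi = mad32(pa, ubi, acc_bi)
        s_ar += uar
        s_ai += uai
        s_br += ubr
        s_bi += ubi
    rr = remove_offset(acc_br >> LANE_BITS, s_ar, s_br, n)
    ir = remove_offset(acc_br & LANE_MASK, s_ai, s_br, n)
    ri = remove_offset(acc_bi >> LANE_BITS, s_ar, s_bi, n)
    ii = remove_offset(acc_bi & LANE_MASK, s_ai, s_bi, n)
    # a * conj(b) = (ar*br + ai*bi) + i*(ai*br - ar*bi)
    return rr + ii, ir - ri


def correlate_direct(a, b):
    """Reference result computed with ordinary signed integer arithmetic."""
    re = sum(ar * br + ai * bi for (ar, ai), (br, bi) in zip(a, b))
    im = sum(ai * br - ar * bi for (ar, ai), (br, bi) in zip(a, b))
    return re, im
```

In this sketch a complex multiply-accumulate needs two multiply-adds rather than four, which is one way a kernel could exceed the GPU's nominal operation rate when counted in effective operations. The accumulators must be unpacked before a lane overflows, hence the `MAX_TERMS` bound.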