{"title":"矢量化复算法的混合数据布局核","authors":"Doru-Thom Popovici, F. Franchetti, Tze Meng Low","doi":"10.1109/HPEC.2017.8091024","DOIUrl":null,"url":null,"abstract":"Implementing complex arithmetic routines with Single Instruction Multiple Data (SIMD) instructions requires the use of instructions that are usually not found in their real arithmetic counter-parts. These instructions, such as shuffles and addsub, are often bottlenecks for many complex arithmetic kernels as modern architectures usually can perform more real arithmetic operations than execute instructions for complex arithmetic. In this work, we focus on using a variety of data layouts (mixed format) for storing complex numbers at different stages of the computation so as to limit the use of these instructions. Using complex matrix multiplication and Fast Fourier Transforms (FFTs) as our examples, we demonstrate that performance improvements of up to 2× can be attained with mixed format within the computational routines. We also described how existing algorithms can be easily modified to implement the mixed format complex layout.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Mixed data layout kernels for vectorized complex arithmetic\",\"authors\":\"Doru-Thom Popovici, F. Franchetti, Tze Meng Low\",\"doi\":\"10.1109/HPEC.2017.8091024\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Implementing complex arithmetic routines with Single Instruction Multiple Data (SIMD) instructions requires the use of instructions that are usually not found in their real arithmetic counter-parts. These instructions, such as shuffles and addsub, are often bottlenecks for many complex arithmetic kernels as modern architectures usually can perform more real arithmetic operations than execute instructions for complex arithmetic. In this work, we focus on using a variety of data layouts (mixed format) for storing complex numbers at different stages of the computation so as to limit the use of these instructions. Using complex matrix multiplication and Fast Fourier Transforms (FFTs) as our examples, we demonstrate that performance improvements of up to 2× can be attained with mixed format within the computational routines. We also described how existing algorithms can be easily modified to implement the mixed format complex layout.\",\"PeriodicalId\":364903,\"journal\":{\"name\":\"2017 IEEE High Performance Extreme Computing Conference (HPEC)\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE High Performance Extreme Computing Conference (HPEC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPEC.2017.8091024\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC.2017.8091024","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Mixed data layout kernels for vectorized complex arithmetic
Implementing complex arithmetic routines with Single Instruction Multiple Data (SIMD) instructions requires the use of instructions that are usually not found in their real arithmetic counter-parts. These instructions, such as shuffles and addsub, are often bottlenecks for many complex arithmetic kernels as modern architectures usually can perform more real arithmetic operations than execute instructions for complex arithmetic. In this work, we focus on using a variety of data layouts (mixed format) for storing complex numbers at different stages of the computation so as to limit the use of these instructions. Using complex matrix multiplication and Fast Fourier Transforms (FFTs) as our examples, we demonstrate that performance improvements of up to 2× can be attained with mixed format within the computational routines. We also described how existing algorithms can be easily modified to implement the mixed format complex layout.