{"title":"用矢量特性和OpenMP实现基于winograd的可移植卷积","authors":"M. F. Dolz, Adrián Castelló, E. S. Quintana‐Ortí","doi":"10.1109/pdp55904.2022.00015","DOIUrl":null,"url":null,"abstract":"We take a step forward in the direction of developing high performance codes for the convolution, based on the Winograd transformation, that are easy to customize for different processor architectures. In our approach, augmenting the portability of the solution is achieved via the introduction of vector intrinsics to exploit the SIMD (single-instruction multiple-data) capabilities of current processors as well as OpenMP pragmas to exploit multi-thread parallelism. While this comes at the cost of sacrificing a fraction of the computational performance, our experimental results on two distinct processors, with Intel Xeon Skylake and ARM Cortex A57 architectures, show that the impact is affordable, and still renders a Winograd-based solution that is competitive with the general method for the convolution based on the so-called im2col transform followed by a matrix-matrix multiplication.","PeriodicalId":210759,"journal":{"name":"2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Towards Portable Realizations of Winograd-based Convolution with Vector Intrinsics and OpenMP\",\"authors\":\"M. F. Dolz, Adrián Castelló, E. S. Quintana‐Ortí\",\"doi\":\"10.1109/pdp55904.2022.00015\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We take a step forward in the direction of developing high performance codes for the convolution, based on the Winograd transformation, that are easy to customize for different processor architectures. In our approach, augmenting the portability of the solution is achieved via the introduction of vector intrinsics to exploit the SIMD (single-instruction multiple-data) capabilities of current processors as well as OpenMP pragmas to exploit multi-thread parallelism. While this comes at the cost of sacrificing a fraction of the computational performance, our experimental results on two distinct processors, with Intel Xeon Skylake and ARM Cortex A57 architectures, show that the impact is affordable, and still renders a Winograd-based solution that is competitive with the general method for the convolution based on the so-called im2col transform followed by a matrix-matrix multiplication.\",\"PeriodicalId\":210759,\"journal\":{\"name\":\"2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)\",\"volume\":\"33 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/pdp55904.2022.00015\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/pdp55904.2022.00015","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Towards Portable Realizations of Winograd-based Convolution with Vector Intrinsics and OpenMP
We take a step forward in the direction of developing high performance codes for the convolution, based on the Winograd transformation, that are easy to customize for different processor architectures. In our approach, augmenting the portability of the solution is achieved via the introduction of vector intrinsics to exploit the SIMD (single-instruction multiple-data) capabilities of current processors as well as OpenMP pragmas to exploit multi-thread parallelism. While this comes at the cost of sacrificing a fraction of the computational performance, our experimental results on two distinct processors, with Intel Xeon Skylake and ARM Cortex A57 architectures, show that the impact is affordable, and still renders a Winograd-based solution that is competitive with the general method for the convolution based on the so-called im2col transform followed by a matrix-matrix multiplication.