Naruya Kitai, D. Takahashi, F. Franchetti, T. Katagiri, S. Ohshima, Toru Nagai
2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), published 2021-06-01. DOI: 10.1109/IPDPSW52791.2021.00117 (https://doi.org/10.1109/IPDPSW52791.2021.00117)
An Auto-tuning with Adaptation of A64 Scalable Vector Extension for SPIRAL
In this paper, we propose an auto-tuning (AT) system that adapts SPIRAL to the A64 Scalable Vector Extension (SVE) in order to generate discrete Fourier transform (DFT) implementations. We evaluate the performance of our method on the Supercomputer "Flow" at Nagoya University. The DFT codes that use SVE are up to 1.98 times faster than the scalar DFT codes, and their SIMD instruction rate is up to 3.63 times higher. In addition, applying the proposed AT system to loop unrolling yields a maximum speedup of 2.32 times.
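The abstract does not describe how the AT system searches over loop-unrolling choices, so the following is only a minimal sketch of the general idea of empirical auto-tuning: build each candidate variant, measure it, and keep the cheapest. It is not SPIRAL's search and not the authors' implementation. The names `build_unrolled_sum`, `time_kernel`, and `auto_tune` are hypothetical, the kernel is a plain summation rather than a DFT, and Python stands in for generated C code, so SVE itself is not modeled.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

Kernel = Callable[[List[float]], float]


def build_unrolled_sum(unroll: int) -> Kernel:
    """Hypothetical kernel generator: a summation whose main loop
    processes `unroll` elements per iteration."""
    def kernel(data: List[float]) -> float:
        total = 0.0
        n = len(data)
        main_end = n - (n % unroll)
        # Main loop: one iteration covers `unroll` consecutive elements.
        for i in range(0, main_end, unroll):
            # The inner loop stands in for textually unrolled statements.
            for j in range(unroll):
                total += data[i + j]
        # Remainder loop for the elements that do not fill a full block.
        for i in range(main_end, n):
            total += data[i]
        return total
    return kernel


def time_kernel(kernel: Kernel, data: List[float], repeats: int = 5) -> float:
    """Return the best wall-clock time (seconds) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(data)
        best = min(best, time.perf_counter() - start)
    return best


def auto_tune(
    candidates: Iterable[int],
    build: Callable[[int], Kernel],
    measure: Callable[[Kernel], float],
) -> Tuple[int, Dict[int, float]]:
    """Measure every candidate and return the cheapest one plus all costs."""
    costs = {c: measure(build(c)) for c in candidates}
    best = min(costs, key=costs.get)
    return best, costs


if __name__ == "__main__":
    data = [float(i) for i in range(100_000)]
    best, costs = auto_tune(
        [1, 2, 4, 8],
        build_unrolled_sum,
        lambda k: time_kernel(k, data),
    )
    print("best unroll factor:", best)
```

Passing the measurement as a function keeps the search separate from the timing method, so the same loop could in principle be driven by a hardware counter such as a SIMD instruction rate instead of wall-clock time.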