From Merging Frameworks to Merging Stars: Experiences using HPX, Kokkos and SIMD Types

2022 IEEE/ACM 7th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2). Pub Date: 2022-09-26. DOI: 10.1109/ESPM256814.2022.00007

Gregor Daiß, Srinivas Yadav Singanaboina, Patrick Diehl, H. Kaiser, D. Pflüger

Citations: 3

Abstract

Octo-Tiger, a large-scale 3D AMR code for the merger of stars, uses a combination of HPX, Kokkos and explicit SIMD types, aiming to achieve performance-portability for a broad range of heterogeneous hardware. However, on A64FX CPUs, we encountered several missing pieces, hindering performance by causing problems with the SIMD vectorization. Therefore, we add std::experimental::simd as an option to use in Octo-Tiger’s Kokkos kernels alongside Kokkos SIMD, and further add a new SVE (Scalable Vector Extensions) SIMD backend. Additionally, we add the SIMD implementations that were missing from the Kokkos kernels in Octo-Tiger’s hydro solver. We test our changes by running Octo-Tiger on three different CPUs (an A64FX, an Intel Icelake and an AMD EPYC CPU), evaluating SIMD speedup and node-level performance. We get a good SIMD speedup on the A64FX CPU, as well as noticeable speedups on the other two CPU platforms. However, we also experience a scaling issue on the EPYC CPU.
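The abstract describes kernels written against explicit SIMD types, so that the vector backend (Kokkos SIMD, std::experimental::simd, or an SVE backend) can be exchanged without rewriting the kernel. The following is a minimal sketch of that idea under stated assumptions, not Octo-Tiger's actual code: `axpy_simd` is a hypothetical kernel templated on a SIMD type and instantiated with `std::experimental::simd` ABIs. It assumes a standard library that ships `<experimental/simd>` (for example libstdc++ from GCC 11 or newer, compiled as C++17).

```cpp
#include <cstddef>
#include <experimental/simd>
#include <vector>

namespace stdx = std::experimental;

// Hypothetical example kernel: y[i] = a * x[i] + y[i].
// Written once against a generic SIMD type; the backend is chosen by the caller.
template <typename SimdT>
void axpy_simd(double a, const std::vector<double>& x, std::vector<double>& y) {
    constexpr std::size_t W = SimdT::size();  // number of lanes in this backend
    const std::size_t n = x.size();
    const SimdT va(a);  // broadcast the scalar to all lanes

    std::size_t i = 0;
    // Vectorized main loop: process W elements per iteration.
    for (; i + W <= n; i += W) {
        SimdT vx(&x[i], stdx::element_aligned);  // load W elements of x
        SimdT vy(&y[i], stdx::element_aligned);  // load W elements of y
        vy = va * vx + vy;
        vy.copy_to(&y[i], stdx::element_aligned);  // store back into y
    }
    // Scalar remainder for the elements that do not fill a whole vector.
    for (; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Instantiating the kernel with `stdx::native_simd<double>` uses the widest vector width the compilation target supports, while `stdx::simd<double, stdx::simd_abi::scalar>` gives a one-lane reference version; comparing the two results is a cheap sanity check of the vectorized path before measuring SIMD speedup.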