{"title":"基于Intel MIC架构的并行三维确定性粒子输运","authors":"Qinglin Wang, Zuocheng Xing, Jie Liu, X. Qiang, Chunye Gong, Jiang Jiang","doi":"10.1109/HPCSim.2014.6903685","DOIUrl":null,"url":null,"abstract":"Single-node computation speed is essential in large-scale parallel solutions of particle transport problems. The Intel Many Integrated Core (MIC) architecture supports more than 200 hardware threads as well as 512-bit double precision float-point vector operations. In this paper, we use the native model of MIC in the parallelization of the simulation of one energy group time-independent deterministic discrete ordinates particle transport in 3D Cartesian geometry (Sweep3D). The implementation adopts both hardware threads and vector units in MIC to efficiently exploit multi-level parallelism in the discrete ordinates method when keeping good data locality. Our optimized implementation is verified on target MIC and can provide up to 1.99 times speedup based on the original MPI code on Intel Xeon E5-2660 CPU when flux fixup is off. Compared with the prior on NVIDIA Tesla M2050 GPU, the speedup of up to 1.23 times is obtained. In addition, the difference between the implementations on MIC and GPU is discussed as well.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"46 1","pages":"186-192"},"PeriodicalIF":0.0000,"publicationDate":"2014-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Parallel 3D deterministic particle transport on Intel MIC architecture\",\"authors\":\"Qinglin Wang, Zuocheng Xing, Jie Liu, X. Qiang, Chunye Gong, Jiang Jiang\",\"doi\":\"10.1109/HPCSim.2014.6903685\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Single-node computation speed is essential in large-scale parallel solutions of particle transport problems. The Intel Many Integrated Core (MIC) architecture supports more than 200 hardware threads as well as 512-bit double precision float-point vector operations. In this paper, we use the native model of MIC in the parallelization of the simulation of one energy group time-independent deterministic discrete ordinates particle transport in 3D Cartesian geometry (Sweep3D). The implementation adopts both hardware threads and vector units in MIC to efficiently exploit multi-level parallelism in the discrete ordinates method when keeping good data locality. Our optimized implementation is verified on target MIC and can provide up to 1.99 times speedup based on the original MPI code on Intel Xeon E5-2660 CPU when flux fixup is off. Compared with the prior on NVIDIA Tesla M2050 GPU, the speedup of up to 1.23 times is obtained. In addition, the difference between the implementations on MIC and GPU is discussed as well.\",\"PeriodicalId\":6469,\"journal\":{\"name\":\"2014 International Conference on High Performance Computing & Simulation (HPCS)\",\"volume\":\"46 1\",\"pages\":\"186-192\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-09-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 International Conference on High Performance Computing & Simulation (HPCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCSim.2014.6903685\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 International Conference on High Performance Computing & Simulation (HPCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCSim.2014.6903685","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
摘要
在粒子输运问题的大规模并行解中,单节点计算速度至关重要。Intel多集成核心(MIC)架构支持200多个硬件线程以及512位双精度浮点向量操作。在本文中,我们使用MIC的原生模型来并行化模拟三维笛卡尔几何中一能量群时间无关的确定性离散坐标粒子输运(Sweep3D)。该实现在保证数据局部性的前提下,同时采用了硬件线程和MIC中的矢量单元,有效地利用了离散坐标法中的多级并行性。我们的优化实现在目标MIC上进行了验证,当通量修复关闭时,基于Intel Xeon E5-2660 CPU上的原始MPI代码,可以提供高达1.99倍的加速。与之前的NVIDIA Tesla M2050 GPU相比,获得了高达1.23倍的加速。此外,还讨论了在MIC和GPU上实现的不同之处。
Parallel 3D deterministic particle transport on Intel MIC architecture
Single-node computation speed is essential in large-scale parallel solutions of particle transport problems. The Intel Many Integrated Core (MIC) architecture supports more than 200 hardware threads as well as 512-bit double precision float-point vector operations. In this paper, we use the native model of MIC in the parallelization of the simulation of one energy group time-independent deterministic discrete ordinates particle transport in 3D Cartesian geometry (Sweep3D). The implementation adopts both hardware threads and vector units in MIC to efficiently exploit multi-level parallelism in the discrete ordinates method when keeping good data locality. Our optimized implementation is verified on target MIC and can provide up to 1.99 times speedup based on the original MPI code on Intel Xeon E5-2660 CPU when flux fixup is off. Compared with the prior on NVIDIA Tesla M2050 GPU, the speedup of up to 1.23 times is obtained. In addition, the difference between the implementations on MIC and GPU is discussed as well.