Parallel 3D deterministic particle transport on Intel MIC architecture

Qinglin Wang, Zuocheng Xing, Jie Liu, X. Qiang, Chunye Gong, Jiang Jiang
2014 International Conference on High Performance Computing & Simulation (HPCS), pp. 186-192. Published 2014-09-22. DOI: https://doi.org/10.1109/HPCSim.2014.6903685
Citations: 4

Abstract

Single-node computation speed is essential in large-scale parallel solutions of particle transport problems. The Intel Many Integrated Core (MIC) architecture supports more than 200 hardware threads as well as 512-bit double-precision floating-point vector operations. In this paper, we use the native mode of MIC to parallelize the simulation of one-energy-group, time-independent, deterministic discrete ordinates particle transport in 3D Cartesian geometry (Sweep3D). The implementation uses both the hardware threads and the vector units of MIC to exploit the multi-level parallelism of the discrete ordinates method while keeping good data locality. Our optimized implementation is verified on the target MIC and achieves a speedup of up to 1.99 times over the original MPI code on an Intel Xeon E5-2660 CPU when flux fixup is off. Compared with the prior implementation on an NVIDIA Tesla M2050 GPU, a speedup of up to 1.23 times is obtained. In addition, the differences between the implementations on MIC and GPU are discussed.
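To illustrate where the multi-level parallelism in a discrete ordinates sweep comes from, here is a minimal Python sketch. It is not the authors' implementation. It assumes a single angle with positive direction cosines (one octant), cubic cells of width `h`, a constant source and total cross section, diamond differencing, and no flux fixup. The names `wavefronts` and `sweep_octant` are hypothetical. Cells on the same diagonal plane `i + j + k = d` depend only on cells of the previous plane, so they can be processed independently of one another.

```python
def wavefronts(nx, ny, nz):
    """Group cells (i, j, k) into diagonal planes with i + j + k = d."""
    planes = [[] for _ in range(nx + ny + nz - 2)]
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                planes[i + j + k].append((i, j, k))
    return planes


def sweep_octant(nx, ny, nz, source, sigma_t, mu, eta, xi, h, psi_boundary):
    """Diamond-difference sweep for one angle with mu, eta, xi > 0.

    Assumed for illustration: uniform source, uniform sigma_t, cubic cells
    of width h, and the same incoming flux on all three inflow boundaries.
    """
    cx, cy, cz = 2.0 * mu / h, 2.0 * eta / h, 2.0 * xi / h
    # Face fluxes entering the next cell along each axis.
    fx = [[psi_boundary] * nz for _ in range(ny)]  # indexed [j][k]
    fy = [[psi_boundary] * nz for _ in range(nx)]  # indexed [i][k]
    fz = [[psi_boundary] * ny for _ in range(nx)]  # indexed [i][j]
    psi = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]

    for plane in wavefronts(nx, ny, nz):
        # Cells within one plane touch distinct face entries, so this
        # inner loop carries no dependencies between its iterations.
        for i, j, k in plane:
            center = (source + cx * fx[j][k] + cy * fy[i][k] + cz * fz[i][j]) / (
                sigma_t + cx + cy + cz
            )
            psi[i][j][k] = center
            # Diamond-difference closure: outgoing = 2 * center - incoming.
            fx[j][k] = 2.0 * center - fx[j][k]
            fy[i][k] = 2.0 * center - fy[i][k]
            fz[i][j] = 2.0 * center - fz[i][j]
    return psi
```

In a compiled implementation, the independent cells of a plane (and the independent angles of an octant) are natural candidates for distribution over hardware threads and vector lanes. How the paper actually maps them onto MIC is not stated in the abstract.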