H. Nakashima, Yoshiki Summura, Keisuke Kikura, Y. Miyake
{"title":"基于高效粒子分割的大规模多核感知PIC仿真","authors":"H. Nakashima, Yoshiki Summura, Keisuke Kikura, Y. Miyake","doi":"10.1109/IPDPS.2017.65","DOIUrl":null,"url":null,"abstract":"We are now developing a manycore-aware implementation of multiprocessed PIC (particle-in-cell) simulation code with automatic load balancing. A key issue of the implementation is how to exploit the wide SIMD mechanism of manycore processors such as Intel Xeon Phi. Our solution is \"particle binning\" to rank all particles in a cell (voxel) in a chunk of SOA (structure-of-arrays) type one-dimensional arrays so that \"particle-push\" and \"current-scatter\" operations on them are efficiently SIMD-vectorized by our compiler. In addition, our sophisticated binning mechanism performs sorting of particles according to their positions \"on-the-fly\", efficiently coping with occasional \"bin overflow\" in a fully multithreaded manner. Our performance evaluation with up to 64 nodes of Cray XC30 and XC40 supercomputers, equipped with Xeon Phi 5120D (Knights Corner) and 7250 (Knights Landing) respectively, not only exhibited good parallel performance, but also proved the effectiveness of our binning mechanism.","PeriodicalId":209524,"journal":{"name":"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2017-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Large Scale Manycore-Aware PIC Simulation with Efficient Particle Binning\",\"authors\":\"H. Nakashima, Yoshiki Summura, Keisuke Kikura, Y. Miyake\",\"doi\":\"10.1109/IPDPS.2017.65\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We are now developing a manycore-aware implementation of multiprocessed PIC (particle-in-cell) simulation code with automatic load balancing. A key issue of the implementation is how to exploit the wide SIMD mechanism of manycore processors such as Intel Xeon Phi. Our solution is \\\"particle binning\\\" to rank all particles in a cell (voxel) in a chunk of SOA (structure-of-arrays) type one-dimensional arrays so that \\\"particle-push\\\" and \\\"current-scatter\\\" operations on them are efficiently SIMD-vectorized by our compiler. In addition, our sophisticated binning mechanism performs sorting of particles according to their positions \\\"on-the-fly\\\", efficiently coping with occasional \\\"bin overflow\\\" in a fully multithreaded manner. Our performance evaluation with up to 64 nodes of Cray XC30 and XC40 supercomputers, equipped with Xeon Phi 5120D (Knights Corner) and 7250 (Knights Landing) respectively, not only exhibited good parallel performance, but also proved the effectiveness of our binning mechanism.\",\"PeriodicalId\":209524,\"journal\":{\"name\":\"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPS.2017.65\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS.2017.65","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Large Scale Manycore-Aware PIC Simulation with Efficient Particle Binning
We are now developing a manycore-aware implementation of multiprocessed PIC (particle-in-cell) simulation code with automatic load balancing. A key issue of the implementation is how to exploit the wide SIMD mechanism of manycore processors such as Intel Xeon Phi. Our solution is "particle binning" to rank all particles in a cell (voxel) in a chunk of SOA (structure-of-arrays) type one-dimensional arrays so that "particle-push" and "current-scatter" operations on them are efficiently SIMD-vectorized by our compiler. In addition, our sophisticated binning mechanism performs sorting of particles according to their positions "on-the-fly", efficiently coping with occasional "bin overflow" in a fully multithreaded manner. Our performance evaluation with up to 64 nodes of Cray XC30 and XC40 supercomputers, equipped with Xeon Phi 5120D (Knights Corner) and 7250 (Knights Landing) respectively, not only exhibited good parallel performance, but also proved the effectiveness of our binning mechanism.