Optimizing a particle-in-cell code on Intel Knights Landing
Minhua Wen, Min Chen, James Lin
Proceedings of Workshops of HPC Asia, 2018-01-31
DOI: 10.1145/3176364.3176376
Citations: 1
Abstract
The particle-in-cell (PIC) method is one of the mainstream algorithms in laser plasma research. However, achieving high performance with PIC codes on the Intel Knights Landing (KNL) processor poses programming challenges that concern laser plasma researchers worldwide. We address this concern using VLPL-S, a PIC code developed at Shanghai Jiao Tong University, as a case study. We applied three types of optimization: compute-oriented optimizations, parallel I/O, and dynamic load balancing. We evaluated the optimized VLPL-S code with real test cases on the KNL. The experimental results show that our optimizations achieve a 1.53X overall speedup, and that the code runs 1.77X faster on the KNL than on a two-socket Intel Xeon E5-2697 v4 node. The optimizations we developed for VLPL-S can also be applied to other PIC codes.
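The abstract names dynamic load balancing but does not describe how VLPL-S implements it. Purely as an illustration, and not the authors' method, one common approach in domain-decomposed PIC codes is to treat the particle count per cell as a cost estimate and periodically move slab boundaries so that each rank holds roughly the same number of particles. The sketch below assumes a 1D slab decomposition; the function names `partition_by_load` and `slab_loads` are hypothetical.

```python
from itertools import accumulate
from typing import List


def partition_by_load(cell_loads: List[int], n_ranks: int) -> List[int]:
    """Split a 1D row of cells into n_ranks contiguous slabs of similar load.

    Returns boundaries b with len(b) == n_ranks + 1; rank r owns cells
    [b[r], b[r + 1]). Each interior boundary is placed just after the first
    cell at which the cumulative load reaches r * total / n_ranks.
    """
    if n_ranks < 1:
        raise ValueError("n_ranks must be at least 1")
    n_cells = len(cell_loads)
    total = sum(cell_loads)
    prefix = list(accumulate(cell_loads))  # prefix[i] = load of cells 0..i
    bounds = [0]
    cell = 0
    for r in range(1, n_ranks):
        target = total * r / n_ranks
        # Walk forward until the running load reaches this rank's target.
        while cell < n_cells and prefix[cell] < target:
            cell += 1
        bounds.append(min(cell + 1, n_cells))
    bounds.append(n_cells)
    return bounds


def slab_loads(cell_loads: List[int], bounds: List[int]) -> List[int]:
    """Total load (e.g. particle count) owned by each rank."""
    return [sum(cell_loads[bounds[r]:bounds[r + 1]]) for r in range(len(bounds) - 1)]


if __name__ == "__main__":
    # Hypothetical particle counts, concentrated in the right half of the domain.
    counts = [1] * 8 + [8] * 8
    even = [0, 4, 8, 12, 16]  # equal-width slabs for 4 ranks
    print(slab_loads(counts, even))                          # [4, 4, 32, 32]
    print(slab_loads(counts, partition_by_load(counts, 4)))  # [24, 16, 16, 16]
```

Because this greedy rule only places boundaries at cell edges, the slabs are not perfectly equal, but the busiest rank drops from 32 to 24 units of work in this example. A production code would also weigh the cost of migrating particles between ranks against the imbalance before deciding to repartition.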