{"title":"确定性和随机CFD问题的双级并行性","authors":"S. Dong, G. Karniadakis","doi":"10.1109/SC.2002.10005","DOIUrl":null,"url":null,"abstract":"A hybrid two-level parallelism using MPI/OpenMP is implemented in the general-purpose spectral/hp element CFD code NekTar to take advantage of the hierarchical structures arising in deterministic and stochastic CFD problems. We take a coarse grain approach to shared-memory parallelism with OpenMP and employ a workload-splitting scheme that can reduce the OpenMP synchronizations to the minimum. The hybrid implementation shows good scalability with respect to both the problem size and the number of processors in case of a fixed problem size. With the same number of processors, the hybrid model with 2 (or 4) OpenMP threads per MPI process is observed to perform better than pure MPI and pure OpenMP on the NCSA SGI Origin 2000, while the pure MPI model performs the best on the IBM SP3 at SDSC and on the Compaq Alpha cluster at PSC. A key new result is that the use of threads facilitates effectively p-refinement, which is crucial to adaptive discretization using high-order methods.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"Dual-Level Parallelism for Deterministic and Stochastic CFD Problems\",\"authors\":\"S. Dong, G. Karniadakis\",\"doi\":\"10.1109/SC.2002.10005\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A hybrid two-level parallelism using MPI/OpenMP is implemented in the general-purpose spectral/hp element CFD code NekTar to take advantage of the hierarchical structures arising in deterministic and stochastic CFD problems. We take a coarse grain approach to shared-memory parallelism with OpenMP and employ a workload-splitting scheme that can reduce the OpenMP synchronizations to the minimum. The hybrid implementation shows good scalability with respect to both the problem size and the number of processors in case of a fixed problem size. With the same number of processors, the hybrid model with 2 (or 4) OpenMP threads per MPI process is observed to perform better than pure MPI and pure OpenMP on the NCSA SGI Origin 2000, while the pure MPI model performs the best on the IBM SP3 at SDSC and on the Compaq Alpha cluster at PSC. A key new result is that the use of threads facilitates effectively p-refinement, which is crucial to adaptive discretization using high-order methods.\",\"PeriodicalId\":302800,\"journal\":{\"name\":\"ACM/IEEE SC 2002 Conference (SC'02)\",\"volume\":\"78 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2002-11-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM/IEEE SC 2002 Conference (SC'02)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SC.2002.10005\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM/IEEE SC 2002 Conference (SC'02)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SC.2002.10005","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Dual-Level Parallelism for Deterministic and Stochastic CFD Problems
A hybrid two-level parallelism using MPI/OpenMP is implemented in the general-purpose spectral/hp element CFD code NekTar to take advantage of the hierarchical structures arising in deterministic and stochastic CFD problems. We take a coarse grain approach to shared-memory parallelism with OpenMP and employ a workload-splitting scheme that can reduce the OpenMP synchronizations to the minimum. The hybrid implementation shows good scalability with respect to both the problem size and the number of processors in case of a fixed problem size. With the same number of processors, the hybrid model with 2 (or 4) OpenMP threads per MPI process is observed to perform better than pure MPI and pure OpenMP on the NCSA SGI Origin 2000, while the pure MPI model performs the best on the IBM SP3 at SDSC and on the Compaq Alpha cluster at PSC. A key new result is that the use of threads facilitates effectively p-refinement, which is crucial to adaptive discretization using high-order methods.