Optimizing the Weather Research and Forecasting Model with OpenMP Offload and Codee
Chayanon (Namo) Wichitrnithed, Woo-Sun Yang, Yun (Helen) He, Brad Richardson, Koichi Sakaguchi, Manuel Arenaz, William I. Gustafson Jr., Jacob Shpund, Ulises Costi Blanco, Alvaro Goldar Dieste
arXiv - CS - Distributed, Parallel, and Cluster Computing, 2024-09-11. DOI: arxiv-2409.07232
Abstract
The Weather Research and Forecasting (WRF) model currently uses shared-memory
(OpenMP) and distributed-memory (MPI) parallelism. To take advantage of
GPU resources on the Perlmutter supercomputer at NERSC, we port parts of the
computationally expensive routines of the Fast Spectral Bin Microphysics (FSBM)
microphysical scheme to NVIDIA GPUs using OpenMP device offloading directives.
To facilitate this process, we explore an optimization workflow that uses
both runtime profilers and a static code inspection tool, Codee, to refactor the
subroutine. We observe a 2.08x overall speedup for the CONUS-12km thunderstorm
test case.
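
For readers unfamiliar with OpenMP device offloading, the following is a minimal sketch of the kind of directive the abstract refers to. It is not taken from WRF or FSBM, which are written in Fortran; it is shown in C, and the function name `update_cells`, its parameters, and the relaxation formula are hypothetical placeholders for a loop whose iterations are independent.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical example: relax each cell of `field` toward `target`.
 * Every iteration touches only index i, so the loop can be distributed
 * across device teams and threads without synchronization.
 */
void update_cells(double *field, const double *target, size_t n, double rate)
{
    assert(field != NULL && target != NULL);

    /* Copy `field` to the device and back; `target` is only read,
       so it is copied to the device but not back. */
    #pragma omp target teams distribute parallel for \
        map(tofrom: field[0:n]) map(to: target[0:n])
    for (size_t i = 0; i < n; ++i) {
        field[i] = field[i] + rate * (target[i] - field[i]);
    }
}
```

When built without OpenMP support the directive is ignored and the loop runs serially on the host, so the function produces the same result either way. Data movement is expressed through the `map` clauses; in a real port, choosing where to place those transfers is typically a large part of the tuning effort.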