High-Productivity Framework on GPU-Rich Supercomputers for Operational Weather Prediction Code ASUCA

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis. Published: 2014-11-16. DOI: 10.1109/SC.2014.26

T. Shimokawabe, T. Aoki, Naoyuki Onodera

Citations: 24

Abstract

Weather prediction codes demand high computational performance to achieve fast, high-resolution simulations, and skillful programming techniques are required to obtain good parallel efficiency on GPU supercomputers. ASUCA is a next-generation, high-resolution mesoscale atmospheric model being developed by the Japan Meteorological Agency. Our framework-based version of ASUCA achieves good scalability while hiding the complicated implementation and optimizations required for distributed GPUs, which increases the maintainability of the code. Our framework automatically translates user-written stencil functions that update grid points and generates both GPU and CPU code. The user-written code is parallelized with MPI, using GPU peer-to-peer direct access within a node. This code can easily use optimizations such as overlapping communication with computation to hide communication overhead. Our simulations on the GPU-rich supercomputer TSUBAME 2.5 at the Tokyo Institute of Technology demonstrated good strong and weak scalability, achieving 209.6 TFlops in single precision for our largest model using 4,108 NVIDIA K20X GPUs.
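
The abstract does not show the framework's programming interface, so the following is only a minimal sketch of the general idea, assuming a Python setting: the user writes a point-wise stencil function that reads its neighbors through an accessor, and a separate driver applies that function over the grid. The names `diffusion_2d`, `apply_stencil`, and the accessor `get(dx, dy)` are hypothetical, and the 5-point diffusion update merely stands in for ASUCA's actual dynamics.

```python
from typing import Callable, List

Grid = List[List[float]]
# Hypothetical accessor type: get(dx, dy) returns the value at offset (dx, dy)
# from the grid point currently being updated.
Accessor = Callable[[int, int], float]


def diffusion_2d(get: Accessor, kappa: float = 0.1) -> float:
    """User-written stencil (illustrative): explicit 5-point diffusion of one point."""
    c = get(0, 0)
    return c + kappa * (get(1, 0) + get(-1, 0) + get(0, 1) + get(0, -1) - 4.0 * c)


def apply_stencil(fn: Callable[[Accessor], float], src: Grid, halo: int = 1) -> Grid:
    """Hypothetical CPU 'backend': apply fn to every interior point.

    Halo cells are copied unchanged; results go to a new grid so that every
    point reads only old values from src.
    """
    ny, nx = len(src), len(src[0])
    dst = [row[:] for row in src]
    for j in range(halo, ny - halo):
        for i in range(halo, nx - halo):
            # Bind j and i as defaults so the accessor refers to this point.
            dst[j][i] = fn(lambda dx, dy, j=j, i=i: src[j + dy][i + dx])
    return dst
```

Because the user function only sees relative neighbor offsets, a driver is free to map it onto a serial CPU loop, as here, or onto one GPU thread per grid point. This separation is one way a framework of this kind could generate both CPU and GPU code from a single user-written source.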

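The overlapping technique can be illustrated with a small sketch, assuming a 1D domain split into two subdomains that each carry one ghost (halo) cell per side. A worker thread from Python's `concurrent.futures` stands in for asynchronous MPI or GPU peer-to-peer transfers, and `update`, `exchange_halos`, `step_overlapped`, and `step_global` are hypothetical names rather than the framework's API. Cells that read no ghost cell are updated while the exchange is in flight; the cells next to a ghost are updated only after it completes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

KAPPA = 0.1  # illustrative diffusion coefficient


def update(u: List[float], i: int) -> float:
    # Explicit 3-point diffusion update of cell i; reads u[i-1], u[i], u[i+1].
    return u[i] + KAPPA * (u[i - 1] - 2.0 * u[i] + u[i + 1])


def exchange_halos(left: List[float], right: List[float]) -> None:
    # Simulated halo exchange (a real code would use MPI or GPU peer-to-peer copies).
    left[-1] = right[1]  # left's right ghost <- right's first owned cell
    right[0] = left[-2]  # right's left ghost <- left's last owned cell


def step_overlapped(left: List[float], right: List[float],
                    pool: ThreadPoolExecutor) -> Tuple[List[float], List[float]]:
    # Output buffers are copied before the exchange starts; their inner ghost
    # cells are refreshed by the next exchange, so stale values there are harmless.
    new_left, new_right = left[:], right[:]
    pending = pool.submit(exchange_halos, left, right)  # start "communication"
    # Inner cells (indices 2 .. n-3) read no ghost cells: compute them now.
    for src, dst in ((left, new_left), (right, new_right)):
        for i in range(2, len(src) - 2):
            dst[i] = update(src, i)
    pending.result()  # wait until the halos have arrived
    # Boundary cells (indices 1 and n-2) read a ghost cell: compute them last.
    for src, dst in ((left, new_left), (right, new_right)):
        for i in (1, len(src) - 2):
            dst[i] = update(src, i)
    return new_left, new_right


def step_global(u: List[float]) -> List[float]:
    # Reference: undecomposed update; u[0] and u[-1] are fixed zero boundaries.
    return [u[0]] + [update(u, i) for i in range(1, len(u) - 1)] + [u[-1]]
```

To check the sketch, build `left = [0.0] + owned[:4] + [0.0]` and `right = [0.0] + owned[4:] + [0.0]` from an 8-cell list `owned`, advance both `step_overlapped` and `step_global` for a few steps, and compare the owned cells. They match because the interior/boundary split changes only when each cell is computed, not which values it reads.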
Source venue

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis
