Efficient N-to-M Checkpointing Algorithm for Finite Element Simulations
David A. Ham, Vaclav Hapla, Matthew G. Knepley, Lawrence Mitchell, Koki Sagiyama
arXiv - CS - Mathematical Software, published 2024-01-11. DOI: arxiv-2401.05868
Abstract
In this work, we introduce a new algorithm for N-to-M checkpointing in finite
element simulations. This new algorithm allows efficient saving/loading of
functions representing physical quantities associated with the mesh
representing the physical domain. Specifically, the algorithm permits different
numbers of parallel processes for saving and loading, so that restarts and
post-processing can run on a process count suited to the given phase of the
simulation and other conditions. For demonstration, we implemented
this algorithm in PETSc, the Portable, Extensible Toolkit for Scientific
Computation, and added a convenient high-level interface into Firedrake, a
system for solving partial differential equations using finite element methods.
We evaluated our new implementation by saving and loading data involving 8.2
billion finite element degrees of freedom using 8,192 parallel processes on
ARCHER2, the UK National Supercomputing Service.
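
To make the N-to-M idea concrete, the following is a minimal toy sketch in plain Python, not the algorithm of the paper and not the PETSc or Firedrake interface. It assumes that every degree of freedom carries a global index and that the checkpoint can be modelled as a dictionary keyed by that index; the helper names `partition`, `save`, and `load` are hypothetical. Values are written by N = 4 simulated processes and read back by M = 3.

```python
def partition(num_dofs, num_procs):
    """Split global DoF indices 0..num_dofs-1 into contiguous blocks, one per process."""
    base, extra = divmod(num_dofs, num_procs)
    blocks, start = [], 0
    for rank in range(num_procs):
        size = base + (1 if rank < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


def save(local_values, blocks):
    """Each saving process writes its local values under their global indices."""
    store = {}  # stands in for the checkpoint file (assumption)
    for rank, block in enumerate(blocks):
        for global_index, value in zip(block, local_values[rank]):
            store[global_index] = value
    return store


def load(store, blocks):
    """Each loading process reads only the global indices it owns."""
    return [[store[g] for g in block] for block in blocks]


num_dofs = 10
field = [0.5 * g for g in range(num_dofs)]  # reference function values

# Save with N = 4 processes.
save_blocks = partition(num_dofs, 4)
saved_local = [[field[g] for g in block] for block in save_blocks]
checkpoint = save(saved_local, save_blocks)

# Load with M = 3 processes, using a different partition of the same indices.
load_blocks = partition(num_dofs, 3)
loaded_local = load(checkpoint, load_blocks)
```

Because the data are addressed by global index rather than by the rank that wrote them, the loading partition is free to differ from the saving one. A real implementation must also handle the mesh topology and the parallel I/O itself, which this sketch leaves out.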