Speculative Parallel Execution for Local Timestepping

Proceedings of the 2021 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation. Pub Date: 2021-05-21. DOI: 10.1145/3437959.3459257

Maximilian H. Bremer, J. Bachan, Cy P. Chan, C. Dawson



Abstract

Currently, synchronous timestepping for fluid and plasma simulations requires selection of a global time step that conservatively satisfies stability conditions everywhere. However, this approach causes substantial unnecessary work in the presence of large variations of element sizes or local wavespeeds. Local timestepping can significantly reduce work by allowing subdomains to take steps according to local rather than global stability constraints. However, parallelizing this algorithm presents considerable difficulty. Since the stability condition depends on the state of the submesh and its neighbors, dependencies become irregular and may dynamically change as neighbors take smaller or larger timesteps. Furthermore, coarsening and refining timesteps introduces dynamic load imbalance. In order to correctly resolve these dependencies in a distributed setting, we parallelize the local timestepping algorithm using an optimistic (Timewarp-based) parallel discrete event simulation. We introduce waiting heuristics to eliminate misspeculation when dependencies can be identified early, and present a semi-static load balancing strategy to improve scalability. We present detailed performance characterizations of event overheads, misspeculation, and scalability of our approach. Our numerical experiments demonstrate up to a 2.8x speedup versus a baseline unoptimized approach; a 4x improvement in per-node throughput compared to an MPI parallelization of synchronous timestepping; and scalability up to 3,072 cores on NERSC Cori's Haswell partition.
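The work-reduction argument can be illustrated with a small counting exercise. The sketch below is not the paper's algorithm: it assumes a CFL-type stability bound of the form dt <= cfl * h / c for each element, ignores neighbor coupling and timestep changes during the run, and uses hypothetical names (`cfl_timesteps`, `update_counts`) and a made-up ten-element mesh. It only compares how many element updates are needed to reach a final time when every element uses the global minimum step versus its own local step.

```python
import math


def cfl_timesteps(sizes, wavespeeds, cfl=0.5):
    """Largest stable step per element under the assumed bound dt <= cfl * h / c."""
    return [cfl * h / c for h, c in zip(sizes, wavespeeds)]


def update_counts(sizes, wavespeeds, t_end, cfl=0.5):
    """Count element updates needed to reach t_end.

    Global (synchronous) stepping: every element advances with the smallest dt.
    Local stepping (idealized): each element advances with its own dt,
    with neighbor constraints deliberately ignored in this sketch.
    """
    dts = cfl_timesteps(sizes, wavespeeds, cfl)
    dt_global = min(dts)
    global_updates = len(dts) * math.ceil(t_end / dt_global)
    local_updates = sum(math.ceil(t_end / dt) for dt in dts)
    return global_updates, local_updates


# Hypothetical mesh: nine coarse elements and one element eight times smaller.
sizes = [1.0] * 9 + [0.125]
wavespeeds = [1.0] * 10

global_updates, local_updates = update_counts(sizes, wavespeeds, t_end=8.0)
print(global_updates, local_updates)  # prints "1280 272"
```

In this toy setup a single small element forces all ten elements to take 128 steps under global stepping, whereas idealized local stepping needs 16 steps for each coarse element and 128 only for the small one. A real local timestepping scheme would also have to respect the neighbor dependencies described in the abstract, which is where the parallelization difficulty arises.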
