Speculative Parallel Execution for Local Timestepping

Maximilian H. Bremer, J. Bachan, Cy P. Chan, C. Dawson
Proceedings of the 2021 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, published 2021-05-21.
DOI: 10.1145/3437959.3459257 (https://doi.org/10.1145/3437959.3459257)

Abstract

Currently, synchronous timestepping for fluid and plasma simulations requires selecting a single global timestep that conservatively satisfies stability conditions everywhere. However, this approach causes substantial unnecessary work in the presence of large variations in element size or local wavespeed. Local timestepping can significantly reduce work by allowing subdomains to take steps according to local rather than global stability constraints. Parallelizing this algorithm, however, presents considerable difficulty. Since the stability condition depends on the state of the submesh and its neighbors, dependencies become irregular and may change dynamically as neighbors take smaller or larger timesteps. Furthermore, coarsening and refining timesteps introduces dynamic load imbalance. To correctly resolve these dependencies in a distributed setting, we parallelize the local timestepping algorithm using an optimistic (Timewarp-based) parallel discrete event simulation. We introduce waiting heuristics to eliminate misspeculation when dependencies can be identified early, and present a semi-static load balancing strategy to improve scalability. We present detailed performance characterizations of event overheads, misspeculation, and scalability of our approach. Our numerical experiments demonstrate up to a 2.8x speedup versus a baseline unoptimized approach; a 4x improvement in per-node throughput compared to an MPI parallelization of synchronous timestepping; and scalability up to 3,072 cores on NERSC Cori's Haswell partition.
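To make the work argument above concrete, the following Python sketch counts element updates under global and local timestepping. It is a minimal illustration under stated assumptions, not the authors' numerical scheme: it assumes a CFL-type bound dt_i <= C * h_i / |lambda_i| for each element (mesh size h_i, wavespeed lambda_i), a hypothetical Courant number C = 0.5, and a hypothetical mesh of 990 coarse and 10 refined elements. The helper names `stable_dt`, `global_work`, and `local_work` are illustrative only.

```python
import math
from typing import Sequence


def stable_dt(h: float, wavespeed: float, cfl: float = 0.5) -> float:
    """Largest stable step for one element, assuming a CFL-type bound dt <= C * h / |lambda|."""
    return cfl * h / abs(wavespeed)


def global_work(hs: Sequence[float], speeds: Sequence[float],
                t_final: float, cfl: float = 0.5) -> int:
    """Element updates when all elements advance with the single most restrictive step."""
    dt_global = min(stable_dt(h, s, cfl) for h, s in zip(hs, speeds))
    return len(hs) * math.ceil(t_final / dt_global)


def local_work(hs: Sequence[float], speeds: Sequence[float],
               t_final: float, cfl: float = 0.5) -> int:
    """Element updates when each element advances with its own stable step.

    Idealized: ignores the coupling between neighboring elements.
    """
    return sum(math.ceil(t_final / stable_dt(h, s, cfl))
               for h, s in zip(hs, speeds))


if __name__ == "__main__":
    # Hypothetical mesh: 990 coarse elements (h = 1) and 10 refined ones (h = 1/64),
    # all with unit wavespeed. Powers of two keep the step sizes exact in floating point.
    hs = [1.0] * 990 + [1.0 / 64] * 10
    speeds = [1.0] * len(hs)
    w_global = global_work(hs, speeds, t_final=1.0)
    w_local = local_work(hs, speeds, t_final=1.0)
    print(f"global: {w_global}, local: {w_local}, ratio: {w_global / w_local:.1f}")
```

With these numbers, global timestepping performs 128,000 element updates versus 3,260 for local timestepping, roughly a 39x reduction. The count is idealized: it ignores the coupling between neighboring elements, whose interface fluxes must be evaluated at consistent times. That coupling is what makes the dependencies irregular and the parallelization difficult, so the ratio should be read as an optimistic bound rather than an expected speedup.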