In situ and in-transit analysis of cosmological simulations
Brian Friesen, Ann Almgren, Zarija Lukić, Gunther Weber, Dmitriy Morozov, Vincent Beckner, Marcus Day
Computational Astrophysics and Cosmology, 2016-08-24. DOI: 10.1186/s40668-016-0017-2
Citations: 25
Abstract
Modern cosmological simulations have reached the trillion-element scale, rendering data storage and subsequent analysis formidable tasks. To address this challenge, we present a new MPI-parallel approach for analyzing simulation data while the simulation runs, as an alternative to the traditional workflow of periodically saving large data sets to disk for subsequent ‘offline’ analysis. We demonstrate this approach in the compressible gasdynamics/N-body code Nyx, a hybrid MPI+OpenMP code based on the BoxLib framework and used for large-scale cosmological simulations. We have enabled on-the-fly workflows in two different ways. The first is a straightforward approach in which all MPI processes periodically halt the main simulation and analyze the data they own (‘in situ’). The second partitions processes into disjoint MPI groups, with one group performing the simulation and periodically sending data to the other, ‘sidecar’ group, which post-processes it while the simulation continues (‘in-transit’). The two groups execute their tasks asynchronously, stopping only to synchronize when a new set of simulation data needs to be analyzed. For both the in situ and in-transit approaches, we experiment with two analysis suites with distinct performance behavior: one finds dark matter halos in the simulation using merge trees to compute the mass contained within iso-density contours, and the other calculates probability distribution functions and power spectra of various fields in the simulation. Both are common analysis tasks in cosmology, and both produce summary statistics significantly smaller than the original data set. We study the behavior of each type of analysis in each workflow to determine the optimal configuration for the different data analysis algorithms.
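The in-transit partitioning described above hinges on splitting MPI_COMM_WORLD into two disjoint communicators, one for the simulation group and one for the sidecar analysis group. The following is a minimal sketch of that pattern using MPI_Comm_split, not Nyx's actual implementation; the sidecar fraction (one quarter of the ranks) and all names here are illustrative assumptions.

```cpp
// Minimal sketch of in-transit communicator partitioning (illustrative only,
// not Nyx/BoxLib code): split MPI_COMM_WORLD into a compute group and a
// 'sidecar' analysis group that can run asynchronously.
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    // Hypothetical split: reserve the last quarter of ranks as sidecars.
    const int n_sidecar = nprocs / 4;
    const int color = (rank >= nprocs - n_sidecar) ? 1 : 0;

    // 'color' selects the disjoint group; using 'rank' as the key
    // preserves the original rank ordering within each group.
    MPI_Comm group_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &group_comm);

    if (color == 1) {
        // Sidecar ranks: loop receiving simulation snapshots over
        // MPI_COMM_WORLD and post-process them while the compute
        // group advances (receive/analysis loop elided).
    } else {
        // Compute ranks: advance the simulation, periodically shipping
        // data to the sidecar group, then continue without waiting for
        // the analysis to finish (time-step loop elided).
    }

    MPI_Comm_free(&group_comm);
    MPI_Finalize();
    return 0;
}
```

Because each group owns its own communicator, collective operations inside the simulation group never involve analysis ranks; the only synchronization between the two groups occurs when a new batch of simulation data is handed off, matching the asynchronous behavior described in the abstract.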
About the journal
Computational Astrophysics and Cosmology (CompAC) is now closed and no longer accepting submissions. Springer will maintain an archive of all articles published in CompAC, which remain accessible and searchable through SpringerLink.