Unlocking massively parallel spectral proper orthogonal decompositions in the PySPOD package

IF 7.2 · Zone 2 (Physics and Astrophysics) · Q1 (Computer Science, Interdisciplinary Applications) · Computer Physics Communications · Pub Date: 2024-05-16 · DOI: 10.1016/j.cpc.2024.109246
Marcin Rogowski, Brandon C.Y. Yeung, Oliver T. Schmidt, Romit Maulik, Lisandro Dalcin, Matteo Parsani, Gianmarco Mengaldo
Citations: 0

Abstract

Unlocking massively parallel spectral proper orthogonal decompositions in the PySPOD package

We propose a parallel (distributed) version of the spectral proper orthogonal decomposition (SPOD) technique. The parallel SPOD algorithm distributes the dataset along its spatial dimension while leaving the time dimension undivided. This keeps the fast Fourier transform of the data in time non-distributed, thereby avoiding the bottlenecks associated with a distributed transform. The parallel SPOD algorithm is implemented in the PySPOD library and uses the standard message passing interface (MPI) library, accessed from Python via mpi4py. An extensive performance evaluation of the parallel package is provided, including strong and weak scalability analyses. The open-source library allows the analysis of large datasets of interest across the scientific community. Here, we present applications in fluid dynamics and geophysics that are extremely difficult (if not impossible) to achieve without a parallel algorithm. This work opens the path toward modal analyses of big quasi-stationary data, helping to uncover new, unexplored spatio-temporal patterns.
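The reason a spatial split leaves the time transform untouched can be illustrated with a small sketch. It is not PySPOD code: the ranks are simulated serially with NumPy, and the helper names `split_space` and `local_time_fft`, the array sizes, and the random data are all assumptions made for illustration. Each simulated rank owns a block of spatial points with the full time history, so its FFT in time needs no data from any other rank.

```python
import numpy as np

def split_space(data, n_ranks):
    """Split a (time, space) array into contiguous spatial chunks,
    one per simulated rank. Every chunk keeps the full time axis."""
    return np.array_split(data, n_ranks, axis=1)

def local_time_fft(chunk):
    """FFT along time (axis 0). Only the columns owned by this
    simulated rank are needed, so no communication is implied."""
    return np.fft.fft(chunk, axis=0)

rng = np.random.default_rng(0)
data = rng.standard_normal((64, 10))  # synthetic: 64 snapshots, 10 spatial points

chunks = split_space(data, n_ranks=3)
local_ffts = [local_time_fft(c) for c in chunks]

# Gluing the per-rank results back together reproduces the global transform.
reassembled = np.concatenate(local_ffts, axis=1)
reference = np.fft.fft(data, axis=0)
fft_matches = np.allclose(reassembled, reference)
```

Had the time axis been split instead, each rank would hold only part of every time series and the transform itself would require inter-process communication.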

Program summary

Program Title: PySPOD

CPC Library link to program files: https://doi.org/10.17632/jf5bf26jcj.1

Developer's repository link: https://github.com/MathEXLab/PySPOD

Licensing provisions: MIT License

Programming language: Python

Nature of problem: Large spatio-temporal datasets may contain coherent patterns that can be leveraged to better understand, model, and possibly predict the behavior of complex dynamical systems. To this end, modal decomposition methods, such as the proper orthogonal decomposition (POD) and its spectral counterpart (SPOD), constitute powerful tools. The SPOD algorithm allows the systematic identification of space-time coherent patterns. These patterns can be used to better understand the physics of the process of interest and to provide a path for mathematical modeling, including reduced-order modeling. The SPOD algorithm has been successfully applied to fluid dynamics, geophysics, and other domains. However, the existing open-source implementations are serial, which prevents them from running on the increasingly large datasets that are becoming available, especially in computational physics. The inability to analyze large datasets via SPOD in turn prevents unlocking novel mechanisms and dynamical behaviors in complex systems.
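For readers unfamiliar with SPOD, the following is a minimal serial sketch of one common textbook formulation (blockwise FFT in time followed by a method-of-snapshots eigenvalue problem at each frequency). It is not taken from PySPOD and does not use its API: the function name `spod_snapshot`, the non-overlapping Hann-windowed blocks, the unit spatial weights, and the synthetic travelling-wave data are all assumptions chosen to keep the example short.

```python
import numpy as np

def spod_snapshot(data, n_fft):
    """Minimal SPOD sketch (assumptions: non-overlapping blocks,
    Hann window, unit spatial weights).
    data: real array of shape (n_time, n_space); n_fft: block length.
    Returns eigenvalues (n_freq, n_blocks) and modes (n_freq, n_space, n_blocks)."""
    n_time, n_space = data.shape
    n_blocks = n_time // n_fft
    data = data - data.mean(axis=0)                # remove the temporal mean
    window = np.hanning(n_fft)[:, None]            # broadcast over space
    blocks = data[: n_blocks * n_fft].reshape(n_blocks, n_fft, n_space)
    q_hat = np.fft.rfft(blocks * window, axis=1)   # (n_blocks, n_freq, n_space)
    n_freq = q_hat.shape[1]
    eigvals = np.empty((n_freq, n_blocks))
    modes = np.empty((n_freq, n_space, n_blocks), dtype=complex)
    for f in range(n_freq):
        q = q_hat[:, f, :].T                       # (n_space, n_blocks)
        m = q.conj().T @ q / n_blocks              # small Hermitian matrix
        lam, theta = np.linalg.eigh(m)             # ascending eigenvalues
        order = np.argsort(lam)[::-1]              # sort descending
        lam, theta = lam[order], theta[:, order]
        eigvals[f] = lam
        # Guard against tiny or negative eigenvalues from round-off.
        scale = np.sqrt(np.maximum(lam, 1e-30) * n_blocks)
        modes[f] = (q @ theta) / scale
    return eigvals, modes

# Synthetic travelling wave at 0.125 cycles per sample, plus weak noise.
rng = np.random.default_rng(0)
t = np.arange(512)[:, None]
x = np.linspace(0.0, 1.0, 20, endpoint=False)[None, :]
data = np.sin(2 * np.pi * (0.125 * t - x)) + 0.1 * rng.standard_normal((512, 20))

eigvals, modes = spod_snapshot(data, n_fft=64)
peak_bin = int(np.argmax(eigvals[:, 0]))  # frequency bin with the largest leading eigenvalue
```

With a block length of 64 samples, the wave frequency of 0.125 cycles per sample falls on bin 8, which is where the leading eigenvalue should peak for this synthetic input.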

Solution method: We provide an open-source parallel (MPI-distributed) code, namely PySPOD, that is able to run on large datasets (the ones considered in the present paper reach about 200 terabytes). The code is built on the previous serial open-source code PySPOD, published at https://joss.theoj.org/papers/10.21105/joss.02862.pdf. The new parallel implementation scales across several nodes (we show both weak and strong scalability) and solves some of the bottlenecks that are commonly found at the I/O stage. The current parallel code allows running on datasets that were not easy or even possible to analyze with serial SPOD algorithms, hence providing a path towards unlocking novel findings in computational physics.
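One way a spatially distributed SPOD step can be organized is sketched below. The summary above does not spell out how the inner products are assembled, so this is an assumption rather than a description of PySPOD internals: each rank forms a small partial matrix from its own spatial rows, the partial matrices are summed across ranks, and every rank then builds its own slice of the modes. The ranks are simulated serially with NumPy; the sizes and random data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_space, n_blocks, n_ranks = 12, 4, 3

# Fourier coefficients at one frequency: rows are spatial points,
# columns are data blocks (synthetic complex values).
q = rng.standard_normal((n_space, n_blocks)) + 1j * rng.standard_normal((n_space, n_blocks))

# Each simulated rank owns a contiguous set of spatial rows.
local_q = np.array_split(q, n_ranks, axis=0)

# Local contribution to the small (n_blocks x n_blocks) matrix.
partials = [q_r.conj().T @ q_r for q_r in local_q]

# Summing the partials stands in for a global reduction. In a real
# mpi4py run this would plausibly be
# comm.Allreduce(MPI.IN_PLACE, m, op=MPI.SUM) on each rank's partial.
m = sum(partials) / n_blocks

# The small eigenproblem is cheap and can be repeated on every rank.
lam, theta = np.linalg.eigh(m)
lam, theta = lam[::-1], theta[:, ::-1]  # descending order

# Each rank computes only its own spatial slice of the modes.
local_modes = [q_r @ theta / np.sqrt(n_blocks * lam) for q_r in local_q]

# Checks against the undistributed computation.
m_serial = q.conj().T @ q / n_blocks
reduction_matches = np.allclose(m, m_serial)
modes = np.concatenate(local_modes, axis=0)
modes_orthonormal = np.allclose(modes.conj().T @ modes, np.eye(n_blocks))
```

In this arrangement the only quantity exchanged between ranks is a matrix whose size depends on the number of blocks, not on the number of spatial points, which is what makes a spatial decomposition attractive for very large grids.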

Additional comments including restrictions and unusual features: The code comes with a set of built-in postprocessing tools for visualizing the results. It also comes with extensive continuous integration, documentation, and tutorials, as well as a dedicated website in addition to the associated GitHub repository. Within the package we also provide a parallel implementation of the proper orthogonal decomposition (POD) that leverages the parallel I/O capabilities of the SPOD algorithm.
