Pub Date : 2024-11-19, DOI: 10.1016/j.cpc.2024.109433
Deniz A. Bezgin, Aaron B. Buhendwa, Nikolaus A. Adams
In our effort to facilitate machine learning-assisted computational fluid dynamics (CFD), we introduce the second iteration of JAX-Fluids. JAX-Fluids is a Python-based fully-differentiable CFD solver designed for compressible single- and two-phase flows. In this work, the first version is extended to incorporate high-performance computing (HPC) capabilities. We introduce a parallelization strategy utilizing JAX primitive operations that scales efficiently on GPU (up to 512 NVIDIA A100 graphics cards) and TPU (up to 1024 TPU v3 cores) HPC systems. We further demonstrate stable parallel computation of automatic differentiation gradients across extended integration trajectories. The new code version offers enhanced two-phase flow modeling capabilities. In particular, a five-equation diffuse-interface model is incorporated which complements the level-set sharp-interface model. Additional algorithmic improvements include positivity-preserving limiters for increased robustness, support for stretched Cartesian meshes, refactored I/O handling, comprehensive post-processing routines, and an updated list of state-of-the-art high-order numerical discretization schemes. We verify newly added numerical models by showcasing simulation results for single- and two-phase flows, including turbulent boundary layer and channel flows, air-helium shock bubble interactions, and air-water shock drop interactions.
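The abstract does not detail the parallelization strategy. A common pattern for block-decomposed grid solvers is halo (ghost-cell) exchange between neighbouring subdomains. In JAX, a neighbour shift of this kind can be expressed with collectives such as `jax.lax.ppermute`. The following is a minimal single-process, plain-Python emulation of that idea. It assumes a 1D periodic grid split evenly into blocks and a second-order central difference. It is not taken from JAX-Fluids, and all function names are hypothetical.

```python
import math
from typing import List


def split_domain(u: List[float], n_blocks: int) -> List[List[float]]:
    """Split a periodic 1D field into equally sized blocks, one per (emulated) device."""
    size = len(u) // n_blocks
    if size * n_blocks != len(u):
        raise ValueError("grid size must be divisible by the number of blocks")
    return [u[i * size:(i + 1) * size] for i in range(n_blocks)]


def exchange_halos(blocks: List[List[float]], width: int = 1) -> List[List[float]]:
    """Pad every block with `width` ghost cells taken from its periodic neighbours.

    Each block receives the last cells of its left neighbour and the first cells of
    its right neighbour; on real hardware this would be a pair of neighbour shifts.
    """
    if width < 1:
        raise ValueError("halo width must be at least 1")
    n = len(blocks)
    return [
        blocks[(i - 1) % n][-width:] + block + blocks[(i + 1) % n][:width]
        for i, block in enumerate(blocks)
    ]


def local_central_difference(padded: List[float], dx: float, width: int = 1) -> List[float]:
    """Second-order du/dx on the interior (non-ghost) cells of one padded block."""
    return [
        (padded[j + 1] - padded[j - 1]) / (2.0 * dx)
        for j in range(width, len(padded) - width)
    ]


def parallel_derivative(u: List[float], dx: float, n_blocks: int) -> List[float]:
    """Decompose, exchange halos, differentiate each block locally, then gather."""
    padded_blocks = exchange_halos(split_domain(u, n_blocks))
    result: List[float] = []
    for padded in padded_blocks:  # each iteration would run on its own device
        result.extend(local_central_difference(padded, dx))
    return result


def serial_derivative(u: List[float], dx: float) -> List[float]:
    """Reference: the same stencil applied to the undivided periodic field."""
    n = len(u)
    return [(u[(i + 1) % n] - u[(i - 1) % n]) / (2.0 * dx) for i in range(n)]


if __name__ == "__main__":
    n = 64
    dx = 2.0 * math.pi / n
    u = [math.sin(i * dx) for i in range(n)]
    gap = max(abs(a - b) for a, b in zip(parallel_derivative(u, dx, 4), serial_derivative(u, dx)))
    print(f"max |parallel - serial| = {gap:.1e}")
```

Because each block sees exactly the neighbour values that the global stencil would read, the decomposed result matches the serial one. This is the property any halo-exchange scheme must preserve.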
Title: JAX-Fluids 2.0: Towards HPC for differentiable CFD of compressible two-phase flows. Computer Physics Communications, Volume 308, Article 109433.
Pub Date : 2024-11-19, DOI: 10.1016/j.cpc.2024.109445
Tianzhao Li, Wenjin Gao, Guoxiang Zhi, Shuwei Zhai, Jiahua Xu, Ling Zhang, Weijuan Hu, Biyu Song, Shuoke Xu, Miao Zhou
Recent years have witnessed a surge of research on engineering the structure, properties and performance of two-dimensional (2D) materials by ion irradiation. Compared with their 3D counterparts, 2D systems exhibit drastically different and even counter-intuitive responses to irradiation, so atomic-scale insight into ion bombardment and defect formation is essential. In this work, we develop I2DM, a theoretical framework for simulating ion irradiation of 2D materials using a Monte Carlo (MC) algorithm. I2DM can generate incident ions with adjustable ion species, incident energy, ion fluence and incident angle. Based on the binary collision approximation (BCA), the primary collisions, cascade collisions and defect recombination during the irradiation process are described explicitly. As output, the framework provides the defect types and yields and the morphology of the irradiated material. We have performed systematic simulations of three typical 2D structures (graphene, h-BN and MoS2) under different ion irradiation conditions, and find that the results are in excellent agreement with available experimental measurements and molecular dynamics data. The framework is generally applicable and computationally efficient, which makes it valuable for understanding the fundamental mechanisms of ion irradiation in 2D systems and for designing and optimizing low-dimensional structures for nanoelectronics, spintronics, optics, energy storage and environmental protection.
Program summary: Program Title: I2DM. CPC Library link to program files: https://doi.org/10.17632/pf2pz4fxj3.1. Licensing provisions: GPLv2. Programming language: Python. Supplementary material: Available. Nature of problem: A general MC framework for simulating ion irradiation on 2D materials: calculating the energy loss by nuclear and electronic stopping, simulating primary/cascade collisions and defect recombination, and predicting the defect type/yield and morphology of the 2D target. Solution method: The framework uses the BCA to describe nuclear stopping during irradiation; the energy loss by electronic stopping is computed with a semiempirical model combining the Oen-Robinson and Lindhard-Scharff models; simultaneous collisions are included to describe many-body interactions; and a capture radius is introduced to simulate defect recombination.
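The binary-collision kinematics that BCA codes build on can be illustrated with a toy Monte Carlo model. This is a minimal sketch, not I2DM's method. It assumes a hard-sphere interaction, so the centre-of-mass scattering angle has the closed form θ = 2 arccos(p/R), and a single collision per ion. It does not include electronic stopping, cascades or recombination. The 22 eV displacement threshold and all function names are hypothetical illustrations.

```python
import math
import random


def kinematic_factor(m1: float, m2: float) -> float:
    """gamma = 4 m1 m2 / (m1 + m2)^2: maximum fraction of E transferable in one elastic collision."""
    return 4.0 * m1 * m2 / (m1 + m2) ** 2


def energy_transfer(e_ion: float, m1: float, m2: float, theta_cm: float) -> float:
    """Recoil energy T = gamma * E * sin^2(theta_cm / 2) for a centre-of-mass angle theta_cm."""
    return kinematic_factor(m1, m2) * e_ion * math.sin(theta_cm / 2.0) ** 2


def hard_sphere_angle(p: float, radius: float) -> float:
    """Centre-of-mass deflection for hard spheres of collision radius R: theta = 2 arccos(p / R)."""
    return 2.0 * math.acos(min(p / radius, 1.0))


def displacement_probability(e_ion, m1, m2, e_disp, radius=1.0, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the chance that one collision displaces the target atom.

    Impact parameters are sampled uniformly over the collision disc (p = R * sqrt(u)),
    and the target counts as displaced when the recoil energy exceeds e_disp.
    """
    rng = random.Random(seed)
    displaced = 0
    for _ in range(n_samples):
        p = radius * math.sqrt(rng.random())
        if energy_transfer(e_ion, m1, m2, hard_sphere_angle(p, radius)) > e_disp:
            displaced += 1
    return displaced / n_samples


if __name__ == "__main__":
    # Illustrative numbers only: a 100 eV Ar ion (39.95 u) hitting C (12.011 u),
    # with an assumed displacement threshold of 22 eV.
    est = displacement_probability(100.0, 39.95, 12.011, 22.0)
    exact = 1.0 - 22.0 / (kinematic_factor(39.95, 12.011) * 100.0)
    print(f"Monte Carlo: {est:.4f}  analytic (hard sphere): {exact:.4f}")
```

For hard spheres, T is uniformly distributed on [0, γE]. The estimate can therefore be checked against 1 − E_d/(γE). A realistic BCA code would replace the hard-sphere angle with one computed from a screened Coulomb potential.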
Title: I2DM: A Monte Carlo framework for ion irradiation on two-dimensional materials. Computer Physics Communications, Volume 308, Article 109445.
Pub Date : 2024-11-18, DOI: 10.1016/j.cpc.2024.109441
Yuxuan Wang, Kerui Lai, Guoqiang Lan, Jun Song
This paper introduces Ph3pyWF, a Python software package we designed to facilitate high-throughput analysis of lattice thermal conductivity in ceramic materials. The user interface caters to individuals with varying expertise, accommodating both novices and experts in the field. For beginners, only the initial structure file is required as input, as the software automatically populates other necessary parameters. Advanced users can customize numerous procedure parameters to suit their specific research needs. At its core, Ph3pyWF aims to establish an efficient data exchange and task management system. This paper elucidates the design details of the software package and presents several examples of its application to oxide ceramics, showcasing its general applicability and practicality in the analysis of lattice thermal conductivity.
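The "defaults for beginners, overrides for experts" design can be sketched as a settings object. This is a minimal sketch in which only the structure file is required and every other field has a default. The field names and default values are hypothetical placeholders, not Ph3pyWF's actual parameters. A real workflow would presumably derive some of them from the structure rather than fix them.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class WorkflowSettings:
    """Hypothetical parameter set for a phonon workflow; only structure_file is required."""
    structure_file: str
    supercell: Tuple[int, int, int] = (2, 2, 2)
    displacement_distance: float = 0.03  # Angstrom; placeholder default
    q_mesh: Tuple[int, int, int] = (11, 11, 11)
    temperatures: Tuple[float, ...] = (300.0,)


def build_settings(structure_file: str, overrides: Optional[Dict[str, Any]] = None) -> WorkflowSettings:
    """Beginner path: defaults only. Expert path: override any named field."""
    settings = WorkflowSettings(structure_file=structure_file)
    if not overrides:
        return settings
    known = {f.name for f in fields(WorkflowSettings)}
    unknown = sorted(set(overrides) - known)
    if unknown:
        # Fail loudly rather than silently ignoring a misspelled parameter.
        raise KeyError(f"unknown parameters: {unknown}")
    return replace(settings, **overrides)
```

For example, `build_settings("POSCAR")` uses every default, while `build_settings("POSCAR", {"q_mesh": (19, 19, 19)})` changes only the q-point mesh. Rejecting unknown keys keeps expert customization from failing silently.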
Title: Ph3pyWF: An automated workflow software package for ceramic lattice thermal conductivity calculation. Computer Physics Communications, Volume 307, Article 109441.
Pub Date : 2024-11-15, DOI: 10.1016/j.cpc.2024.109431
Tiago E.C. Magalhães
PyWolf is open-source software with a graphical user interface that performs numerical simulations of the propagation of the cross-spectral density function of planar sources, using parallel computation through PyOpenCL. In previous versions of PyWolf, the user could select the OpenCL devices and platforms used for the parallel computation of several tasks, except for the task involving the two-dimensional (2D) fast Fourier transform (FFT). This task can take a long time because a large number of 2D FFTs must be performed over 2D slices of a four-dimensional array. Multithreaded computation of these loops and of other tasks can benefit multi-core CPUs and significantly reduce the computation time. Here, I present version 3.0.0 of PyWolf, which adds a multithreading option for the 2D FFT computations. This option can also be easily extended to other time-consuming tasks.
New version program summary
Program Title: PyWolf
CPC Library link to program files: https://doi.org/10.17632/frjscxypkd.3
Developer's repository link: https://github.com/tiagoecmagalhaes/PyWolf
Licensing provisions: GPLv3
Programming language: Python
Supplementary material: Overview of the main changes with performance results.
Journal reference of previous version: Comput. Phys. Commun. 294 (2024) 108899.
Reasons for the new version: In the original paper of PyWolf [1] and in the previous version [2], parallel computation was performed only with PyOpenCL. However, when multiple CPU cores are available, multithreading [3] can significantly decrease the computation time of some tasks, for instance the loops of 2D FFTs. This new version includes a built-in multithreading option that lets users select the number of threads used in the numerical simulation.
Summary of revisions: Multithreading support was added to PyWolf. Users can now enable this feature in PyWolf's graphical user interface and choose how many of the available threads to use in the simulation. In the current version, multithreading is used only for the loops of 2D FFTs, but it can easily be extended to other tasks. Other small features have been added and some issues have been fixed: (i) a requirements file listing all the libraries used has been added; (ii) some errors associated with file paths have been corrected.
Nature of problem: Propagation of partially coherent light from planar sources in the Fresnel or far-field approximations using four-dimensional arrays [4], [5] requires a large amount of memory and computation time. PyWolf uses parallel computation with PyOpenCL to reduce the time-consuming calculations in the propagation of the cross-spectral density function [4]; memory capacity is the main limiting factor.
Solution method: The open-source toolkit PyOpenCL and multithreading are used to reduce the computation time. Users can modify PyWolf and add further features, such as source, propagation and geometry models, as well as customized optical elements (e.g., lenses and apertures). The PyQt5-based graphical user interface lets users easily set the input parameters to simulate their optical setup, plot and export simulation results, and save or load simulation sessions.
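A loop of 2D FFTs over the slices of a 4D array can be distributed across threads. This is a minimal sketch using `concurrent.futures.ThreadPoolExecutor` and `numpy.fft.fft2`. It is not PyWolf's code: the array layout W[i, j, :, :] and the function name are assumptions. Any real speedup depends on whether the FFT backend releases Python's global interpreter lock. `scipy.fft.fft2` with its `workers` argument is an alternative way to get multithreaded FFTs.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def fft2_slices_threaded(w: np.ndarray, n_threads: int = 4) -> np.ndarray:
    """Apply a 2D FFT to every w[i, j, :, :] slice of a 4D array using a thread pool."""
    if w.ndim != 4:
        raise ValueError("expected a 4D array")
    out = np.empty(w.shape, dtype=np.complex128)
    n0, n1 = w.shape[:2]

    def work(index):
        i, j = index
        out[i, j] = np.fft.fft2(w[i, j])  # each task writes a disjoint slice, so no locking

    indices = [(i, j) for i in range(n0) for j in range(n1)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, indices))  # consume the iterator so worker exceptions propagate
    return out
```

The result should equal `np.fft.fft2(w, axes=(-2, -1))`. The threaded version only pays off when the per-slice work is large enough to outweigh the scheduling overhead.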
Title: An improved version of PyWolf with multithread-based parallelism support. Computer Physics Communications, Volume 307, Article 109431.
Pub Date : 2024-11-15, DOI: 10.1016/j.cpc.2024.109432
Antonio Colanera, Oliver T. Schmidt, Matteo Chiatto
Experimental measurements often contain corrupted data and outliers that can strongly affect the main coherent structures extracted with classical modal analysis techniques. This effect is amplified at high frequencies, where the corresponding modes are more susceptible to contamination by measurement noise and uncertainties. A novel approach proposed here, robust spectral proper orthogonal decomposition (robust SPOD), overcomes these limitations by implementing robust principal component analysis within the SPOD technique. The algorithm of the new technique is first presented in detail. Its effectiveness is then tested on two different fluid dynamics problems: a numerically simulated subsonic jet, and the flow in an open cavity analyzed experimentally in [48]. The analysis of the turbulent jet data, corrupted with both salt-and-pepper and Gaussian noise, shows that robust SPOD produces better-converged and more physically interpretable modes than classical SPOD. The use of robust SPOD as a de-noising tool, based on reconstructing the signal from de-noised modes, is also presented. Applied to the open cavity flow, robust SPOD yields smoother spatial distributions of the modes than standard SPOD, particularly at high frequencies and for higher-order modes.
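The abstract says that robust SPOD embeds robust principal component analysis (RPCA) in SPOD, but it does not give the formulation. A standard way to compute RPCA is principal component pursuit, which splits a data matrix M into a low-rank part L and a sparse part S by minimizing ‖L‖* + λ‖S‖₁ subject to L + S = M. This is a minimal sketch of that problem solved with the usual alternating-direction (ADMM) iteration, using the common defaults λ = 1/√max(n₁, n₂) and μ = n₁n₂/(4‖M‖₁). It accepts complex input, so it could be applied, for example, to a matrix of Fourier-transformed snapshot blocks at one frequency. Where exactly the authors apply RPCA is an assumption here.

```python
import numpy as np


def soft_threshold(x: np.ndarray, tau: float) -> np.ndarray:
    """Shrink each entry's magnitude by tau (zeroing small entries); real or complex input."""
    mag = np.abs(x)
    scale = np.maximum(mag - tau, 0.0) / np.where(mag > 0, mag, 1.0)
    return x * scale


def singular_value_threshold(x: np.ndarray, tau: float) -> np.ndarray:
    """Soft-threshold the singular values of x (proximal operator of the nuclear norm)."""
    u, s, vh = np.linalg.svd(x, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vh


def robust_pca(m, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Principal component pursuit by ADMM: returns (low_rank, sparse) with m ~ low_rank + sparse."""
    m = np.asarray(m)
    m = m.astype(np.result_type(m.dtype, np.float64))  # promote ints; keep complex as complex
    n1, n2 = m.shape
    lam = 1.0 / np.sqrt(max(n1, n2)) if lam is None else lam
    mu = n1 * n2 / (4.0 * np.abs(m).sum()) if mu is None else mu
    low_rank = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    norm_m = np.linalg.norm(m)
    for _ in range(max_iter):
        low_rank = singular_value_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(m - low_rank + dual / mu, lam / mu)
        residual = m - low_rank - sparse
        dual = dual + mu * residual  # dual ascent on the constraint L + S = M
        if np.linalg.norm(residual) <= tol * norm_m:
            break
    return low_rank, sparse
```

Outliers such as salt-and-pepper noise end up in the sparse part. The low-rank part is what a subsequent modal decomposition would then operate on.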
Title: Robust spectral proper orthogonal decomposition. Computer Physics Communications, Volume 307, Article 109432.
Pub Date : 2024-11-15, DOI: 10.1016/j.cpc.2024.109435
Han Zhang, Baojiu Li, Tobias Weinzierl, Cristian Barrera-Hinojosa
ExaGRyPE describes a suite of solvers and solver ingredients for numerical relativity that are based upon ExaHyPE 2, the second generation of our Exascale Hyperbolic PDE Engine. Numerical relativity simulations are crucial in resolving astrophysical phenomena in strong gravitational fields and are fundamental in analyzing and understanding gravitational wave emissions. The presented generation of ExaGRyPE solves the Einstein field equations in the standard CCZ4 formulation under a 3+1 foliation and focuses on black hole space-times. It employs a block-structured Cartesian grid carrying a higher-order Finite Difference scheme with full support of adaptive mesh refinement (AMR), it facilitates massive parallelism combining message passing, domain decomposition and task parallelism, and it supports the injection of particles into the grid as static data probes or as moving tracers. We introduce the ExaGRyPE-specific building blocks within ExaHyPE 2, and discuss its software architecture and compute-n-feel.
For this, we formalize the creation of any specific astrophysical simulation with ExaGRyPE as a sequence of lowering operations. In this sequence, a few abstract logical tasks are successively broken down into finer and finer tasks until we reach an abstraction level that can be mapped directly onto a C++ executable. The overall program logic is specified entirely through a domain-specific Python interface. We automatically map this logic onto a more detailed set of numerical tasks, and then lower this representation onto the technical tasks that the underlying ExaHyPE engine uses to parallelize the application. Finally, these technical tasks are mapped onto small task graphs containing the actual astrophysical PDE term evaluations, initial conditions, boundary conditions, and so forth. Users can inject these ingredients manually, or they can instruct the solver, at the most abstract level of the user interface, to use out-of-the-box ExaGRyPE implementations. The result is a rigorous separation of concerns that shields ExaGRyPE users from technical details and hence simplifies the development of novel physical models. We present simulations and data for the gauge wave, static single black holes and rotating binary black hole systems, demonstrating that the code base is mature and usable. However, we also uncover domain-specific numerical challenges that need further study by the community in future work.
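The idea of a sequence of lowering operations can be illustrated with a toy rewriting pass. In this sketch, each pass replaces every abstract task by its finer-grained refinement until only primitive tasks (those with no rule) remain. The task names and rules are hypothetical. ExaGRyPE's actual task types, and the final mapping of primitives onto C++, are not reproduced here.

```python
from typing import Dict, List

# Hypothetical lowering rules: each abstract task maps to a list of finer tasks.
LoweringRules = Dict[str, List[str]]

EXAMPLE_RULES: LoweringRules = {
    "simulate_black_hole": ["set_up_grid", "time_step_loop"],
    "set_up_grid": ["create_patches", "apply_initial_conditions"],
    "time_step_loop": ["exchange_halos", "evaluate_ccz4_rhs", "apply_boundary_conditions"],
}


def lower(tasks: List[str], rules: LoweringRules, max_passes: int = 100) -> List[str]:
    """Repeatedly refine abstract tasks until only primitives (tasks without a rule) remain.

    Each pass lowers every task by exactly one level, so the sequence of passes
    mirrors a sequence of lowering operations.
    """
    for _ in range(max_passes):
        if all(task not in rules for task in tasks):
            return tasks
        next_level: List[str] = []
        for task in tasks:
            next_level.extend(rules.get(task, [task]))  # primitives pass through unchanged
        tasks = next_level
    raise RuntimeError("lowering did not terminate; check the rules for cycles")
```

Here `lower(["simulate_black_hole"], EXAMPLE_RULES)` needs two passes to reach five primitive tasks. Keeping the rules as data, separate from the pass, is one way to obtain the separation of concerns described above.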
Title: ExaGRyPE: Numerical general relativity solvers based upon the hyperbolic PDEs solver engine ExaHyPE. Computer Physics Communications, Volume 307, Article 109435.
Self-heating can significantly degrade the performance of silicon nanoscale devices. In this work, the impact of self-heating is investigated in nanosheet transistors made of two-dimensional materials using ab-initio techniques. A new algorithm was developed to enable efficient self-energy computations, achieving a ∼500-fold speedup. For the simple case of free-standing transition-metal dichalcogenides without explicit metal leads, electron-phonon scattering with room-temperature phonons is found to dominate the device performance. For MoS2, the effect of self-heating is negligible in comparison. For WS2, and especially for WSe2, self-heating further degrades the ON-state current.
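The abstract does not describe the algorithm, and only the title identifies it as FFT-based. One common reason FFTs help with scattering self-energies is that, under simplifying assumptions, the self-energy can take the form of a discrete circular convolution. An example assumption is a coupling that depends only on the momentum or energy transfer, giving Σ(k) = Σ_q g(q) G(k − q) on a periodic grid. Such a convolution costs O(N²) directly but O(N log N) via the convolution theorem. This sketch only demonstrates that equivalence in 1D. The arrays are random placeholders, not physical quantities.

```python
import numpy as np


def convolve_direct(g: np.ndarray, green: np.ndarray) -> np.ndarray:
    """Circular convolution sigma[k] = sum_q g[q] * green[(k - q) mod N], costing O(N^2)."""
    n = len(g)
    return np.array([sum(g[q] * green[(k - q) % n] for q in range(n)) for k in range(n)])


def convolve_fft(g: np.ndarray, green: np.ndarray) -> np.ndarray:
    """The same circular convolution via the convolution theorem, costing O(N log N)."""
    return np.fft.ifft(np.fft.fft(g) * np.fft.fft(green))
```

The convolution theorem holds for the discrete Fourier transform only with circular (periodic) indexing. Non-periodic energy grids would need zero-padding to avoid wrap-around.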
Pub Date : 2024-11-15, DOI: 10.1016/j.cpc.2024.109430
Rutger Duflou, Gautam Gaddemane, Michel Houssa, Aryan Afzalian
Title: Fully coupled electron-phonon transport in two-dimensional-material-based devices using efficient FFT-based self-energy calculations. Computer Physics Communications, Volume 307, Article 109430.
Pub Date : 2024-11-09 DOI: 10.1016/j.cpc.2024.109426
A. Braz, L.G.S. Duarte, H.S. Ferreira, A.C.S. Guabiraba, L.A.C.P. da Mota, I.S.S. Nascimento
Finding first integrals of second-order nonlinear ordinary differential equations (nonlinear 2ODEs) is a very difficult task. In very complicated cases, where we cannot find Darboux polynomials (to construct an integrating factor) or a Lie symmetry (to simplify the equation), we can sometimes solve the problem by using a nonlocal symmetry. In [1], [2], [3] we developed (and improved) a method, the S-function method, that successfully finds nonlocal Lie symmetries for a large class of nonlinear rational 2ODEs. However, even with the nonlocal symmetry, finding the first integral still requires solving a first-order ODE (1ODE), which can itself be very difficult. In this work we present a novel way of using the nonlocal symmetry to compute the first integral with a very efficient linear procedure.
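As a reminder of what is being sought: writing z = y', a function I(x, y, z) is a first integral of y'' = φ(x, y, z) exactly when D[I] = I_x + z I_y + φ I_z = 0. The Python sketch below checks this condition numerically with central differences. It is not the S-function method or the Darlin procedure; the equation y'' = y'^2/y and the candidate I = y'/y are a hypothetical test case chosen because D[z/y] = z(-z/y^2) + (z^2/y)(1/y) = 0 by hand, and the helper names are illustrative.

```python
def total_derivative(I, phi, x, y, z, h=1e-6):
    """Approximate D[I] = I_x + z*I_y + phi(x, y, z)*I_z by central differences.

    Here z stands for y', and the 2ODE is y'' = phi(x, y, z).
    """
    Ix = (I(x + h, y, z) - I(x - h, y, z)) / (2 * h)
    Iy = (I(x, y + h, z) - I(x, y - h, z)) / (2 * h)
    Iz = (I(x, y, z + h) - I(x, y, z - h)) / (2 * h)
    return Ix + z * Iy + phi(x, y, z) * Iz


def is_first_integral(I, phi, points, tol=1e-6):
    """True if D[I] vanishes (within tol) at every sample point (x, y, z)."""
    return all(abs(total_derivative(I, phi, x, y, z)) < tol for x, y, z in points)


# Hypothetical rational 2ODE: y'' = y'^2 / y, i.e. phi = z^2 / y.
def phi(x, y, z):
    return z * z / y


samples = [(0.0, 1.5, 0.7), (1.0, -2.0, 0.3), (2.5, 0.8, -1.2)]
print(is_first_integral(lambda x, y, z: z / y, phi, samples))  # True
print(is_first_integral(lambda x, y, z: z * y, phi, samples))  # False, since D[z*y] = 2*z**2
```

A passing check at sample points is only numerical evidence; a symbolic computation (as done in Maple by InSyDE) is needed to establish that D[I] vanishes identically.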
New version program summary
Program Title: InSyDE – Invariants and Symmetries of (rational second order ordinary) Differential Equations.
CPC Library link to program files: https://doi.org/10.17632/4ytft6zgk7.3
Licensing provisions: CC by NC 3.0
Programming language: Maple
Supplementary material: Theoretical results and revision of the S-function method.
Journal reference of previous version: Comput. Phys. Comm. Volume 234, January 2019, Pages 302-314 - https://doi.org/10.1016/j.cpc.2018.05.009
Does the new version supersede the previous version?: Yes.
Nature of problem: Determining first integrals of rational second order ordinary differential equations.
Solution method: The method is explained in the Summary of revisions and Supplementary material.
Reasons for the new version: After determining the S-function, the InSyDE package still needs to solve a first-order ordinary differential equation (1ODE) associated with the nonlocal symmetry (the so-called associated 1ODE; see [2]). For very complicated 1ODEs, this may not be practically feasible. We have developed a new and more efficient method that uses the nonlocal symmetry to determine the first integral in a linear way (for a large class of 1ODEs).
Summary of revisions: To implement the new method described above, we modified the command Sfunction and introduced a new command, Darlin.
Title: A new way to use nonlocal symmetries to determine first integrals of second-order nonlinear ordinary differential equations
Computer Physics Communications, Volume 307, Article 109426.
Pub Date : 2024-11-08 DOI: 10.1016/j.cpc.2024.109429
Zhentong Wang, Oskar J. Haidn, Xiangyu Hu
The finite volume method (FVM) is widely recognized as a computationally efficient and accurate mesh-based technique. However, it has notable limitations, particularly in mesh generation and in handling complex boundary interfaces or conditions. In contrast, the smoothed particle hydrodynamics (SPH) method, a popular meshless alternative, inherently circumvents the challenges of mesh generation and yields smoother numerical results. Nevertheless, this comes at the cost of reduced computational efficiency. Consequently, researchers have combined the strengths of both methods to investigate complex flow phenomena, producing accurate and computationally efficient results. However, algorithms that weakly couple these two methods tend to be intricate and face challenges regarding versatility, implementation, and mutual adaptation to hardware and coding structures. Thus, achieving a robust and strong coupling of FVM and SPH within a unified framework is essential. A mesh-based FVM has recently been integrated into the SPH-based library SPHinXsys. However, because the two methods use different boundary algorithms, the crucial step for establishing a strong coupling within a unified SPH framework is to incorporate the FVM boundary algorithm into the Eulerian SPH method. In this paper, we propose a straightforward boundary algorithm within the Eulerian SPH method that is algorithmically equivalent to the FVM boundary treatment and based on the principle of zero-order consistency. Moreover, several numerical examples, including compressible and incompressible flows with various boundary conditions in the Eulerian SPH method, demonstrate the stability and accuracy of the proposed algorithm.
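The zero-order consistency principle can be illustrated in one dimension. In a flux-based Eulerian SPH update, a uniform state produces zero net flux only if the kernel-gradient weights S_i = Σ_j V_j ∇_i W_ij sum to zero, which mirrors the FVM identity that the outward face normals of a closed cell sum to zero. The Python sketch below assumes a cubic-spline kernel, uniform spacing, a hypothetical residual of the form 2(F·S_i + F_b·B_i), and a hypothetical boundary "face" weight B_i = -S_i. It shows that S_i vanishes in the interior but not next to a wall, and that the boundary term restores a zero residual for a uniform flux. It is a generic illustration, not the algorithm proposed in the paper.

```python
def dW_dr(r, h):
    """Radial derivative of the 1D cubic-spline kernel (support radius 2h)."""
    sigma = 2.0 / (3.0 * h)  # 1D normalization constant
    q = r / h
    if q < 1.0:
        return sigma / h * (-3.0 * q + 2.25 * q * q)
    if q < 2.0:
        return sigma / h * (-0.75 * (2.0 - q) ** 2)
    return 0.0


def gradient_weight_sum(xs, i, dx, h):
    """S_i = sum_j V_j * dW/dx_i; zero for a full, symmetric neighbor stencil."""
    s = 0.0
    for j, xj in enumerate(xs):
        if j == i:
            continue
        r = abs(xs[i] - xj)
        s += dx * dW_dr(r, h) * (xs[i] - xj) / r  # V_j = dx in 1D
    return s


def uniform_flux_residual(xs, i, dx, h, flux, boundary_flux, with_boundary_term):
    """Hypothetical residual 2*(F*S_i + F_b*B_i) for a uniform interior flux F."""
    s = gradient_weight_sum(xs, i, dx, h)
    b = -s if with_boundary_term else 0.0  # weight of the missing "boundary face"
    return 2.0 * (flux * s + boundary_flux * b)


dx = 1.0
h = 1.3 * dx
xs = [k + 0.5 for k in range(20)]  # particles in [0, 20], walls at both ends
print(gradient_weight_sum(xs, 10, dx, h))  # interior particle: ~0
print(gradient_weight_sum(xs, 0, dx, h))   # particle next to the wall: nonzero
print(uniform_flux_residual(xs, 0, dx, h, 2.0, 2.0, True))  # 0.0
```

In a real solver the boundary flux F_b would come from the prescribed boundary condition (for example via a Riemann solver with a ghost state), so the boundary term reproduces what an FVM boundary face contributes.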
Title: An algorithm for the incorporation of relevant FVM boundary conditions in the Eulerian SPH framework
Computer Physics Communications, Volume 307, Article 109429.
Computational resources have grown exponentially over the last decades, enabling the simulation of complex physical problems at the cost of a massive increase in data storage. This is especially true for N-body simulations, which now reach billions or even trillions of particles in some cases. To avoid storing data on disk for post-processing, on-the-fly analysis has gained momentum, but implementing it efficiently without degrading the performance of the simulation engine remains a challenge. This work provides a new in-situ procedure for feature detection in massive N-body simulations, leveraging state-of-the-art techniques from various fields. Based on a discrete-to-continuum paradigm shift, particles and their physical quantities are projected onto a 3D regular grid, and image analysis algorithms then group voxels according to user-defined criteria. The study also introduces a significant extension of the hybrid parallelism used for connected component analysis in the image processing community: whereas that analysis traditionally relies on shared-memory parallelism, the extension combines distributed- and shared-memory approaches. The implementation is carried out within the classical molecular dynamics code exaStamp, a variant of the open-source exaNBody platform [39]. This adaptation allows on-the-fly analysis of multi-billion-atom samples with at most 1.3% overhead. In addition, the entire framework is benchmarked on up to 32768 cores. The applicability of the approach is demonstrated on a spall fracture in a tantalum sample and on the high-velocity impact of a tin droplet on a rigid surface.
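The discrete-to-continuum pipeline can be sketched in a few lines: bin particles into voxels, keep the voxels that pass a criterion, and label connected voxels. The Python sketch below is serial and uses, as a hypothetical criterion, a minimum particle count per voxel; it applies 6-connectivity and a union-find structure, one common way to implement connected component labeling. It does not reproduce the distributed/shared-memory extension or exaStamp's data structures, and all names are illustrative.

```python
from collections import defaultdict


def project_to_grid(positions, cell):
    """Bin particles into voxels of edge length `cell`; returns {(i, j, k): count}."""
    counts = defaultdict(int)
    for x, y, z in positions:
        counts[(int(x // cell), int(y // cell), int(z // cell))] += 1
    return counts


def label_components(counts, threshold):
    """Label 6-connected voxels with count >= threshold using union-find."""
    active = [v for v, c in counts.items() if c >= threshold]
    parent = {v: v for v in active}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Checking only the +x, +y, +z neighbors visits every face adjacency once.
    for (i, j, k) in active:
        for n in ((i + 1, j, k), (i, j + 1, k), (i, j, k + 1)):
            if n in parent:
                union((i, j, k), n)
    return {v: find(v) for v in active}


# Hypothetical sample: two dense 2x2x2 voxel blobs plus one stray particle.
positions = []
for origin in (0, 5):
    for a in range(2):
        for b in range(2):
            for c in range(2):
                positions += [(origin + a + 0.5, origin + b + 0.5, origin + c + 0.5)] * 3
positions.append((3.5, 3.5, 3.5))

counts = project_to_grid(positions, cell=1.0)
labels = label_components(counts, threshold=2)
print(len(set(labels.values())))  # 2
```

Union-find suits the distributed setting mentioned above because per-subdomain labels can later be merged by further union operations across subdomain faces.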
Title: On-the-fly clustering for exascale molecular dynamics simulations
Authors: Killian Babilotte, Alizée Dubois, Thierry Carrard, Paul Lafourcade, Laurent Videau, Jean-François Molinari, Laurent Soulard
Computer Physics Communications, Volume 307, Article 109427 (published 2024-11-08). DOI: 10.1016/j.cpc.2024.109427