A high-order finite-difference solver for direct numerical simulations of magnetohydrodynamic turbulence
Pub Date : 2024-10-15  DOI: 10.1016/j.cpc.2024.109400
Jian Fang, Sylvain Laizet, Alex Skillen
This paper presents the development and validation of a magnetohydrodynamics (MHD) module integrated into the Xcompact3d framework, an open-source high-order finite-difference suite of solvers designed to study turbulent flows on supercomputers. By leveraging the Fast Fourier Transform library already implemented in Xcompact3d, together with sixth-order compact finite-difference schemes and a direct spectral Poisson solver, the module can efficiently solve both the induction and potential-based MHD equations at scale on CPU-based supercomputers, for fluids with strong and weak magnetic fields, respectively. The MHD solver is validated against established benchmarks, including the Orszag-Tang vortex and MHD channel flows, demonstrating its ability to accurately capture complex MHD phenomena and providing a powerful tool for research in both engineering and astrophysics. The scalability of the Xcompact3d framework remains intact with the incorporation of the MHD module, ensuring efficient performance on modern high-performance clusters. The paper also presents new findings on the evolution of the Taylor-Green vortex under an external magnetic field for different flow regimes.
{"title":"A high-order finite-difference solver for direct numerical simulations of magnetohydrodynamic turbulence","authors":"Jian Fang , Sylvain Laizet , Alex Skillen","doi":"10.1016/j.cpc.2024.109400","DOIUrl":"10.1016/j.cpc.2024.109400","url":null,"abstract":"<div><div>This paper presents the development and validation of a Magnetohydrodynamics (MHD) module integrated into the Xcompact3d framework, an open-source high-order finite-difference suite of solvers designed to study turbulent flows on supercomputers. Leveraging the Fast Fourier Transform library already implemented in Xcompact3d, alongside sixth-order compact finite-difference schemes and a direct spectral Poisson solver, both the induction and potential-based MHD equations can be efficiently solved at scale on CPU-based supercomputers for fluids with strong and weak magnetic field, respectively. Validation of the MHD solver is conducted against established benchmarks, including Orszag-Tang vortex and MHD channel flows, demonstrating the module's capability to accurately capture complex MHD phenomena, providing a powerful tool for research in both engineering and astrophysics. The scalability of the Xcompact3d framework remains intact with the incorporation of the MHD module, ensuring efficient performance on modern high-performance clusters. This paper also presents new findings on the evolution of the Taylor-Green vortex under an external magnetic field for different flow regimes.</div></div>","PeriodicalId":285,"journal":{"name":"Computer Physics Communications","volume":"307 ","pages":"Article 109400"},"PeriodicalIF":7.2,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142441927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A method for optimizing different geometric shields of D-T neutron generators by combining BP neural network and Analytic Hierarchy Process
Pub Date : 2024-10-10  DOI: 10.1016/j.cpc.2024.109397
Jiayu Li, Shiwei Jing, Jingfei Cai, Hailong Xu, Pingwei Sun, Yingying Cao, Shangrui Jiang, Shaolei Jia, Zhaohu Lu, Guanghao Li
In this paper, an optimization framework for shielding structures of different geometries is established for a D-T neutron generator system by combining a Back Propagation (BP) neural network with the Analytic Hierarchy Process (AHP). The D-T neutron generator (Model NG-9) used in the system was developed independently by Northeast Normal University. After investigating the shielding performance of spherical, cylindrical and cubic geometries, the spherical shield is selected for BP neural network prediction of the total dose rate transmitted through it. The network is trained on spherical multilayer-shielding structures and properties calculated with the MCNP code. The predicted dose rate serves as one parameter of an evaluation function that jointly assesses the dose rate penetrating the shield, the shielding mass, and the shielding volume; the AHP determines the weight factors of these optimization objectives in the evaluation function. By comparing its values, the optimal shielding structures for the spherical, cylindrical and cubic geometries are found. Against MCNP-simulated values, the errors in total dose rate of the optimal shielding structures for the sphere, cylinder, and cube are 1.72%, -4.94%, and -5.17%, respectively. These results demonstrate that combining a BP neural network with the AHP is an effective approach to multi-objective optimization problems in the design of radiation shielding for various geometries.
{"title":"A method for optimizing different geometric shields of D-T neutron generators by combining BP neural network and Analytic Hierarchy Process","authors":"Jiayu Li, Shiwei Jing, Jingfei Cai, Hailong Xu, Pingwei Sun, Yingying Cao, Shangrui Jiang, Shaolei Jia, Zhaohu Lu, Guanghao Li","doi":"10.1016/j.cpc.2024.109397","DOIUrl":"10.1016/j.cpc.2024.109397","url":null,"abstract":"<div><div>In this paper, an optimization of shielding structures with different geometries is established for the D-T neutron generator system by combining Back Propagation (BP) neural network and Analytic Hierarchy Process (AHP). The D-T neutron generator (Model NG-9) used in the system was developed independently by Northeast Normal University. After investigating the rule of shielding performance among spherical, cylindrical and cubic geometries, the spherical shield is selected for BP neural network prediction to determine the total dose rate through it. Information about spherical multilayer-shielding structures and properties calculated by MCNP code is used to train the neural network. The predicted result serves as a parameter of the evaluation function, which provides a comprehensive assessment of the dose rate penetrated the shield, the shielding mass, and the shielding volume. Together with AHP, the weight factors are determined for all the optimization objectives to construct the evaluation function. By comparing its values, the optimal shielding structures for spherical, cylindrical and cubic materials are found. Against MCNP simulated values, the total dose rates’ errors of the optimal shielding structures for the sphere, cylinder, and cube are 1.72 %, -4.94 %, and -5.17 %, respectively. This result demonstrates that the combination of BP neural network and AHP is more effective in addressing multi-objective optimization problems related to the design of radiation shielding for various geometries.</div></div>","PeriodicalId":285,"journal":{"name":"Computer Physics Communications","volume":"307 ","pages":"Article 109397"},"PeriodicalIF":7.2,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142441656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast simulation strategy for capacitively-coupled plasmas based on fluid model
Pub Date : 2024-10-09  DOI: 10.1016/j.cpc.2024.109392
Jing-Ze Li, Ming-Liang Zhao, Yu-Ru Zhang, Fei Gao, You-Nian Wang
Fluid simulations are widely used to optimize reactor geometry and improve the performance of capacitively coupled plasma (CCP) sources in industry, so high computational speed is essential. In this work, a fast method for CCP fluid simulation is developed within the framework of Multi-physics Analysis of Plasma Sources (MAPS). It combines a multi-time-step explicit upwind scheme for the electron fluid equations, a semi-implicit scheme and an iterative method with an in-phase initial value for Poisson's equation, an explicit upwind scheme with limited artificial diffusion for the heavy-particle fluid equations, and an acceleration method based on modifying the fluid equations to reduce the number of periods required to reach equilibrium. To demonstrate the validity and efficiency of the newly developed method, benchmarking against COMSOL and comparison with experimental data are performed for argon discharges in the Gaseous Electronics Conference (GEC) reactor. The performance of each acceleration technique is also tested: the multi-time-step explicit Euler scheme effectively reduces the computational burden in the bulk plasma and halves the time spent on the electron fluid equations; the in-phase initial value greatly decreases the number of iterations required to solve the linear equations and lowers the computational time of Poisson's equation by 77%; and the acceleration method based on equation modification reduces the number of periods required to reach equilibrium by two-thirds.
{"title":"Fast simulation strategy for capacitively-coupled plasmas based on fluid model","authors":"Jing-Ze Li, Ming-Liang Zhao, Yu-Ru Zhang, Fei Gao, You-Nian Wang","doi":"10.1016/j.cpc.2024.109392","DOIUrl":"10.1016/j.cpc.2024.109392","url":null,"abstract":"<div><div>Fluid simulations are widely used in optimizing the reactor geometry and improving the performance of capacitively coupled plasma (CCP) sources in industry, so high computation speed is very important. In this work, a fast method for CCP fluid simulation based on the framework of Multi-physics Analysis of Plasma Sources (MAPS) is developed, which includes a multi-time-step explicit upwind scheme to solve electron fluid equations, a semi-implicit scheme and an iterative method with in-phase initial value to solve Poisson's equation, an explicit upwind scheme with limited artificial diffusion to solve heavy particle fluid equations, and an acceleration method based on fluid equation modification to reduce the periods required to reach equilibrium. In order to prove the validity and efficiency of the newly developed method, benchmarking against COMSOL and comparison with experimental data have been performed in argon discharges on the Gaseous Electronics Conference (GEC) reactor. Besides, the performance of each acceleration method is tested, and the results indicated that the multi-time-step explicit Euler scheme can effectively decline the computational burden in the bulk plasma and reduce the time cost on the electron fluid equations by half. The in-phase initial value method can greatly decrease the iteration times required to solve linear equations and lower the computational time of Poisson's equation by 77 %. The acceleration method based on equation modification can reduce the periods required to reach equilibrium by two-thirds.</div></div>","PeriodicalId":285,"journal":{"name":"Computer Physics Communications","volume":"307 ","pages":"Article 109392"},"PeriodicalIF":7.2,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142433362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FOS: A fully integrated open-source program for Fast Optical Spectrum calculations of nanoparticle media
Pub Date : 2024-10-09  DOI: 10.1016/j.cpc.2024.109393
Daniel Carne, Joseph Peoples, Ziqi Guo, Dudong Feng, Zherui Han, Xiaojie Liu, Xiulin Ruan
FOS, whose name means light in Greek, is an open-source program for Fast Optical Spectrum calculations of nanoparticle media. The program takes the material properties and a description of the system as input and outputs the spectral response, including the reflectance, absorptance, and transmittance. Previous open-source codes often include only one part of what is needed to calculate the spectral response of a nanoparticulate medium, such as Mie theory or a Monte Carlo method. FOS is designed as a convenient, fully integrated alternative that removes this barrier, while providing a significantly accelerated implementation with compiled Python code, parallel processing, and pre-trained machine-learning predictions. The program can accelerate the optimization and high-throughput design of the optical properties of nanoparticle or nanocomposite media, such as radiative cooling paints and solar heating liquids, allowing the discovery of new materials and designs. FOS also enables convenient modeling of lunar dust coatings, combustion particulates, and many other particulate systems. In this paper we discuss the methodology used in FOS, describe the features of the program, and provide four case studies.
{"title":"FOS: A fully integrated open-source program for Fast Optical Spectrum calculations of nanoparticle media","authors":"Daniel Carne, Joseph Peoples, Ziqi Guo, Dudong Feng, Zherui Han, Xiaojie Liu, Xiulin Ruan","doi":"10.1016/j.cpc.2024.109393","DOIUrl":"10.1016/j.cpc.2024.109393","url":null,"abstract":"<div><div>FOS, which means light in Greek, is an open-source program for Fast Optical Spectrum calculations of nanoparticle media. This program takes the material properties and a description of the system as input, and outputs the spectral response including the reflectance, absorptance, and transmittance. Previous open-source codes often include only one portion of what is needed to calculate the spectral response of a nanoparticulate medium, such as Mie theory or a Monte Carlo method. FOS is designed to provide a convenient fully integrated format to remove the barrier as well as providing a significantly accelerated implementation with compiled Python code, parallel processing, and pre-trained machine learning predictions. This program can accelerate optimization and high throughput design of optical properties of nanoparticle or nanocomposite media, such as radiative cooling paint and solar heating liquids, allowing for the discovery of new materials and designs. FOS also enables convenient modeling of lunar dust coatings, combustion particulates, and many other particulate systems. In this paper we discuss the methodology used in FOS, features of the program, and provide four case studies.</div></div>","PeriodicalId":285,"journal":{"name":"Computer Physics Communications","volume":"307 ","pages":"Article 109393"},"PeriodicalIF":7.2,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142438239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DualSPHysics+: An enhanced DualSPHysics with improvements in accuracy, energy conservation and resolution of the continuity equation
Pub Date : 2024-10-05  DOI: 10.1016/j.cpc.2024.109389
Yi Zhan, Min Luo, Abbas Khayyer
This paper presents an enhanced version of the well-known SPH (Smoothed Particle Hydrodynamics) open-source code DualSPHysics for the simulation of free-surface fluid flows, leading to the DualSPHysics+ code. The enhancements are made by incorporating several schemes that address stability, accuracy and energy/volume conservation issues in simulating incompressible free-surface fluid flows within the weakly compressible SPH formalism. The Optimized Particle Shifting (OPS) scheme is implemented to improve the accuracy of particle shifting vectors in the free-surface region. To mitigate energy dissipation and maintain consistency, the artificial viscosity in δ-SPH is substituted with a Riemann stabilization term, leading to δR-SPH. The Velocity divergence Error Mitigating (VEM) and Volume Conservation Shifting (VCS) schemes are adopted in DualSPHysics+ to mitigate the velocity divergence error and improve volume conservation, and hence to enhance the resolution of the continuity equation. To further reduce both the instantaneous and accumulated errors in velocity divergence, a Hyperbolic/Parabolic Divergence Cleaning (HPDC) scheme is incorporated in addition to the VEM scheme. The implementations of the introduced schemes in both the CPU and GPU versions of the DualSPHysics+ code are presented, along with details on compilation, running and computational performance. Validations of the accuracy, energy conservation and convergence of DualSPHysics+ are shown via several relevant benchmarks. It is demonstrated that better cleaning of both instantaneous and accumulated velocity divergence errors is achieved by combining VEM and HPDC, while the excessive energy dissipation of the artificial viscosity is suppressed by adopting the Riemann stabilization term. The enhanced resolution of the continuity equation, together with the improved energy conservation of DualSPHysics+, advances SPH-based simulation of incompressible free-surface fluid flows.
Program summary
Program title: DualSPHysics+
CPC Library link to program files: https://doi.org/10.17632/xnrfv9pgb5.1
Licensing provisions: GNU Lesser General Public License (LGPL)
Programming language: C++, CUDA
External dependencies: DualSPHysics (https://dual.sphysics.org)
Nature of problem: The Weakly Compressible Smoothed Particle Hydrodynamics (WCSPH) method and the open-source code DualSPHysics have been widely applied to simulate free-surface fluid flows. Both the general WCSPH method and the more specific DualSPHysics need further improvements in several aspects, including spurious pressure fluctuations, non-conservation of volume and excessive energy dissipation, to enhance the accuracy and …
{"title":"DualSPHysics+: An enhanced DualSPHysics with improvements in accuracy, energy conservation and resolution of the continuity equation","authors":"Yi Zhan , Min Luo , Abbas Khayyer","doi":"10.1016/j.cpc.2024.109389","DOIUrl":"10.1016/j.cpc.2024.109389","url":null,"abstract":"<div><div>This paper presents an enhanced version of the well-known SPH (Smoothed Particle Hydrodynamics) open-source code DualSPHysics for the simulation of free-surface fluid flows, leading to the DualSPHysics+ code. The enhancements are made through incorporation of several schemes with respect to stability, accuracy and energy/volume conservation issues in simulating incompressible free-surface fluid flows within the weakly compressible SPH formalism. The Optimized Particle Shifting (OPS) scheme is implemented to improve the accuracy of particle shifting vectors in the free-surface region. To mitigate energy dissipation and maintain consistency, the artificial viscosity in <em>δ</em>-SPH is substituted with a Riemann stabilization term, leading to the <em>δ</em>R-SPH. The Velocity divergence Error Mitigating (VEM) and Volume Conservation Shifting (VCS) schemes are adopted in DualSPHysics+ to mitigate the velocity divergence error and improve the volume conservation, and hence to enhance the resolution of the continuity equation. To further reduce both the instantaneous and accumulated errors in velocity divergence, a Hyperbolic/Parabolic Divergence Cleaning (HPDC) scheme is incorporated in addition to the VEM scheme. The implementations of the introduced schemes on both CPU and GPU-based versions of the DualSPHysics+ code along with details on the compilation, running and computational performance are presented. Validations in terms of accuracy, energy conservation and convergence of DualSPHysics+ are shown via several relevant benchmarks. It is demonstrated that a better velocity divergence error cleaning in both instantaneous and accumulated errors can be achieved by the combination of VEM and HPDC. Meanwhile, the excessive energy dissipation by the artificial viscosity is shown to be suppressed by adopting the Riemann stabilization term. Enhanced resolution of the continuity equation along with improved energy conservation of DualSPHysics+ advance the SPH-based simulation of incompressible free-surface fluid flows.</div></div><div><h3>Program Summary</h3><div><em>Program title:</em> DualSPHysics+.</div><div><em>CPC Library link to program files</em> <span><span>https://doi.org/10.17632/xnrfv9pgb5.1</span><svg><path></path></svg></span>.</div><div><em>Licensing provisions:</em> GNU Lesser General Public License (LGPL).</div><div><em>Programming language:</em> C++, CUDA.</div><div><em>External dependencies:</em> DualSPHysics (<span><span>https://dual.sphysics.org</span><svg><path></path></svg></span>).</div><div><em>Nature of problem:</em> Weakly Compressible Smoothed Particle Hydrodynamics (WCSPH) method and the open-source code DualSPHysics have been widely applied to simulate free-surface fluid flows. 
Both the general WCSPH method and the more specific DualSPHysics need further improvements in several aspects, including spurious pressure fluctuations, non-conservation of volume and excessive energy dissipation, to enhance the accuracy and","PeriodicalId":285,"journal":{"name":"Computer Physics Communications","volume":"306 ","pages":"Article 109389"},"PeriodicalIF":7.2,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142531293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
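The VEM, VCS and HPDC schemes all act on the SPH estimate of the velocity divergence that drives the continuity equation. The snippet below shows one standard SPH divergence operator with a 2D cubic-spline kernel on a synthetic particle set; it is generic SPH for illustration, not DualSPHysics+ internals.

```python
import numpy as np

def cubic_spline_grad_w(rij, h):
    """Gradient of the 2D cubic-spline kernel W(r, h) with respect to particle i, rij = r_i - r_j."""
    sigma = 10.0 / (7.0 * np.pi * h**2)
    r = np.linalg.norm(rij)
    q = r / h
    if r < 1e-12 or q >= 2.0:
        return np.zeros(2)
    if q < 1.0:
        dwdq = sigma * (-3.0 * q + 2.25 * q**2)
    else:
        dwdq = -0.75 * sigma * (2.0 - q)**2
    return dwdq / (h * r) * rij                 # dW/dr * (rij / r)

def sph_divergence(pos, vel, mass, rho, h):
    """div(v) at particle i: (1/rho_i) * sum_j m_j (v_j - v_i) . grad_i W_ij."""
    n = len(pos)
    div = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            gw = cubic_spline_grad_w(pos[i] - pos[j], h)
            div[i] += mass[j] * np.dot(vel[j] - vel[i], gw) / rho[i]
    return div

# synthetic test: the linear field v = (x, y) has div(v) = 2 everywhere
nx = 20
xs = np.linspace(0.0, 1.0, nx)
pos = np.array([[x, y] for x in xs for y in xs])
vel = pos.copy()
dx = xs[1] - xs[0]
mass = np.full(len(pos), 1000.0 * dx**2)        # rho0 * particle area
rho = np.full(len(pos), 1000.0)
div = sph_divergence(pos, vel, mass, rho, h=1.3 * dx)
interior = div[(pos[:, 0] > 0.2) & (pos[:, 0] < 0.8) & (pos[:, 1] > 0.2) & (pos[:, 1] < 0.8)]
print("mean interior div(v):", interior.mean(), "(exact value: 2.0)")
```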
Stochastic weighted particle control for electrostatic particle-in-cell Monte Carlo collision simulations in an axisymmetric coordinate system
Pub Date : 2024-10-04  DOI: 10.1016/j.cpc.2024.109390
Zili Chen, Zhaoyu Chen, Yu Wang, Jingwen Xu, Zhipeng Chen, Wei Jiang, Hongyu Wang, Ya Zhang
The non-uniform grids in the axisymmetric coordinate system pose a significant challenge for electrostatic particle-in-cell/Monte Carlo collision (PIC/MCC) simulations because they require numerous macroparticles to manage numerical heating around the mid-axis. To address this, we have developed a stochastic weighted particle control method that selectively samples small-weight particles, effectively controlling the particle number without inducing numerical heating. This method is based on a rejection-acceptance probability merging scheme, which is easy to implement and has a low time complexity. We have also made essential modifications, including a corrected density deposition scheme, an energy conservation scheme, and the introduction of target weights. By applying this particle control method, the number of macroparticles in the simulation can be reduced by more than one order of magnitude, significantly reducing the required computing time and storage. Furthermore, appropriately setting target weights also enables enhanced resolution of dilute regions with an acceptable increase in computational cost.
{"title":"Stochastic weighted particle control for electrostatic particle-in-cell Monte Carlo collision simulations in an axisymmetric coordinate system","authors":"Zili Chen , Zhaoyu Chen , Yu Wang , Jingwen Xu , Zhipeng Chen , Wei Jiang , Hongyu Wang , Ya Zhang","doi":"10.1016/j.cpc.2024.109390","DOIUrl":"10.1016/j.cpc.2024.109390","url":null,"abstract":"<div><div>The non-uniform grids in the axisymmetric coordinate system pose a significant challenge for electrostatic particle-in-cell/Monte Carlo collision (PIC/MCC) simulations because they require numerous macroparticles to manage numerical heating around the mid-axis. To address this, we have developed a stochastic weighted particle control method that selectively samples small-weight particles, effectively controlling the particle number without inducing numerical heating. This method is based on a rejection-acceptance probability merging scheme, which is easy to implement and has a low time complexity. We have also made essential modifications, including a corrected density deposition scheme, an energy conservation scheme, and the introduction of target weights. By applying this particle control method, the number of macroparticles in the simulation can be reduced by more than one order of magnitude, significantly reducing the required computing time and storage. Furthermore, appropriately setting target weights also enables enhanced resolution of dilute regions with an acceptable increase in computational cost.</div></div>","PeriodicalId":285,"journal":{"name":"Computer Physics Communications","volume":"306 ","pages":"Article 109390"},"PeriodicalIF":7.2,"publicationDate":"2024-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142423278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GALÆXI: Solving complex compressible flows with high-order discontinuous Galerkin methods on accelerator-based systems
Pub Date : 2024-10-01  DOI: 10.1016/j.cpc.2024.109388
Marius Kurz, Daniel Kempf, Marcel P. Blind, Patrick Kopper, Philipp Offenhäuser, Anna Schwarz, Spencer Starr, Jens Keim, Andrea Beck
This work presents GALÆXI, a novel, energy-efficient flow solver for the simulation of compressible flows on unstructured hexahedral meshes that leverages the parallel computing power of modern Graphics Processing Units (GPUs). GALÆXI implements the high-order Discontinuous Galerkin Spectral Element Method (DGSEM) with a finite-volume subcell shock-capturing approach to ensure the stability of the high-order scheme near shocks. The work details the general code design, the parallelization strategy, and the implementation of the compute kernels, with a focus on the element-local mappings between volume and surface data arising from the unstructured mesh. The scheme uses a pure distributed-memory parallelization based on domain decomposition, where each GPU handles a distinct region of the computational domain. On each GPU, the computations are assigned to different compute streams, which makes it possible to compute the quantities required for communication ahead of time while local computations from other streams hide the communication latency. This parallelization strategy maximizes the use of the available computational resources and results in excellent strong-scaling properties of GALÆXI up to 1024 GPUs, provided each GPU is assigned a minimum of one million degrees of freedom. To verify the implementation, a convergence study is performed that recovers the theoretical order of convergence of the implemented numerical schemes. The solver is further validated using the incompressible and compressible formulations of the Taylor-Green vortex at Mach numbers of 0.1 and 1.25, respectively. A mesh convergence study shows that the results converge to the high-fidelity reference solution and match the original CPU implementation. Finally, GALÆXI is applied to a large-scale wall-resolved large eddy simulation of a linear cascade of the NASA Rotor 37, where the supersonic region and the shocks at the leading edge are captured accurately and robustly by the implemented shock-capturing approach. GALÆXI is shown to require less than half the energy of the reference CPU implementation to carry out this simulation, rendering it a potent tool for accurate and efficient simulations of compressible flows in the realm of exascale computing and the associated new HPC architectures.
{"title":"GALÆXI: Solving complex compressible flows with high-order discontinuous Galerkin methods on accelerator-based systems","authors":"Marius Kurz , Daniel Kempf , Marcel P. Blind , Patrick Kopper , Philipp Offenhäuser , Anna Schwarz , Spencer Starr , Jens Keim , Andrea Beck","doi":"10.1016/j.cpc.2024.109388","DOIUrl":"10.1016/j.cpc.2024.109388","url":null,"abstract":"<div><div>This work presents GALÆXI as a novel, energy-efficient flow solver for the simulation of compressible flows on unstructured hexahedral meshes leveraging the parallel computing power of modern Graphics Processing Units (GPUs). GALÆXI implements the high-order Discontinuous Galerkin Spectral Element Method (DGSEM) using shock capturing with a finite-volume subcell approach to ensure the stability of the high-order scheme near shocks. This work provides details on the general code design, the parallelization strategy, and the implementation approach for the compute kernels with a focus on the element local mappings between volume and surface data due to the unstructured mesh. The scheme is implemented using a pure distributed memory parallelization based on a domain decomposition, where each GPU handles a distinct region of the computational domain. On each GPU, the computations are assigned to different compute streams which allows to antedate the computation of quantities required for communication while performing local computations from other streams to hide the communication latency. This parallelization strategy allows for maximizing the use of available computational resources. This results in excellent strong scaling properties of GALÆXI up to 1024 GPUs if each GPU is assigned a minimum of one million degrees of freedom. To verify its implementation, a convergence study is performed that recovers the theoretical order of convergence of the implemented numerical schemes. Moreover, the solver is validated using both the incompressible and compressible formulation of the Taylor–Green-Vortex at a Mach number of 0.1 and 1.25, respectively. A mesh convergence study shows that the results converge to the high-fidelity reference solution and that the results match the original CPU implementation. Finally, GALÆXI is applied to a large-scale wall-resolved large eddy simulation of a linear cascade of the NASA Rotor 37. Here, the supersonic region and shocks at the leading edge are captured accurately and robustly by the implemented shock-capturing approach. It is demonstrated that GALÆXI requires less than half of the energy to carry out this simulation in comparison to the reference CPU implementation. This renders GALÆXI as a potent tool for accurate and efficient simulations of compressible flows in the realm of exascale computing and the associated new HPC architectures.</div></div>","PeriodicalId":285,"journal":{"name":"Computer Physics Communications","volume":"306 ","pages":"Article 109388"},"PeriodicalIF":7.2,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142423280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A contour integral-based method for nonlinear eigenvalue problems for semi-infinite photonic crystals
Pub Date : 2024-09-30  DOI: 10.1016/j.cpc.2024.109377
Xing-Long Lyu, Tiexiang Li, Wen-Wei Lin
In this study, we introduce an efficient method for determining isolated singular points of two-dimensional semi-infinite and bi-infinite photonic crystals, equipped with perfect electric conductor and quasi-periodic mixed boundary conditions. This specific problem can be modeled by a Helmholtz equation and is recast as a generalized eigenvalue problem involving an infinite-dimensional block quasi-Toeplitz matrix. Through an intelligent implementation of cyclic structure-preserving matrix transformations, the contour integral method is elegantly employed to calculate the isolated eigenvalue and to extract a component of the associated eigenvector. Moreover, a propagation formula for electromagnetic fields is derived. This formulation enables rapid computation of field distributions across the expansive semi-infinite and bi-infinite domains, thus highlighting the attributes of edge states. The preliminary MATLAB implementation is available at https://github.com/FAME-GPU/2D_Semi-infinite_PhC.
{"title":"A contour integral-based method for nonlinear eigenvalue problems for semi-infinite photonic crystals","authors":"Xing-Long Lyu , Tiexiang Li , Wen-Wei Lin","doi":"10.1016/j.cpc.2024.109377","DOIUrl":"10.1016/j.cpc.2024.109377","url":null,"abstract":"<div><div>In this study, we introduce an efficient method for determining isolated singular points of two-dimensional semi-infinite and bi-infinite photonic crystals, equipped with perfect electric conductor and quasi-periodic mixed boundary conditions. This specific problem can be modeled by a Helmholtz equation and is recast as a generalized eigenvalue problem involving an infinite-dimensional block quasi-Toeplitz matrix. Through an intelligent implementation of cyclic structure-preserving matrix transformations, the contour integral method is elegantly employed to calculate the isolated eigenvalue and to extract a component of the associated eigenvector. Moreover, a propagation formula for electromagnetic fields is derived. This formulation enables rapid computation of field distributions across the expansive semi-infinite and bi-infinite domains, thus highlighting the attributes of edge states. The preliminary MATLAB implementation is available at <span><span>https://github.com/FAME-GPU/2D_Semi-infinite_PhC</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":285,"journal":{"name":"Computer Physics Communications","volume":"306 ","pages":"Article 109377"},"PeriodicalIF":7.2,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142423277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Computing the QRPA level density with the finite amplitude method
Pub Date : 2024-09-21  DOI: 10.1016/j.cpc.2024.109387
A. Bjelčić, N. Schunck
We describe a new algorithm to calculate the vibrational nuclear level density of an atomic nucleus. Fictitious perturbation operators that probe the response of the system are generated by drawing their matrix elements from some probability distribution function. We use the Finite Amplitude Method to explicitly compute the response for each such sample. With the help of the Kernel Polynomial Method, we build an estimator of the vibrational level density and provide the upper bound of the relative error in the limit of infinitely many random samples. The new algorithm can give accurate estimates of the vibrational level density. Since it is based on drawing multiple samples of perturbation operators, its computational implementation is naturally parallel and scales like the number of available processing units.
{"title":"Computing the QRPA level density with the finite amplitude method","authors":"A. Bjelčić, N. Schunck","doi":"10.1016/j.cpc.2024.109387","DOIUrl":"10.1016/j.cpc.2024.109387","url":null,"abstract":"<div><div>We describe a new algorithm to calculate the vibrational nuclear level density of an atomic nucleus. Fictitious perturbation operators that probe the response of the system are generated by drawing their matrix elements from some probability distribution function. We use the Finite Amplitude Method to explicitly compute the response for each such sample. With the help of the Kernel Polynomial Method, we build an estimator of the vibrational level density and provide the upper bound of the relative error in the limit of infinitely many random samples. The new algorithm can give accurate estimates of the vibrational level density. Since it is based on drawing multiple samples of perturbation operators, its computational implementation is naturally parallel and scales like the number of available processing units.</div></div>","PeriodicalId":285,"journal":{"name":"Computer Physics Communications","volume":"306 ","pages":"Article 109387"},"PeriodicalIF":7.2,"publicationDate":"2024-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142315008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TROPIC: A program for calculating reduced transition probabilities
Pub Date : 2024-09-20  DOI: 10.1016/j.cpc.2024.109383
Kevin Lee, Anne Stratman, Clark Casarella, Ani Aprahamian, Shelly Lesher
Measurements of level lifetimes and the extracted transition probabilities are one of the cornerstones of nuclear structure physics. The reduced transition probabilities, B(πλ; J_i → J_f), yield information about the structure, wavefunctions, and matrix elements of excited states connected by electromagnetic transitions in a given nucleus. The arsenal of techniques for measuring lifetimes continues to expand and presently covers a wide range of values, from femtoseconds to microseconds. While lifetime measurement techniques vary, the extraction of transition probabilities remains the same. RULER is the program used by the National Nuclear Data Center (NNDC) and in ENSDF evaluations, while TRANSNUCLEAR was developed at the University of Cologne and modified by a variety of groups. This paper presents a new program, TROPIC (TRansitiOn ProbabIlity Calculator), which provides a modern and efficient way to extract transition probabilities B(πλ). TROPIC is written in Python 3 with the NumPy and SciPy libraries, in line with the advances that ENSDF and NNDC are making in moving away from the 80-character card-punch input formats. Several design features were implemented to streamline the process for the user and mitigate drawbacks present in other programs. The results from TROPIC have been compared with TRANSNUCLEAR and RULER: the answers are, as expected, identical, but the time from input to output is significantly reduced.
Program summary
Program Title: TROPIC
CPC Library link to program files: https://doi.org/10.17632/958ygp2sb4.1
Developer's repository link: https://github.com/ND-fIREBall/TROPIC
Licensing provisions: GPLv3
Programming language: Python 3
Nature of problem: An efficient way to calculate multiple reduced transition probabilities with minimal effort invested from the user.
Solution method: A Python 3 script has been developed to read in a CSV file containing all necessary input parameters, calculate the transition probabilities listed in the CSV file, and export the results in three different output formats.
{"title":"TROPIC: A program for calculating reduced transition probabilities","authors":"Kevin Lee , Anne Stratman , Clark Casarella , Ani Aprahamian , Shelly Lesher","doi":"10.1016/j.cpc.2024.109383","DOIUrl":"10.1016/j.cpc.2024.109383","url":null,"abstract":"<div><div>Measurements of level lifetimes and the extracted transition probabilities are one of the cornerstones of nuclear structure physics. The reduced transition probabilities, <span><math><mi>B</mi><mo>(</mo><mi>π</mi><mi>λ</mi><mo>;</mo><msub><mrow><mi>J</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>→</mo><msub><mrow><mi>J</mi></mrow><mrow><mi>f</mi></mrow></msub><mo>)</mo></math></span> yield information about the structure, wavefunctions, and matrix elements of excited states connected by electromagnetic transitions in a given nucleus. The arsenal of techniques for measuring lifetimes continues to expand and presently includes a wide range of values from femtoseconds to microseconds. While lifetime measurement techniques vary, the extraction of transition probabilities remains the same. RULER is the program used by the National Nuclear Data Center (NNDC) and ENDSF evaluations, while TRANSNUCLEAR was developed at the University of Cologne and modified by a variety of groups. This paper presents a new program TROPIC (<em>TR</em>ansiti<em>O</em>n <em>P</em>robab<em>I</em>lity <em>C</em>alculator), which is the most modern and efficient way to extract transition probabilities <span><math><mi>B</mi><mo>(</mo><mi>π</mi><mi>λ</mi><mo>)</mo></math></span>. TROPIC is a program written in Python 3 with the NumPy and SciPy libraries. This is in line with the advances that ENSDF and NNDC are making in moving away from the 80-character card punch input formats. Several design features were implemented to provide a streamlined process for the user and mitigate drawbacks that were present in other programs. The results from TROPIC have been compared with TRANSNUCLEAR and RULER. The answers are as expected identical, but the investment of input to output time is significantly reduced. 
TROPIC will be made available for public domain use, along with a user guide and example files.</div></div><div><h3>Program summary</h3><div><em>Program Title:</em> TROPIC</div><div><em>CPC Library link to program files:</em> <span><span>https://doi.org/10.17632/958ygp2sb4.1</span><svg><path></path></svg></span></div><div><em>Developer's repository link:</em> <span><span>https://github.com/ND-fIREBall/TROPIC</span><svg><path></path></svg></span></div><div><em>Licensing provisions:</em> GPLv3</div><div><em>Programming language:</em> Python 3</div><div><em>Nature of problem:</em> An efficient way to calculate multiple reduced transition probabilities with minimal effort invested from the user.</div><div><em>Solution method:</em> A Python 3 script has been developed to read in a CSV file containing all necessary input parameters, calculate the transition probabilities listed in the CSV file, and export the results in three different output formats.</div></div>","PeriodicalId":285,"journal":{"name":"Computer Physics Communications","volume":"306 ","pages":"Article 109383"},"PeriodicalIF":7.2,"publicationDate":"2024-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142319399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
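As a worked example of the kind of conversion TROPIC automates, the sketch below turns a level mean lifetime and γ-ray energy into B(E2) in e²fm⁴ and Weisskopf units. It assumes a pure E2 transition with 100% branching and no internal conversion, uses the commonly tabulated rate coefficient λ(E2) ≈ 1.223×10⁹ s⁻¹ × E_γ⁵[MeV] × B(E2)[e²fm⁴], and derives the Weisskopf unit from its defining single-particle formula; the numerical inputs are invented and the coefficient convention should be checked against whatever TROPIC itself adopts.

```python
import numpy as np

RATE_COEF_E2 = 1.223e9     # s^-1 per (MeV^5 * e^2 fm^4); commonly tabulated value, verify before use

def b_e2_from_lifetime(tau_ps, e_gamma_mev, mass_number):
    """B(E2, decay) from a mean lifetime in ps, assuming a pure E2 branch with no internal conversion."""
    decay_rate = 1.0 / (tau_ps * 1e-12)                    # s^-1
    b_e2 = decay_rate / (RATE_COEF_E2 * e_gamma_mev**5)    # e^2 fm^4
    # Weisskopf unit for E2: B_W = (1/4pi) * (3/(lambda+3))^2 * (1.2 * A^(1/3))^(2*lambda), lambda = 2
    b_weisskopf = (1.0 / (4.0 * np.pi)) * (3.0 / 5.0) ** 2 * (1.2 * mass_number ** (1.0 / 3.0)) ** 4
    return b_e2, b_e2 / b_weisskopf

# hypothetical 2+ -> 0+ transition: tau = 5 ps, E_gamma = 1.0 MeV, A = 100
b, b_wu = b_e2_from_lifetime(tau_ps=5.0, e_gamma_mev=1.0, mass_number=100)
print(f"B(E2) = {b:.1f} e^2 fm^4  =  {b_wu:.1f} W.u.")
```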