A multi-GPU based high-performance computing framework in elastodynamics simulation using octree meshes

Computer Methods in Applied Mechanics and Engineering (IF 6.9; Zone 1, Engineering & Technology; JCR Q1, Engineering, Multidisciplinary). Pub Date: 2025-01-10. DOI: 10.1016/j.cma.2024.117723
Shayan Mohammadian, Ankit S. Kumar, Chongmin Song
Volume 436, Article 117723. Publisher page: https://www.sciencedirect.com/science/article/pii/S0045782524009794

Abstract

This paper proposes a high-performance computing framework for large-scale elastodynamic analysis on graphics processing units (GPUs). The study adopts an octree algorithm for automatic mesh generation. The scaled boundary finite element method (SBFEM) is employed with the octree mesh, eliminating hanging nodes between octree cells of different sizes. This approach significantly reduces the computational cost and memory requirement by exploiting the limited number of master cells in a balanced octree grid, and is advantageous for GPU computation. Parallelization is achieved through mesh partitioning and the Message Passing Interface (MPI), complemented by the NVIDIA Collective Communication Library (NCCL) for optimal point-to-point communication between GPUs in high-performance computing (HPC) facilities. The HPC framework is implemented for both explicit and implicit dynamic analysis. In the implicit analysis, the system of equations is solved with the preconditioned conjugate gradient method. Numerical examples are presented to validate the implementation and to demonstrate its capabilities. An image-based 3D model representing a portion of the Moon’s complex surface is simulated with a layered structure comprising approximately 440 million degrees of freedom. Using the explicit solver, a speed-up factor of 865 is achieved on a single computational node equipped with eight NVIDIA A100 GPUs running in parallel. A 3D virtual city comprising approximately 61 million degrees of freedom is modelled using the implicit solver.
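To make the implicit solution step concrete, the following is a minimal serial sketch of a preconditioned conjugate gradient iteration. It is not the authors' GPU implementation: the abstract does not name the preconditioner, so a Jacobi (diagonal) preconditioner is assumed here, and the function name `pcg`, the dense list-of-lists matrix, and the small test system are all hypothetical choices made for illustration.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A.

    Assumption: Jacobi (diagonal) preconditioning; the paper's actual
    preconditioner is not stated in the abstract.
    Returns the solution and the number of iterations performed.
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)                                   # initial search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0              # avoid dividing by zero
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:      # relative residual test
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * q for ri, q in zip(r, Ap)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # conjugacy correction
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Hypothetical 3x3 symmetric positive definite system.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iters = pcg(A, b)
residual = [bi - yi for bi, yi in zip(b, matvec(A, x))]
```

Each iteration needs only one matrix-vector product, a few inner products, and vector updates. In a partitioned multi-GPU setting, the inner products would typically become global reductions across partitions, although how the paper organizes this is not described in the abstract.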
Source journal metrics: CiteScore 12.70; self-citation rate 15.30%; publication volume 719; review time 44 days.
About the journal: Computer Methods in Applied Mechanics and Engineering stands as a cornerstone in the realm of computational science and engineering. With a history spanning over five decades, the journal has been a key platform for disseminating papers on advanced mathematical modeling and numerical solutions. Interdisciplinary in nature, these contributions encompass mechanics, mathematics, computer science, and various scientific disciplines. The journal welcomes a broad range of computational methods addressing the simulation, analysis, and design of complex physical problems, making it a vital resource for researchers in the field.