An efficient parallelized discrete particle model for dense gas-solid flows on unstructured mesh

C. L. Wu, K. Nandakumar
{"title":"An efficient parallelized discrete particle model for dense gas-solid flows on unstructured mesh","authors":"C. L. Wu, K. Nandakumar","doi":"10.1145/2016741.2016752","DOIUrl":null,"url":null,"abstract":"An efficient, parallelized implementation of discrete particle/element model (DPM or DEM) coupled with the computational fluid dynamics (CFD) model has been developed. Two parallelization strategies are used to partly overcome the poor load balancing problem due to the heterogeneous particle distribution in space. Firstly at the coarse-grained level, the solution domain is decomposed into partitions using bisection algorithm to minimize the number of faces at the partition boundaries while keeping almost equal number of cells in each partition. The solution of the gas-phase governing equations is performed on these partitions. Particles and the solution of their dynamics are associated with partitions according to their hosting cells. This makes no data exchange between processors when calculating the hydrodynamic forces on particles. By introducing proper data mapping between partitions, the cell void fraction is calculated accurately even if a particle is shared by several partitions. Neighboring partitions are grouped by a gross evaluation before simulation, with each group having similar particle number. The computation task of a group of partitions is assigned to a compute node, which has multi-cores or multiprocessors with a shared memory. Each core or processor in a node takes the computation of the gas governing equations in one partition. Processors communicate and exchange data through Message Passing Interface (MPI) at this coarse-grained parallelism. Secondly, the multithreading technique is used to parallelize the computation of the dynamics of the particles in each partition. The number of compute threads is determined according to the number of particles in partitions and the number of cores in a compute node. In such a way there is almost no waiting of the threads in a compute node. Since the particle numbers in all compute nodes are almost the same, the above strategy yields an efficient load balancing among compute nodes. Test numerical experiments on TeraGrid HPC cluster Queen Bee show that the developed code is efficient and scalable to simulate dense gas-solid flows with up to more than 10 millions of particles by 128 compute nodes. Bubbling in a middle-scale fluidized bed and granular Rayleigh-Taylor instability are well captured by the parallel code.","PeriodicalId":257555,"journal":{"name":"TeraGrid Conference","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"TeraGrid Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2016741.2016752","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

An efficient, parallelized implementation of a discrete particle/element model (DPM/DEM) coupled with a computational fluid dynamics (CFD) model has been developed. Two parallelization strategies are used to partly overcome the poor load balancing caused by the spatially heterogeneous particle distribution. First, at the coarse-grained level, the solution domain is decomposed into partitions using a bisection algorithm that minimizes the number of faces at partition boundaries while keeping an almost equal number of cells in each partition. The gas-phase governing equations are solved on these partitions. Particles, and the solution of their dynamics, are assigned to partitions according to their host cells, so no inter-processor data exchange is needed when calculating the hydrodynamic forces on particles. By introducing a proper data mapping between partitions, the cell void fraction is calculated accurately even when a particle is shared by several partitions. Before the simulation, neighboring partitions are grouped by a coarse load estimate so that each group contains a similar number of particles. The computational work of each group is assigned to one compute node with multiple cores or processors sharing memory, and each core or processor handles the gas-phase governing equations of one partition. At this coarse-grained level, processors communicate and exchange data through the Message Passing Interface (MPI). Second, multithreading is used to parallelize the computation of particle dynamics within each partition. The number of compute threads is determined from the number of particles in each partition and the number of cores in the compute node, so threads in a node incur almost no idle waiting. Since the particle counts on all compute nodes are nearly equal, this strategy yields efficient load balancing across nodes. Numerical experiments on the TeraGrid HPC cluster Queen Bee show that the code is efficient and scales to simulations of dense gas-solid flows with more than 10 million particles on 128 compute nodes. Bubbling in a mid-scale fluidized bed and the granular Rayleigh-Taylor instability are well captured by the parallel code.
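The two-level hybrid scheme the abstract describes can be illustrated with a short sketch. The C++ program below is a minimal illustration under stated assumptions, not the authors' code: one MPI rank owns one mesh partition, ranks placed on the same node discover each other through an MPI-3 shared-memory communicator, and each rank spawns a number of particle-update threads proportional to its share of the node's particles. `Particle`, `advance_particles`, and `threads_for_rank` are illustrative names, and the particle physics is a placeholder.

```cpp
// Minimal sketch (not the authors' code) of the two-level hybrid scheme:
// one MPI rank per mesh partition for the gas phase, plus a per-rank
// thread count chosen from the local particle load so that ranks sharing
// a node finish their particle updates at roughly the same time.
#include <mpi.h>
#include <algorithm>
#include <functional>
#include <thread>
#include <vector>

struct Particle { double x[3], v[3], f[3]; };

// Integrate particle motion for one DEM sub-step (placeholder physics;
// forces are assumed to have been computed already).
static void advance_particles(std::vector<Particle>& p,
                              std::size_t begin, std::size_t end, double dt) {
    for (std::size_t i = begin; i < end; ++i)
        for (int d = 0; d < 3; ++d) {
            p[i].v[d] += dt * p[i].f[d];
            p[i].x[d] += dt * p[i].v[d];
        }
}

// Thread count proportional to this rank's share of the node's particles,
// so threads across the ranks on one node see little idle waiting.
static int threads_for_rank(unsigned long local_np, unsigned long node_np,
                            int node_cores) {
    if (node_np == 0 || node_cores <= 0) return 1;
    double share = static_cast<double>(local_np) / node_np;
    return std::max(1, static_cast<int>(share * node_cores + 0.5));
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // This partition's particles (in reality produced by the decomposition).
    std::vector<Particle> particles(100000 + 5000 * rank);

    // Sum particle counts over the ranks placed on the same node.
    MPI_Comm node;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, rank,
                        MPI_INFO_NULL, &node);
    unsigned long local_np = particles.size(), node_np = 0;
    MPI_Allreduce(&local_np, &node_np, 1, MPI_UNSIGNED_LONG, MPI_SUM, node);

    int nthreads = threads_for_rank(local_np, node_np,
                                    std::thread::hardware_concurrency());

    // Fine-grained level: split this partition's particles across threads.
    std::vector<std::thread> pool;
    std::size_t chunk = (particles.size() + nthreads - 1) / nthreads;
    for (int t = 0; t < nthreads; ++t) {
        std::size_t b = std::min(particles.size(), t * chunk);
        std::size_t e = std::min(particles.size(), b + chunk);
        pool.emplace_back(advance_particles, std::ref(particles), b, e, 1e-4);
    }
    for (auto& th : pool) th.join();

    MPI_Comm_free(&node);
    MPI_Finalize();
    return 0;
}
```

In the paper's scheme the node-level particle counts would follow from the pre-simulation grouping of partitions; the MPI-3 shared-memory communicator above is simply a convenient stand-in for that grouping.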
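The void-fraction correction for particles that straddle partition boundaries can be sketched in the same spirit. In the toy exchange below, each rank accumulates partial solid volume in the cells it shares with one neighboring partition, then swaps and sums those partial contributions before evaluating the void fraction; the even/odd neighbor pairing and the flat shared-cell array are illustrative assumptions, not the paper's actual mapping.

```cpp
// Toy sketch of the cross-partition void-fraction fix-up: a particle that
// straddles a partition boundary deposits partial solid volume into cells
// owned by the neighbouring partition, so the partial sums in shared cells
// must be exchanged and accumulated before the void fraction
// 1 - V_solid/V_cell is evaluated.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Partial solid volume accumulated locally in the cells along one
    // shared boundary (placeholder values).
    std::vector<double> shared_cells(64, 0.001 * (rank + 1));

    int neighbour = (rank % 2 == 0) ? rank + 1 : rank - 1;  // toy pairing
    if (neighbour >= 0 && neighbour < size) {
        std::vector<double> remote(shared_cells.size());
        int n = static_cast<int>(shared_cells.size());
        // Swap partial sums with the neighbour that shares these cells.
        MPI_Sendrecv(shared_cells.data(), n, MPI_DOUBLE, neighbour, 0,
                     remote.data(), n, MPI_DOUBLE, neighbour, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        // Accumulate, so both sides hold the full solid volume per cell.
        for (std::size_t i = 0; i < shared_cells.size(); ++i)
            shared_cells[i] += remote[i];
    }

    MPI_Finalize();
    return 0;
}
```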