Scaling and Quality of Modularity Optimization Methods for Graph Clustering
Pub Date: 2019-09-01 | DOI: 10.1109/HPEC.2019.8916299
Sayan Ghosh, M. Halappanavar, Antonino Tumeo, A. Kalyanaraman
Real-world graphs exhibit structures known as “communities” or “clusters”: groups of vertices with relatively high connectivity among themselves compared to the rest of the network. Graph clustering, or community detection, is a fundamental graph operation used to analyze real-world graphs arising in computational biology, cybersecurity, electrical grids, and other areas. Like other graph algorithms, current algorithms for community detection are challenging to parallelize owing to their irregular memory accesses and inherently sequential nature. However, analyzing large networks requires scalable parallel implementations of graph clustering that can exploit the architectural features of modern supercomputers. In response to the 2019 Streaming Graph Challenge, we present a quality and performance analysis of our distributed-memory community detection using Vite, our distributed-memory implementation of the popular Louvain method, on the ALCF Theta supercomputer. Clustering methods such as Louvain that rely on modularity maximization are known to suffer from the resolution-limit problem, which prevents the identification of clusters of certain sizes. Hence, we also include a quality analysis of our shared-memory implementation of the Fast-tracking Resistance method, compared with Louvain on the challenge datasets. Furthermore, we introduce an edge-balanced graph distribution for our distributed-memory implementation that significantly reduces communication, offering up to 80% improvement in overall execution time. In addition to the performance/quality analysis, we also report the power/energy consumption and memory traffic of the distributed-memory clustering implementation on real-world graphs with over a billion edges.
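None of these entries include code, so a minimal Python sketch of the quantity the Louvain method optimizes may help: Newman-Girvan modularity of a partition and the gain of moving one vertex into a neighboring community. This is an illustration only, not the Vite implementation; all names and the toy graph are hypothetical.

```python
from collections import defaultdict

def modularity(adj, comm):
    """Q = sum_c [ Sigma_in(c)/(2m) - (Sigma_tot(c)/(2m))^2 ]  (Newman-Girvan modularity).
    adj  : dict vertex -> dict(neighbor -> edge weight), undirected
    comm : dict vertex -> community id
    """
    deg = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    two_m = sum(deg.values())
    if two_m == 0:
        return 0.0
    sig_in, sig_tot = defaultdict(float), defaultdict(float)
    for u, nbrs in adj.items():
        sig_tot[comm[u]] += deg[u]
        for v, w in nbrs.items():
            if comm[u] == comm[v]:
                sig_in[comm[u]] += w  # each internal edge counted twice (u->v and v->u)
    return sum(sig_in[c] / two_m - (sig_tot[c] / two_m) ** 2 for c in sig_tot)

def louvain_gain(adj, comm, v, target):
    """Gain in Q from placing vertex v (treated as currently unassigned) into `target`:
    dQ = k_{v,in}/m - Sigma_tot(target) * k_v / (2 m^2).
    A Louvain local-move phase greedily picks the neighboring community maximizing this."""
    deg = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    m = sum(deg.values()) / 2.0
    k_v_in = sum(w for u, w in adj[v].items() if comm[u] == target)
    sigma_tot = sum(deg[u] for u in adj if comm[u] == target)
    return k_v_in / m - sigma_tot * deg[v] / (2.0 * m * m)

# toy graph: a triangle {0,1,2} plus a disjoint edge {3,4}
adj = {0: {1: 1.0, 2: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 1.0, 1: 1.0},
       3: {4: 1.0}, 4: {3: 1.0}}
comm = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
print(modularity(adj, comm))            # 0.375
print(louvain_gain(adj, comm, 3, 1))    # positive: vertex 3 belongs with community 1
```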
{"title":"Scaling and Quality of Modularity Optimization Methods for Graph Clustering","authors":"Sayan Ghosh, M. Halappanavar, Antonino Tumeo, A. Kalyanaraman","doi":"10.1109/HPEC.2019.8916299","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916299","url":null,"abstract":"Real-world graphs exhibit structures known as “communities” or “clusters” consisting of a group of vertices with relatively high connectivity between them, as compared to the rest of the vertices in the network. Graph clustering or community detection is a fundamental graph operation used to analyze real-world graphs occurring in the areas of computational biology, cybersecurity, electrical grids, etc. Similar to other graph algorithms, owing to irregular memory accesses and inherently sequential nature, current algorithms for community detection are challenging to parallelize. However, in order to analyze large networks, it is important to develop scalable parallel implementations of graph clustering that are capable of exploiting the architectural features of modern supercomputers.In response to the 2019 Streaming Graph Challenge, we present quality and performance analysis of our distributed-memory community detection using Vite, which is our distributed memory implementation of the popular Louvain method, on the ALCF Theta supercomputer.Clustering methods such as Louvain that rely on modularity maximization are known to suffer from the resolution limit problem, preventing identification of clusters of certain sizes. Hence, we also include quality analysis of our shared-memory implementation of the Fast-tracking Resistance method, in comparison with Louvain on the challenge datasets.Furthermore, we introduce an edge-balanced graph distribution for our distributed memory implementation, that significantly reduces communication, offering up to 80% improvement in the overall execution time. In addition to performance/quality analysis, we also include details on the power/energy consumption, and memory traffic of the distributed-memory clustering implementation using real-world graphs with over a billion edges.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"347 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124288977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C to D-Wave: A High-level C Compilation Framework for Quantum Annealers
Pub Date: 2019-09-01 | DOI: 10.1109/HPEC.2019.8916231
Mohamed W. Hassan, S. Pakin, Wu-chun Feng
A quantum annealer solves optimization problems by exploiting quantum effects. Problems are represented as Hamiltonian functions that define an energy landscape. The quantum-annealing hardware relaxes to a solution corresponding to the ground state of the energy landscape. Expressing arbitrary programming problems in terms of real-valued Hamiltonian-function coefficients is unintuitive and challenging. This paper addresses the difficulty of programming quantum annealers by presenting a compilation framework that compiles a subset of C code to a quantum machine instruction (QMI) to be executed on a quantum annealer. Our work is based on a modular software stack that facilitates programming D-Wave quantum annealers by successively lowering code from C to Verilog to a symbolic “quantum macro assembly language” and finally to a device-specific Hamiltonian function. We demonstrate the capabilities of our software stack on a set of problems written in C and executed on a D-Wave 2000Q quantum annealer.
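As a rough illustration of the target form such a compiler lowers to, the sketch below evaluates a QUBO (quadratic unconstrained binary optimization) energy and brute-forces its ground state in Python. It is not the authors' compiler or the D-Wave API; the example Hamiltonian encodes a z = x AND y constraint, the kind of penalty a gate-level lowering might emit.

```python
import itertools

def qubo_energy(Q, x):
    """Energy of binary assignment x under QUBO coefficients Q (dict keyed by index pairs):
    E(x) = sum_i Q[(i,i)]*x_i + sum_{i<j} Q[(i,j)]*x_i*x_j."""
    n = len(x)
    e = 0.0
    for i in range(n):
        e += Q.get((i, i), 0.0) * x[i]
        for j in range(i + 1, n):
            e += Q.get((i, j), 0.0) * x[i] * x[j]
    return e

def ground_state(Q, n):
    """Exhaustive minimum-energy search; the annealer samples this minimum physically."""
    return min(itertools.product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))

# QUBO whose zero-energy states are exactly the assignments with x2 = x0 AND x1:
# E = x0*x1 - 2*x0*x2 - 2*x1*x2 + 3*x2
Q = {(0, 1): 1.0, (0, 2): -2.0, (1, 2): -2.0, (2, 2): 3.0}
print(ground_state(Q, 3))   # (0, 0, 0): one of the valid AND assignments
```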
{"title":"C to D-Wave: A High-level C Compilation Framework for Quantum Annealers","authors":"Mohamed W. Hassan, S. Pakin, Wu-chun Feng","doi":"10.1109/HPEC.2019.8916231","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916231","url":null,"abstract":"A quantum annealer solves optimization problems by exploiting quantum effects. Problems are represented as Hamiltonian functions that define an energy landscape. The quantum-annealing hardware relaxes to a solution corresponding to the ground state of the energy landscape. Expressing arbitrary programming problems in terms of real-valued Hamiltonian-function coefficients is unintuitive and challenging. This paper addresses the difficulty of programming quantum annealers by presenting a compilation framework that compiles a subset of C code to a quantum machine instruction (QMI) to be executed on a quantum annealer. Our work is based on a modular software stack that facilitates programming D-Wave quantum annealers by successively lowering code from C to Verilog to a symbolic “quantum macro assembly language” and finally to a device-specific Hamiltonian function. We demonstrate the capabilities of our software stack on a set of problems written in C and executed on a D-Wave 2000Q quantum annealer.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"277 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123432144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Synthesis of Hardware Sandboxes for Trojan Mitigation in Systems on Chip
Pub Date: 2019-09-01 | DOI: 10.1109/HPEC.2019.8916526
C. Bobda, Taylor J. L. Whitaker, Joel Mandebi Mbongue, S. Saha
In this work, we propose a high-level synthesis approach for hardware sandboxes in systems-on-chip. Using an interface formalism to capture interactions between non-trusted IPs and trusted parts of a system-on-chip, along with a property specification language to specify non-authorized actions of non-trusted IPs, sandboxes are generated and made ready for inclusion as IPs in a system-on-chip design. The concepts of composition, compatibility, and refinement are used to capture illegal actions and optimize resources across the boundaries of individual IPs. We have designed a tool that automatically generates the sandboxes and facilitates their integration into a system-on-chip. Our approach was validated with benchmarks from trust-hub.com and FPGA implementations. All our results showed 100% Trojan detection and mitigation, with only a minimal increase in resource overhead and no performance decrease.
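A much-simplified software sketch of the sandbox idea, assuming a whitelist of authorized interface actions per state: every transaction attempted by an untrusted IP is checked against the allowed transitions and anything else is blocked and flagged. States, signal names, and traces are hypothetical; this is not the paper's generated hardware.

```python
# Allowed (state, action) -> next-state transitions for the untrusted IP's interface.
ALLOWED = {
    ("IDLE", "req_read"): "WAIT",
    ("WAIT", "grant"): "BURST",
    ("BURST", "data"): "BURST",
    ("BURST", "done"): "IDLE",
}

def sandbox_run(trace, state="IDLE"):
    """Replay a transaction trace through the whitelist monitor.
    Returns (ok, offending_action); a hardware sandbox would instead gate the signal."""
    for action in trace:
        nxt = ALLOWED.get((state, action))
        if nxt is None:          # non-authorized action: contain and report it
            return False, action
        state = nxt
    return True, None

print(sandbox_run(["req_read", "grant", "data", "done"]))       # benign trace -> (True, None)
print(sandbox_run(["req_read", "grant", "write_secret_reg"]))   # Trojan-like action is caught
```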
{"title":"Synthesis of Hardware Sandboxes for Trojan Mitigation in Systems on Chip","authors":"C. Bobda, Taylor J. L. Whitaker, Joel Mandebi Mbongue, S. Saha","doi":"10.1109/HPEC.2019.8916526","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916526","url":null,"abstract":"In this work, we propose a high-level synthesis approach for hardware sandboxes in system-on-chip. Using interface formalism to capture interactions between non-trusted IPs and trusted parts of a system on chip, along with the properties specification language to specify non-authorized actions of non-trusted IPs, sandboxes are generated and made ready for inclusion as IP in a system-on-chip design. The concepts of composition, compatibility, and refinement are used to capture illegal actions and optimize resources across the boundary of single IPs. We have designed a tool that automatically generates the sandbox and facilitates their integration into system-on-chip. Our approach was validated with benchmarks from trust-hub.com and FPGA implementations. All our results showed 100% Trojan detection and mitigation, with only a minimal increase in resource overhead and no performance decrease.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122833034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance of Training Sparse Deep Neural Networks on GPUs
Pub Date: 2019-09-01 | DOI: 10.1109/HPEC.2019.8916506
Jianzong Wang, Zhangcheng Huang, Lingwei Kong, Jing Xiao, Pengyu Wang, Lu Zhang, Chao Li
Deep neural networks have revolutionized the field of machine learning by dramatically improving the state of the art in various domains. The sizes of deep neural networks (DNNs) are rapidly outgrowing the capacity of hardware to store and train them quickly. Over the past few decades, researchers have explored the prospect of sparsifying DNNs before, during, and after training by pruning edges from the underlying topology; the resulting network is known as a sparse neural network. More recent work has demonstrated the remarkable result that certain sparse DNNs can be trained to the same precision as dense DNNs at lower runtime and storage cost. Although existing methods ease the problem that the high demand for computational resources severely hinders the deployment of large-scale DNNs on resource-constrained devices, DNNs can still be trained at faster speed and lower cost. In this work, we propose a Fine-tune Structured Sparsity Learning (FSSL) method to regularize the structures of DNNs and accelerate their training. FSSL can (1) learn a compact structure from a large sparse DNN to reduce computation cost, and (2) obtain a hardware-friendly structure to accelerate DNN evaluation efficiently. Experimental results on training time and compression rate show superior performance and efficiency compared to the Matlab example code, with speedups roughly twice those of non-structured sparsity.
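The abstract does not spell out the regularizer, but structured sparsity learning methods in this family typically add a group-lasso penalty over hardware-meaningful weight groups (e.g., whole filters or output neurons) to the task loss. The numpy sketch below shows that generic form under those assumptions; it is not the paper's FSSL code, and the group choice and lambda value are illustrative.

```python
import numpy as np

def group_lasso_penalty(weights):
    """Structured-sparsity (group lasso) penalty: sum over groups of each group's L2 norm.
    Here each row of a 2-D weight matrix is one group (e.g., one output neuron); driving a
    whole group to zero removes structured computation, which is hardware-friendly."""
    return sum(np.linalg.norm(w, axis=1).sum() for w in weights)

def regularized_loss(data_loss, weights, lam=1e-4):
    """Total objective = task loss + lambda * structured-sparsity penalty."""
    return data_loss + lam * group_lasso_penalty(weights)

# toy usage with random dense weights standing in for two layers
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(64, 128)), rng.normal(size=(10, 64))
print(regularized_loss(0.42, [W1, W2]))
```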
{"title":"Performance of Training Sparse Deep Neural Networks on GPUs","authors":"Jianzong Wang, Zhangcheng Huang, Lingwei Kong, Jing Xiao, Pengyu Wang, Lu Zhang, Chao Li","doi":"10.1109/HPEC.2019.8916506","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916506","url":null,"abstract":"Deep neural networks have revolutionized the field of machine learning by dramatically improving the state-of-the-art in various domains. The sizes of deep neural networks (DNNs) are rapidly outgrowing the capacity of hardware to fast store and train them. Over the past few decades, researches have explored the prospect of sparse DNNs before, during, and after training by pruning edges from the underlying topology. After the above operation, the generated neural network is known as a sparse neural network. More recent works have demonstrated the remarkable results that certain sparse DNNs can train to the same precision as dense DNNs at lower runtime and storage cost. Although existing methods ease the situation that high demand for computation resources severely hinders the deployment of large-scale DNNs in resource-constrained devices, DNNs can be trained at a faster speed and lower cost. In this work, we propose a Fine-tune Structured Sparsity Learning (FSSL) method to regularize the structures of DNNs and accelerate the training of DNNs. FSSL can: (1) learn a compact structure from large sparse DNN to reduce computation cost; (2) obtain a hardware-friendly to accelerate the DNNs evaluation efficiently. Experimental results of the training time and the compression rate show that superior performance and efficiency than the Matlab example code. These speedups are about twice speedups of non-structured sparsity.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121641754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Update on k-truss Decomposition on GPU
Pub Date: 2019-09-01 | DOI: 10.1109/HPEC.2019.8916285
M. Almasri, Omer Anjum, Carl Pearson, Zaid Qureshi, Vikram Sharma Mailthody, R. Nagi, Jinjun Xiong, Wen-mei W. Hwu
In this paper, we present an update to our previous submission on k-truss decomposition from Graph Challenge 2018. For the single-k k-truss implementation, we propose multiple algorithmic optimizations that significantly improve performance, by up to 35.2x (6.9x on average), compared to our previous GPU implementation. In addition, we present a scalable multi-GPU implementation in which each GPU handles a different k value. Compared to our prior multi-GPU implementation, the proposed approach is faster by up to 151.3x (78.8x on average). When only the edges in the maximal k-truss are sought, incrementing the k value in each iteration is inefficient, particularly for graphs with a large maximum k-truss. Thus, we propose a binary search over the k value to find the maximal k-truss. The binary-search approach on a single GPU is up to 101.5x (24.3x on average) faster than our 2018 k-truss submission. Lastly, we show that the proposed binary search finds the maximum k-truss of the “Twitter” graph dataset, which has 2.8 billion bidirectional edges, in just 16 minutes on a single V100 GPU.
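To make the binary-search idea concrete, here is a small Python sketch under simple assumptions: a naive support-peeling check for whether the k-truss is non-empty, and a binary search over k that replaces the k = 3, 4, 5, ... sweep. It is illustrative only, nowhere near the paper's GPU implementation.

```python
def ktruss_nonempty(edges, k):
    """True if the graph has a non-empty k-truss, i.e. a subgraph in which every edge
    participates in at least k-2 triangles. Simple repeated peeling of weak edges."""
    edges = {tuple(sorted(e)) for e in edges}
    while True:
        adj = {}
        for u, v in edges:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        weak = {(u, v) for u, v in edges if len(adj[u] & adj[v]) < k - 2}
        if not weak:
            return bool(edges)
        edges -= weak

def max_ktruss(edges, k_hi):
    """Binary search for the largest k with a non-empty k-truss: O(log k_max) truss
    computations instead of one per candidate k."""
    lo, hi, best = 3, k_hi, 2
    while lo <= hi:
        mid = (lo + hi) // 2
        if ktruss_nonempty(edges, mid):
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    return best

toy = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (0, 3)]
print(max_ktruss(toy, 10))   # 3: every edge of this toy graph lies in at least one triangle
```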
{"title":"Update on k-truss Decomposition on GPU","authors":"M. Almasri, Omer Anjum, Carl Pearson, Zaid Qureshi, Vikram Sharma Mailthody, R. Nagi, Jinjun Xiong, Wen-mei W. Hwu","doi":"10.1109/HPEC.2019.8916285","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916285","url":null,"abstract":"In this paper, we present an update to our previous submission on k-truss decomposition from Graph Challenge 2018. For single k k-truss implementation, we propose multiple algorithmic optimizations that significantly improve performance by up to 35.2x (6.9x on average) compared to our previous GPU implementation. In addition, we present a scalable multi-GPU implementation in which each GPU handles a different ‘k’ value. Compared to our prior multi-GPU implementation, the proposed approach is faster by up to 151.3x (78.8x on average). In case when the edges with only maximal k-truss are sought, incrementing the ‘k’ value in each iteration is inefficient particularly for graphs with large maximum k-truss. Thus, we propose binary search for the ‘k’ value to find the maximal k-truss. The binary search approach on a single GPU is up to 101.5 (24.3x on average) faster than our 2018 k-truss submission. Lastly, we show that the proposed binary search finds the maximum k-truss for “Twitter“ graph dataset having 2.8 billion bidirectional edges in just 16 minutes on a single V100 GPU.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131927312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many-target, Many-sensor Ship Tracking and Classification
Pub Date: 2019-09-01 | DOI: 10.1109/HPEC.2019.8916332
Leonard Kosta, John Irvine, Laura Seaman, H. Xi
Government agencies such as DARPA wish to know the numbers, locations, tracks, and types of vessels moving through strategically important regions of the ocean. We implement a multiple hypothesis testing algorithm to simultaneously track dozens of ships with longitude and latitude data from many sensors, then use a combination of behavioral fingerprinting and deep learning techniques to classify each vessel by type. The number of targets is unknown a priori. We achieve both high track purity and high classification accuracy on several datasets.
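For flavor only, the sketch below shows a much-simplified, gated nearest-neighbor association of latitude/longitude detections to existing tracks; it is a stand-in, not the paper's multiple-hypothesis approach, and all positions, gate sizes, and track IDs are hypothetical. Unassociated detections would seed new tracks, since the number of targets is unknown a priori.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def associate(tracks, detections, gate_km=5.0):
    """Greedy gated nearest-neighbor association of detections to existing tracks."""
    assignments, used = {}, set()
    for tid, pos in tracks.items():
        best = min(((haversine_km(pos, d), j) for j, d in enumerate(detections) if j not in used),
                   default=(None, None))
        if best[0] is not None and best[0] <= gate_km:
            assignments[tid] = best[1]
            used.add(best[1])
    return assignments

tracks = {"A": (42.36, -71.05), "B": (40.70, -74.00)}
dets = [(42.37, -71.06), (40.71, -74.01), (25.76, -80.19)]
print(associate(tracks, dets))   # {'A': 0, 'B': 1}; detection 2 would start a new track
```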
{"title":"Many-target, Many-sensor Ship Tracking and Classification","authors":"Leonard Kosta, John Irvine, Laura Seaman, H. Xi","doi":"10.1109/HPEC.2019.8916332","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916332","url":null,"abstract":"Government agencies such as DARPA wish to know the numbers, locations, tracks, and types of vessels moving through strategically important regions of the ocean. We implement a multiple hypothesis testing algorithm to simultaneously track dozens of ships with longitude and latitude data from many sensors, then use a combination of behavioral fingerprinting and deep learning techniques to classify each vessel by type. The number of targets is unknown a priori. We achieve both high track purity and high classification accuracy on several datasets.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122510712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Survey on Hardware Security Techniques Targeting Low-Power SoC Designs
Pub Date: 2019-09-01 | DOI: 10.1109/HPEC.2019.8916486
Alan Ehret, K. Gettings, B. R. Jordan, M. Kinsy
In this work, we survey hardware-based security techniques applicable to low-power system-on-chip designs. Techniques related to a system’s processing elements, volatile main memory and caches, non-volatile memory and on-chip interconnects are examined. Threat models for each subsystem and technique are considered. Performance overheads and other trade-offs for each technique are discussed. Defenses with similar threat models are compared.
{"title":"A Survey on Hardware Security Techniques Targeting Low-Power SoC Designs","authors":"Alan Ehret, K. Gettings, B. R. Jordan, M. Kinsy","doi":"10.1109/HPEC.2019.8916486","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916486","url":null,"abstract":"In this work, we survey hardware-based security techniques applicable to low-power system-on-chip designs. Techniques related to a system’s processing elements, volatile main memory and caches, non-volatile memory and on-chip interconnects are examined. Threat models for each subsystem and technique are considered. Performance overheads and other trade-offs for each technique are discussed. Defenses with similar threat models are compared.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122896317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scalable Inference for Sparse Deep Neural Networks using Kokkos Kernels
Pub Date: 2019-09-01 | DOI: 10.1109/HPEC.2019.8916378
J. Ellis, S. Rajamanickam
Over the last decade, hardware advances have made training and inference feasible for very large deep neural networks. Sparsified deep neural networks (DNNs) can greatly reduce the memory costs and increase the throughput of standard DNNs, if the loss of accuracy can be controlled. The IEEE HPEC Sparse Deep Neural Network Graph Challenge serves as a testbed for algorithmic and implementation advances that maximize the computational performance of sparse deep neural networks. We base our sparse DNN implementation, KK-SpDNN, on the sparse linear algebra kernels within the Kokkos Kernels library. Using the sparse matrix-matrix multiplication in Kokkos Kernels allows us to reuse a highly optimized kernel. We focus on reducing the single-node and multi-node runtimes for 12 sparse networks. We test KK-SpDNN on Intel Skylake and Knights Landing architectures and see a 120-500x improvement in single-node performance over the serial reference implementation. We run in data-parallel mode with MPI to further speed up network inference, ultimately obtaining an edge processing rate of 1.16e+12 on 20 Skylake nodes. This translates to a 13x speedup on 20 nodes compared to our highly optimized multithreaded implementation on a single Skylake node.
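The core operation in the Sparse DNN Graph Challenge is a layer update of the form Y <- h(Y*W + b) with both the activations Y and the weights W stored sparse, so each layer reduces to a sparse-times-sparse matrix product. The Python/scipy sketch below shows that structure under those assumptions; it uses scipy's SpGEMM rather than Kokkos Kernels, and the bias, clamp value, and random matrices are illustrative.

```python
import numpy as np
import scipy.sparse as sp

def sparse_dnn_inference(Y0, layers, bias=-0.3, ymax=32.0):
    """Per layer: Y <- min(ReLU(Y @ W + bias), ymax), keeping Y sparse throughout.
    The bias is applied only to stored nonzeros, which matches a ReLU with negative bias
    (zero entries stay zero either way)."""
    Y = Y0
    for W in layers:
        Z = (Y @ W).tocsr()                  # sparse-times-sparse product (SpGEMM)
        Z.data += bias
        Z.data = np.minimum(np.maximum(Z.data, 0.0), ymax)
        Z.eliminate_zeros()                  # drop entries ReLU pushed to zero
        Y = Z
    return Y

# toy usage with random sparse features and weights
Y0 = sp.random(8, 16, density=0.25, format="csr", random_state=1)
layers = [sp.random(16, 16, density=0.1, format="csr", random_state=s) for s in (2, 3, 4)]
print(sparse_dnn_inference(Y0, layers).nnz)
```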
{"title":"Scalable Inference for Sparse Deep Neural Networks using Kokkos Kernels","authors":"J. Ellis, S. Rajamanickam","doi":"10.1109/HPEC.2019.8916378","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916378","url":null,"abstract":"Over the last decade, hardware advances have led to the feasibility of training and inference for very large deep neural networks. Sparsified deep neural networks (DNNs) can greatly reduce memory costs and increase throughput of standard DNNs, if loss of accuracy can be controlled. The IEEE HPEC Sparse Deep Neural Network Graph Challenge serves as a testbed for algorithmic and implementation advances to maximize computational performance of sparse deep neural networks. We base our sparse network for DNNs, KK-SpDNN, on the sparse linear algebra kernels within the Kokkos Kernels library. Using the sparse matrix-matrix multiplication in Kokkos Kernels allows us to reuse a highly optimized kernel. We focus on reducing the single node and multi-node runtimes for 12 sparse networks. We test KK-SpDNN on Intel Skylake and Knights Landing architectures and see 120-500x improvement on single node performance over the serial reference implementation. We run in data-parallel mode with MPI to further speed up network inference, ultimately obtaining an edge processing rate of 1.16e+12 on 20 Skylake nodes. This translates to a 13x speed up on 20 nodes compared to our highly optimized multithreaded implementation on a single Skylake node.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"168 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115584051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A data-driven framework for uncertainty quantification of a fluidized bed
Pub Date: 2019-09-01 | DOI: 10.1109/HPEC.2019.8916467
V. Kotteda, Anitha Kommu, Vinod Kumar
We carried out a nondeterministic analysis of flow in a fluidized bed. The flow in the fluidized bed is simulated with MFiX, the National Energy Technology Laboratory's open-source multiphase fluid dynamics suite, which does not itself provide tools for uncertainty quantification. Therefore, we developed a C++ wrapper to integrate Dakota, an uncertainty quantification toolkit developed at Sandia National Laboratories, with MFiX. The wrapper exchanges uncertain input parameters and key output parameters between Dakota and MFiX. However, a data-driven framework is also developed to obtain reliable statistics, since it is not feasible to obtain them directly with MFiX integrated into Dakota (Dakota-MFiX). The data generated from Dakota-MFiX simulations, using Latin hypercube sampling with a sample size of 500, is used to train a machine-learning algorithm. The trained and tested deep neural network is then integrated with Dakota via the wrapper to obtain low-order statistics of the bed height and the pressure drop across the bed.
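As a minimal sketch of the sampling and statistics step, the Python snippet below draws a Latin hypercube sample over two uncertain inputs and computes low-order statistics of an output. The "simulator" is a toy stand-in, not MFiX or Dakota, and the input names, bounds, and formula are hypothetical.

```python
import numpy as np

def latin_hypercube(n_samples, bounds, seed=None):
    """Latin hypercube sample: one point per equal-probability stratum in each dimension,
    with strata randomly paired across dimensions. bounds is a list of (lo, hi) per input."""
    rng = np.random.default_rng(seed)
    dim = len(bounds)
    u = np.empty((n_samples, dim))
    for d in range(dim):
        strata = rng.permutation(n_samples)              # which stratum each sample occupies
        u[:, d] = (strata + rng.random(n_samples)) / n_samples
    lo, hi = np.array(bounds, dtype=float).T
    return lo + u * (hi - lo)

def low_order_stats(y):
    """Mean and sample standard deviation of a quantity of interest."""
    return {"mean": float(np.mean(y)), "std": float(np.std(y, ddof=1))}

def toy_pressure_drop(x):
    """Hypothetical stand-in for the simulator output as a function of two uncertain inputs."""
    particle_diameter, inlet_velocity = x[:, 0], x[:, 1]
    return 1.8 * inlet_velocity / particle_diameter

X = latin_hypercube(500, bounds=[(3e-4, 6e-4), (0.5, 1.5)], seed=0)
print(low_order_stats(toy_pressure_drop(X)))
```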
{"title":"A data-driven framework for uncertainty quantification of a fluidized bed","authors":"V. Kotteda, Anitha Kommu, Vinod Kumar","doi":"10.1109/HPEC.2019.8916467","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916467","url":null,"abstract":"We carried out a nondeterministic analysis of flow in a fluidized bed. The flow in the fluidized bed is simulated with National Energy Technology Laboratory’s open-source multiphase fluid dynamics suite MFiX. It does not possess tools for uncertainty quantification. Therefore, we developed a C++ wrapper to integrate an uncertainty quantification toolkit developed at Sandia National Laboratory with MFiX. The wrapper exchanges uncertain input parameters and key output parameters among Dakota and MFiX. However, a data-driven framework is also developed to obtain reliable statistics as it is not feasible to get them with MFiX integrated into Dakota, Dakota-MFiX. The data generated from Dakota-MFiX simulations, with the Latin Hypercube method of sampling size 500, is used to train a machine-learning algorithm. The trained and tested deep neural network algorithm is integrated with Dakota via the wrapper to obtain low order statistics of the bed height and pressure drop across the bed.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126909919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}