The combustion process is at the root of most energy production systems. Understanding combustion is fundamental to exploiting the available natural resources efficiently and to reducing pollutant emissions. The giant leaps made in computer science over the past two decades have made it possible to use computer simulation to better understand combustion in real industrial configurations. This presentation discusses and illustrates the application of high performance computing to Computational Fluid Dynamics (CFD). Specific attention is devoted to the Large Eddy Simulation (LES) approach for industrial energy production configurations, ranging from aeronautical gas turbine engines (for helicopters and commercial airliners) to piston engines and the stationary gas turbines used in large-scale electricity production.
{"title":"High performance computing for combustion applications","authors":"G. Staffelbach","doi":"10.1145/1188455.1188514","DOIUrl":"https://doi.org/10.1145/1188455.1188514","url":null,"abstract":"Combustion process is at the root of most energy production systems. The understanding of combustion is fundamental to exploit efficiently the available natural resources and to reduce pollutant emissions. The giant leaps performed in computer science over the past two decades render possible the use of computer simulation to better understand combustion in real industrial configurations. This presentation discusses and illustrates the application of high performance computing for Computational Fluid Dynamics (CFD). Specific attention is addressed to the Large Eddy Simulation (LES) approach for industrial energy production configurations: ranging from aeronautical gas turbine engines including helicopters and commercial airliners, piston engines and stationary gas turbine engines used in large scale electricity production systems.","PeriodicalId":115940,"journal":{"name":"Proceedings of the 2006 ACM/IEEE conference on Supercomputing","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126411637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Innovative cabling solutions will be a key factor in realizing next-generation supercomputing clusters. Demands for higher data rates, larger clusters, and increased density cannot be optimally addressed with existing twin-axial cabling solutions. Quellan, Inc.'s family of low-power, low-latency Lane Manager ICs provides a 2x reach extension over standard cable for single-lane data rates up to 6.25 Gb/s. In addition, the Lane Managers can facilitate increased density and improved airflow through clusters by enabling narrow-gauge cables to operate at maximum lengths comparable to those of standard 24 AWG cabling. Integrated higher-layer features ensure compliance with a variety of current and emerging standards such as InfiniBand, PCI Express, and CX-4. This presentation will highlight the performance and advanced feature set of the Lane Manager family while also detailing the benefits of this technology for addressing various signal integrity challenges inherent to the cabling infrastructure of supercomputing clusters.
{"title":"Enabling next generation supercomputing clusters","authors":"M. Vrazel","doi":"10.1145/1188455.1188732","DOIUrl":"https://doi.org/10.1145/1188455.1188732","url":null,"abstract":"Innovative cabling solutions will be a key factor in realizing next generation supercomputing clusters. Demands for higher data rates, larger clusters, and increased density cannot be optimally addressed with existing twin-axial cabling solutions. Quellan, Inc.'s family of low power, low latency Lane Manager ICs provide a 2x reach extension over standard cable for single lane data rates up to 6.25 Gb/s. In addition, the Lane Managers can facilitate increased density and improved airflow through clusters by enabling narrow gauge cables to operate at maximum lengths comparable to that of standard 24AWG cabling. Integrated higher layer features ensure compliance with a variety of current and emerging standards such as Infiniband, PCI Express, and CX-4. This presentation will highlight the performance and advanced features set of the Lane Manager family while also detailing the benefits of this technology for addressing various signal integrity challenges inherent to the cabling infrastructure of supercomputing clusters.","PeriodicalId":115940,"journal":{"name":"Proceedings of the 2006 ACM/IEEE conference on Supercomputing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127400828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Twenty-five years ago, supercomputing was dominated by vector processors and emergent SIMD array processors clocked at tens of megahertz. Today, responding to dramatic advances in semiconductor device fabrication technologies, the world of supercomputing is dominated by multi-core-based MPP and commodity cluster systems clocked at gigahertz. Twenty-five years in the future, the technology landscape will again have experienced dramatic change, with the flat-lining of Moore's Law, the realization of nanoscale devices, and the emergence of potentially alien technologies, architectures, and paradigms. If Moore's Law were to continue to progress as before, we would be deploying systems approaching 100 exaflops with clock rates nearing a terahertz. But by then, power constraints, quantum effects, or our inability to exploit trillion-way program parallelism may have forced us into entirely new realms of processing. This presentation will consider the range of alternative technologies, architectures, and methods that may drive the extremes of computing beyond the incremental steps of the current era.
{"title":"Beyond the beyond and the extremes of computing","authors":"T. Sterling","doi":"10.1145/1188455.1188524","DOIUrl":"https://doi.org/10.1145/1188455.1188524","url":null,"abstract":"Twenty five years ago supercomputing was dominated by vector processors and emergent SIMD array processors clocked at tens of Megahertz. Today responding to dramatic advances in semiconductor device fabrication technologies, the world of supercomputing is dominated by multi-core based MPP and commodity cluster systems clocked at Gigahertz. Twenty five years in the future, the technology landscape will again have experienced dramatic change with the flat-lining of Moore's Law, the realization of nanoscale devices, and the emergence of potentially alien technologies, architectures, and paradigms. If Moore's Law were to continue to progress as before, we would be deploying systems approaching 100 Exaflops with clock rates nearing a Terahertz. But by then, power constraints, quantum effects, or our inability to exploit trillion way program parallelism may have forced us in to entirely new realms of processing. This presentation will consider the range of alternative technologies, architectures, and methods that may drive the extremes of computing beyond the current incremental steps of the current era.","PeriodicalId":115940,"journal":{"name":"Proceedings of the 2006 ACM/IEEE conference on Supercomputing","volume":"65 31","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120817912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SRFS on Ether adds an Ethernet interface to the Shared Rapid File System (SRFS), which is currently used as a distributed file system between nodes of HPC systems. It can be used like NFS, and it solves the problem of data coherency during high-speed data transmission in a broadband environment, which NFS does not. Moreover, tuning the TCP/IP parameters in the OS to improve speed is unnecessary, and no special hardware is needed, unlike SAN construction with iFCP and similar approaches. For additional speed, it stripes data streams automatically (by default up to 8 streams) and switches between TCP and UDP based on I/O size. In this bandwidth challenge, we demonstrate security using a host-to-host IPSec connection between Tampa and Tokyo. To show performance, we used a hardware IPSec accelerator and tuned TCP/IP in combination with SRFS on Ether.
{"title":"Secure file sharing","authors":"N. Fujita, H. Ohkawa","doi":"10.1145/1188455.1188707","DOIUrl":"https://doi.org/10.1145/1188455.1188707","url":null,"abstract":"SRFS on Ether adds an ethernet interface to the Shared Rapid File System (SRFS) that is currently used as a distributed file system between nodes by the HPC-system. It can be used like NFS and has solved the problem of data coherency in the high-speed transmission of data in a broadband environment, which NFS has not. Moreover, adjustment of the TCP/IP parameters in the OS to improve speed is unnecessary, and special hardware is not needed, unlike with the SAN construction by iFCP and others. For additional speed, it stripes data streams automatically (default MAX 8 streams), switches protocols between TCP and UDP based on IOsize.In this bandwidth challenge, we demonstrate security using a host-to-host IPSec connection between Tampa and Tokyo. To show performance, we used a hardware IPSec accelerator and tuned TCP/IP with SRFS on Ether's.","PeriodicalId":115940,"journal":{"name":"Proceedings of the 2006 ACM/IEEE conference on Supercomputing","volume":"77 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120825139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
I will describe and demonstrate the "Meeting List Tool", a shared application built with the Access Grid Toolkit for use in AG meetings. At present, the two most common ways of sharing text-based information during an AG meeting are shared presentations and the chat tool built into the Venue Client. The former is ideal for static content that is known in advance. The latter is ideal for sharing short pieces of information, such as a URL. What is apparently missing is an application in which data can be prepared in advance, displayed and quickly manipulated during a meeting, and kept at the close of the meeting by all the participants in the collaboration. This application is designed to fill that gap. The current version provides a list of items that can be highlighted, added, deleted, edited, and re-ordered, with the changes being propagated to all instances of the tool.
{"title":"The meeting list tool - a shared application for sharing dynamic information in meetings","authors":"Adam C. Carter","doi":"10.1145/1188455.1188786","DOIUrl":"https://doi.org/10.1145/1188455.1188786","url":null,"abstract":"I will describe and demonstrate the \"Meeting List Tool\", a shared application built with the Access Grid Toolkit for use in AG meetings.At present, the two most common ways of sharing text-based information during an AG meeting are shared presentations and the chat tool built into the Venue Client. The former is ideal for static content which is known in advance. The latter is ideal for sharing short pieces of information, such as a URL. What is apparently missing is an application for which data can be prepared in advance, displayed and quickly manipulated during a meeting, and kept at the close of the meeting by all the participants in the collaboration.This application is designed to fill this gap. The current version provides a list of items that can be highlighted, added, deleted, edited, and re-ordered, with the changes being propagated to all instances of the tool.","PeriodicalId":115940,"journal":{"name":"Proceedings of the 2006 ACM/IEEE conference on Supercomputing","volume":"169 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113991445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Slawinska, Dawid Kurzyniec, Jaroslaw Slawinski, V. Sunderam
Shared HPC platforms continue to require substantial effort for software installation and management, often necessitating manual intervention and tedious procedures. We propose a novel model of resource sharing that shifts resource virtualization and aggregation responsibilities to client-side software, thus reducing the burden on resource providers. The Zero-Force MPI toolkit automates the installation, build, run, and post-processing stages of HPC applications, allowing application scientists to focus on using resources instead of managing them. Through a provided console, MPI runtime systems, support libraries, application executables, and needed data files can be soft-installed across distributed resources with just a few commands. Built-in data synchronization capabilities simplify common HPC development tasks, saving end-user time and effort. To evaluate ZF-MPI, we conducted experiments with the NAS Parallel Benchmarks. Results demonstrate that the proposed run-not-install approach is effective and may substantially increase overall productivity.
{"title":"Zero-Force MPI: toward tractable toolkits for high performance computing","authors":"M. Slawinska, Dawid Kurzyniec, Jaroslaw Slawinski, V. Sunderam","doi":"10.1145/1188455.1188595","DOIUrl":"https://doi.org/10.1145/1188455.1188595","url":null,"abstract":"Shared HPC platforms continue to require substantial effort for software installation and management, often necessitating manual intervention and tedious procedures. We propose a novel model of resource sharing that shifts resource virtualization and aggregation responsibilities to client-side software, thus reducing the burdens on resource providers.The Zero-Force MPI toolkit automates the installation, build, run, and post-processing stages of HPC applications, thus allowing application scientists to focus on using resources instead of managing them. Through a provided console, MPI runtime systems, support libraries, application executables, and needed datafiles can be soft-installed across distributed resources with just a few commands. Built-in data synchronization capabilities simplify common HPC development tasks, saving end-user time and effort. To evaluate ZF-MPI, we conducted experiments with the NAS Parallel Benchmarks. Results demonstrate that the proposed run-not-install approach is effective and may substantially increase overall productivity.","PeriodicalId":115940,"journal":{"name":"Proceedings of the 2006 ACM/IEEE conference on Supercomputing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122354528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TotalView is a flexible, scriptable parallel debugger with wide acceptance in the High Performance Computing community. This BOF will be an opportunity for TotalView users to share clever and interesting ways of adapting TotalView to their unique environments, using TotalView to do something unusual, or simply making the day-to-day process of debugging easier. Contact Chris.Gottbrath@etnus.com if you want us to reserve time for you to tell your story, or simply show up at the BOF and step forward.
{"title":"TotalView tips and tricks","authors":"C. Gottbrath, P. Thompson","doi":"10.1145/1188455.1188465","DOIUrl":"https://doi.org/10.1145/1188455.1188465","url":null,"abstract":"TotalView is a flexible, scriptable parallel debugger with wide acceptance in the High Performance Computing community. This BOF will be an opportunity for TotalView users to share clever and interesting ways of adapting TotalView to their unique environment, using TotalView to do something unusual, or simply making the day to day process of debugging easier. Contact Chris.Gottbrath@etnus.com if you want us to reserve time for you to tell your story or simply show up at the BOF and step forward.","PeriodicalId":115940,"journal":{"name":"Proceedings of the 2006 ACM/IEEE conference on Supercomputing","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129211183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. Belletti, M. Cotallo, A. Flor, L. A. Fernández, A. Gordillo, A. Maiorano, F. Mantovani, E. Marinari, V. Martin-Mayor, A. M. Sudupe, D. Navarro, S. P. Gaviro, M. Rossi, J. Ruiz-Lorenzo, S. Schifano, D. Sciretti, A. Tarancón, R. Tripiccione, J. Velasco
IANUS is a massively parallel system based on a 2D array of FPGA-based processors with nearest-neighbor connections. Processors are also directly connected to a central hub attached to a host computer. The prototype, available in October 2006, uses a 4x4 array of Xilinx Virtex-4 LX160 FPGAs. We map onto the array the computational kernels of scientific applications characterized by regular control flow, an unconventional mix of data-manipulation operations, and limited memory usage. Careful VHDL coding of the kernel algorithms relevant to Monte Carlo simulation of spin-glass systems (our first application) yields impressive performance: single-processor tests concurrently update ~1000 spins, so the average spin-update time is 15 ps. This is ~60 times faster than carefully programmed 3.2 GHz PCs. We plan to build a 256-node system, roughly equivalent to 15,000 PCs. This poster describes the architecture, the implementation, and the methodology with which a specific application is mapped onto the system.
{"title":"IANUS: scientific computing on an FPGA-based architecture","authors":"F. Belletti, M. Cotallo, A. Flor, L. A. Fernández, A. Gordillo, A. Maiorano, F. Mantovani, E. Marinari, V. Martin-Mayor, A. M. Sudupe, D. Navarro, S. P. Gaviro, M. Rossi, J. Ruiz-Lorenzo, S. Schifano, D. Sciretti, A. Tarancón, R. Tripiccione, J. Velasco","doi":"10.1145/1188455.1188633","DOIUrl":"https://doi.org/10.1145/1188455.1188633","url":null,"abstract":"IANUS is a massively parallel system based on a 2D array of FPGA-based processors with nearest-neighbor connections. Processors are also directly connected to a central hub attached to a host computer.The prototype, available in October 2006 uses an array of 4x4 Xilinx Virtex4LX160 FPGA's.We map onto the array the computational kernels of scientific applications characterized by regular control flow, unconventional mix of data-manipulation operations and limited memory usage.Careful VHDL coding of the kernel algorithms relevant for Monte Carlo simulation of spin-glass systems (our first application) yields impressive performances: single processor tests concurrently update ~1000 spins, so average spin-update time is 15 psec. This is ~60 times faster than accurately programmed 3,2 GHz PC's. We plan to build a 256 nodes system, roughly equivalent to 15000 PC's.This poster describes the architecture, the implementation and the methodology with which a specific application is mapped onto the system.","PeriodicalId":115940,"journal":{"name":"Proceedings of the 2006 ACM/IEEE conference on Supercomputing","volume":"328 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129305455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simulations that model complex interactions in complex environments in detail are now possible, and visualization is a uniquely powerful analysis tool. However, the visualization features of simulation codes are typically intended for the scientists designing the simulations, providing little support for presenting simulation results in a form suitable for non-experts. Conversely, progress in graphics software and hardware has been fueled by applications where precisely abiding by the laws of physics is of secondary importance compared to visual realism. This half-day tutorial presents an approach for state-of-the-art visualization of simulation data based on connecting the worlds of computer simulation and computer animation. Concretely, attendees will learn how to simplify, import, and integrate simulation data into the surrounding scene, with examples from our simulation of the September 11 attack on the Pentagon. The resulting realistic visualization enables effective dissemination of simulation results and helps simulations reach their full potential for high societal impact.
{"title":"Realistic visualization for large-scale simulations","authors":"V. Popescu, C. Hoffmann","doi":"10.1145/1188455.1188688","DOIUrl":"https://doi.org/10.1145/1188455.1188688","url":null,"abstract":"Simulations that model in detail complex interactions in complex environments are now possible, and visualization is a uniquely powerful analysis tool. However, visualization features of simulation codes are typically intended for scientists designing simulations, providing little support for presenting simulation results in a form suitable for non-experts. Conversely, graphics software and hardware progress has been fueled by applications where precisely abiding by the laws of physics is of secondary importance compared to visual realism.This half-day tutorial presents an approach for state-of-the-art visualization of simulation data based on connecting the worlds of computer simulation and computer animation. Concretely, the attendees will learn how to simplify, import, and integrate the simulation data into the surrounding scene, with examples from our simulation of the September 11 Attack on the Pentagon. The resulting realistic visualization enables effective dissemination of simulation results, and helps simulations reach their full potential for high societal impact.","PeriodicalId":115940,"journal":{"name":"Proceedings of the 2006 ACM/IEEE conference on Supercomputing","volume":"44 24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128527539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
H. Akiba, T. Ohyama, Y. Shibata, Kiyoshi Yuyama, Yoshikazu Katai, R. Takeuchi, T. Hoshino, S. Yoshimura, H. Noguchi, Manish Gupta, John A. Gunnels, V. Austel, Yogish Sabharwal, R. Garg, S. Kato, T. Kawakami, Satoru Todokoro, Junko Ikeda
Existing commercial finite element analysis (FEA) codes do not exhibit the performance necessary for large-scale analysis on parallel computer systems. In this paper, we demonstrate the performance characteristics of a commercial parallel structural analysis code, ADVC, on Blue Gene/L (BG/L). The numerical algorithm of ADVC is described, tuned, and optimized on BG/L, and then a large-scale drop impact analysis of a mobile phone is performed. The model of the mobile phone is a nearly full assembly that includes the inner structures. The model we analyzed has 47 million nodal points and 142 million DOFs. This may not seem exceptionally large, but a dynamic impact analysis of a product model of this size, with contact conditions over the entire surface of the outer case, cannot be handled by other CAE systems. Our analysis is an unprecedented attempt in the electronics industry. Simulating about 2.4 milliseconds of the impact took only half a day: 12.1 hours. The floating-point performance obtained was 538 GFLOPS on 4096 nodes of BG/L.
{"title":"Large scale drop impact analysis of mobile phone using ADVC on Blue Gene/L","authors":"H. Akiba, T. Ohyama, Y. Shibata, Kiyoshi Yuyama, Yoshikazu Katai, R. Takeuchi, T. Hoshino, S. Yoshimura, H. Noguchi, Manish Gupta, John A. Gunnels, V. Austel, Yogish Sabharwal, R. Garg, S. Kato, T. Kawakami, Satoru Todokoro, Junko Ikeda","doi":"10.1145/1188455.1188503","DOIUrl":"https://doi.org/10.1145/1188455.1188503","url":null,"abstract":"Existing commercial finite element analysis (FEA) codes do not exhibit the performance necessary for large scale analysis on parallel computer systems. In this paper, we demonstrate the performance characteristics of a commercial parallel structural analysis code, ADVC, on Blue Gene/L (BG/L). The numerical algorithm of ADVC is described, tuned, and optimized on BG/L, and then a large scale drop impact analysis of a mobile phone is performed. The model of the mobile phone is a nearly-full assembly that includes inner structures. The size of the model we have analyzed has 47 million nodal points and 142 million DOFs. This does not seem exceptionally large, but the dynamic impact analysis of a product model, with the contact condition on the entire surface of the outer case under this size, cannot be handled by other CAE systems. Our analysis is an unprecedented attempt in the electronics industry. It took only half a day, 12.1 hours, for the analysis of about 2.4 milliseconds. The floating point operation performance obtained has been 538 GFLOPS on 4096 node of BG/L.","PeriodicalId":115940,"journal":{"name":"Proceedings of the 2006 ACM/IEEE conference on Supercomputing","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128541567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}