Enabling and scaling biomolecular simulations of 100 million atoms on petascale machines with a multicore-optimized message-driven runtime

2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC). Pub Date: 2011-11-12. DOI: 10.1145/2063384.2063466

Chao Mei, Yanhua Sun, G. Zheng, Eric J. Bohm, L. Kalé, James C. Phillips, Christopher B. Harrison

Citations: 66

Abstract

A 100-million-atom biomolecular simulation with NAMD is one of the three benchmarks for the NSF-funded sustainable petascale machine. Simulating this large molecular system on a petascale machine presents significant challenges, including handling I/O, managing a large memory footprint, and achieving good strong scaling. In this paper, we present parallel I/O techniques that enable the simulation. We design a new SMP model, extending the Charm++ asynchronous message-driven runtime, to efficiently utilize ubiquitous wide multicore clusters. We exploit node-aware techniques to optimize both the application and the underlying SMP runtime. We further exploit hierarchical load balancing to scale NAMD to the full Jaguar PF Cray XT5 (224,076 cores) at Oak Ridge National Laboratory, both with and without PME full electrostatics, achieving 93% parallel efficiency (relative to 6,720 cores) at 9 ms per step for a simple cutoff calculation. Excellent scaling is also obtained on 65,536 cores of the Intrepid Blue Gene/P at Argonne National Laboratory.
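
To make the 93% figure concrete, the short sketch below applies the standard strong-scaling definition of parallel efficiency, E = (T_base · P_base) / (T · P), and inverts it to estimate the time per step the 6,720-core baseline would imply. This is an illustration under stated assumptions, not a number reported in the abstract: it assumes the paper uses this conventional definition and treats the rounded "9 ms per step" as exact. The function names `strong_scaling_efficiency` and `implied_baseline_time` are hypothetical helpers introduced here.

```python
def strong_scaling_efficiency(t_base, p_base, t_scaled, p_scaled):
    """Strong-scaling parallel efficiency relative to a baseline run.

    Assumes the conventional definition E = (t_base * p_base) / (t_scaled * p_scaled),
    where 1.0 means perfect (linear) speedup.
    """
    return (t_base * p_base) / (t_scaled * p_scaled)


def implied_baseline_time(efficiency, t_scaled, p_scaled, p_base):
    """Invert the definition above to estimate the baseline time per step."""
    return efficiency * t_scaled * p_scaled / p_base


# Figures quoted in the abstract (9 ms is rounded, so the result is approximate).
p_full, p_base = 224_076, 6_720
t_full_ms = 9.0
t_base_ms = implied_baseline_time(0.93, t_full_ms, p_full, p_base)

print(f"implied baseline: {t_base_ms:.1f} ms/step on {p_base} cores")
print(f"speedup {t_base_ms / t_full_ms:.1f}x for {p_full / p_base:.1f}x more cores")
```

Under these assumptions the baseline works out to roughly 279 ms per step, i.e. about a 31x speedup for about 33x more cores, which is what 93% efficiency means in practice.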
