Pub Date: 2012-11-10 | DOI: 10.1109/SC.Companion.2012.170
Maricris L. Mayes, G. Fletcher, M. Gordon
Summary form only given. One of the major challenges of modern quantum chemistry (QC) is applying it to large systems with thousands of correlated electrons and basis functions. Meeting this challenge requires both supercomputers and the development of novel methods. In particular, we employ the linear-scaling Fragment Molecular Orbital (FMO) method, which decomposes a large system into smaller, localized fragments that can each be treated with a high-level QC method such as MP2. FMO is inherently scalable, since the individual fragment calculations can be carried out simultaneously on separate processor groups. It is implemented in GAMESS, a popular ab initio QC program. We present the scalability and performance of FMO on the Intrepid (Blue Gene/P) and Blue Gene/Q systems at the ALCF.
{"title":"Abstract: Towards Highly Accurate Large-Scale Ab Initio Calculations Using Fragment Molecular Orbital Method in GAMESS","authors":"Maricris L. Mayes, G. Fletcher, M. Gordon","doi":"10.1109/SC.Companion.2012.170","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.170","url":null,"abstract":"Summary form only given. One of the major challenges of modern quantum chemistry (QC) is to apply it to large systems with thousands of correlated electrons and basis functions. The availability of supercomputers and development of novel methods are necessary to realize this challenge. In particular, we employ linear scaling Fragment Molecular Orbital (FMO) method which decompose the large system into smaller, localized fragments which can be treated with high-level QC method like MP2. FMO is inherently scalable since the individual fragment calculations can be carried out simultaneously on separate processor groups. It is implemented in GAMESS, a popular ab-initio QC program. We present the scalability and performance of FMO on Intrepid (Blue Gene/P) and Blue Gene/Q systems at ALCF.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"109 1","pages":"1335-1335"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86007733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-11-10 | DOI: 10.1109/SC.Companion.2012.294
Mehmet Balman
High-bandwidth networks are poised to provide new opportunities for tackling the large-data challenges of today's scientific applications. However, increasing the bandwidth is not sufficient by itself; we need careful evaluation of future high-bandwidth networks from the applications' perspective. We have experimented with current state-of-the-art data movement tools and found that file-centric data transfer protocols perform poorly when transferring many small files over high-bandwidth networks, even when using parallel streams or concurrent transfers. Current middleware tools need enhancements to take advantage of future networking frameworks. To improve performance and efficiency, we developed an experimental prototype, MemzNet (Memory-mapped Zero-copy Network Channel), which uses a block-based data movement method for moving large scientific datasets. MemzNet aggregates files into blocks and provides dynamic data channel management. In this work, we present our initial results on a 100 Gbps network.
{"title":"Abstract: MemzNet: Memory-Mapped Zero-Copy Network Channel for Moving Large Datasets over 100Gbps Network","authors":"Mehmet Balman","doi":"10.1109/SC.Companion.2012.294","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.294","url":null,"abstract":"High-bandwidth networks are poised to provide new opportunities in tackling large data challenges in today's scientific applications. However, increasing the bandwidth is not sufficient by itself; we need careful evaluation of future high-bandwidth networks from the applications' perspective. We have experimented with current state-of-the-art data movement tools, and realized that file-centric data transfer protocols do not perform well with managing the transfer of many small files in high-bandwidth networks, even when using parallel streams or concurrent transfers. We require enhancements in current middleware tools to take advantage of future networking frameworks. To improve performance and efficiency, we develop an experimental prototype, called MemzNet: Memory-mapped Zero-copy Network Channel, which uses a block-based data movement method in moving large scientific datasets. We have implemented MemzNet that takes the approach of aggregating files into blocks and providing dynamic data channel management. In this work, we present our initial results in 100Gbps network.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"78 1","pages":"1511-1512"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78478286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-11-10 | DOI: 10.1109/SC.Companion.2012.264
Brian W. Barrett, R. Brightwell, K. Underwood, K. Hemmert
Portals 4 is an advanced network programming interface that allows for the development of a rich set of upper-layer protocols. Through careful selection of interfaces and strong progress guarantees, Portals 4 is able to support multiple protocols without significant overhead. Recent developments with Portals 4, including the development of MPI, SHMEM, and GASNet protocols, are discussed.
{"title":"Poster: Portals 4 Network Programming Interface","authors":"Brian W. Barrett, R. Brightwell, K. Underwood, K. Hemmert","doi":"10.1109/SC.Companion.2012.264","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.264","url":null,"abstract":"Portals 4 is an advanced network programming interface which allows for the development of a rich set of upper layer protocols. By careful selection of interfaces and strong progress guarantees, Portals 4 is able to support multiple protocols without significant overhead. Recent developments with Portals 4, including development of MPI, SHMEM, and GASNet protocols are discussed.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"19 1","pages":"1467-1467"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81830428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-11-10 | DOI: 10.1109/SC.Companion.2012.29
A. Chervenak, David E. Smith, Weiwei Chen, E. Deelman
As scientific applications generate and consume data at ever-increasing rates, scientific workflow systems that manage the growing complexity of analyses and data movement will increase in importance. The goal of our work is to improve the overall performance of scientific workflows by using policy to improve data staging into and out of computational resources. We developed a Policy Service that gives advice to the workflow system about how to stage data, including advice on the order of data transfers and on transfer parameters. The Policy Service gives this advice based on its knowledge of ongoing transfers, recent transfer performance, and the current allocation of resources for data staging. The paper describes the architecture of the Policy Service and its integration with the Pegasus Workflow Management System. The service employs a range of policies for data staging, and we present performance results for one policy that performs a greedy allocation of data transfer streams between source and destination sites. The results show performance improvements for a data-intensive workflow: the Montage astronomy workflow, augmented to perform additional large data staging operations.
{"title":"Integrating Policy with Scientific Workflow Management for Data-Intensive Applications","authors":"A. Chervenak, David E. Smith, Weiwei Chen, E. Deelman","doi":"10.1109/SC.Companion.2012.29","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.29","url":null,"abstract":"As scientific applications generate and consume data at ever-increasing rates, scientific workflow systems that manage the growing complexity of analyses and data movement will increase in importance. The goal of our work is to improve the overall performance of scientific workflows by using policy to improve data staging into and out of computational resources. We developed a Policy Service that gives advice to the workflow system about how to stage data, including advice on the order of data transfers and on transfer parameters. The Policy Service gives this advice based on its knowledge of ongoing transfers, recent transfer performance, and the current allocation of resources for data staging. The paper describes the architecture of the Policy Service and its integration with the Pegasus Workflow Management System. It employs a range of policies for data staging, and presents performance results for one policy that does a greedy allocation of data transfer streams between source and destination sites. The results show performance improvements for a data-intensive workflow: the Montage astronomy workflow augmented to perform additional large data staging operations.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"28 1","pages":"140-149"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90303692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-11-10 | DOI: 10.1109/SC.Companion.2012.124
Nicolas Dubé
This presentation debunks three "truths" as seen from Plato's cave: the untold story of PUE, clean coal, and the notion that water is free and available.
{"title":"Philosophy 301: But Can You \"Handle the Truth\"?","authors":"Nicolas Dubé","doi":"10.1109/SC.Companion.2012.124","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.124","url":null,"abstract":"This presentation debunks three \"truths\" as seen from Plato's cave: the untold story of PUE, clean coal, and water is free and available.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"19 1","pages":"993-1017"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90699826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-11-10 | DOI: 10.1109/SC.COMPANION.2012.150
D. Gunter, S. Cholia, Anubhav Jain, M. Kocher, K. Persson, L. Ramakrishnan, S. Ong, G. Ceder
Efforts such as the Human Genome Project provided a dramatic example of opening scientific datasets to the community. Making high-quality scientific data accessible through an online database allows scientists around the world to multiply the value of that data through scientific innovations. Similarly, the goal of the Materials Project is to calculate physical properties of all known inorganic materials and make this data freely available, with the aim of accelerating the invention of better materials. However, the complexity of scientific data, and of the simulations needed to generate and analyze it, poses challenges to the current software ecosystem. In this paper, we describe the approach we used in the Materials Project to overcome these challenges and to create and disseminate a high-quality database of materials properties computed by solving the basic laws of physics. Our infrastructure requires a novel combination of high-throughput approaches with broadly applicable and scalable approaches to data storage and dissemination.
{"title":"Community Accessible Datastore of High-Throughput Calculations: Experiences from the Materials Project","authors":"D. Gunter, S. Cholia, Anubhav Jain, M. Kocher, K. Persson, L. Ramakrishnan, S. Ong, G. Ceder","doi":"10.1109/SC.COMPANION.2012.150","DOIUrl":"https://doi.org/10.1109/SC.COMPANION.2012.150","url":null,"abstract":"Efforts such as the Human Genome Project provided a dramatic example of opening scientific datasets to the community. Making high quality scientific data accessible through an online database allows scientists around the world to multiply the value of that data through scientific innovations. Similarly, the goal of the Materials Project is to calculate physical properties of all known inorganic materials and make this data freely available, with the goal of accelerating to invention of better materials. However, the complexity of scientific data, and the complexity of the simulations needed to generate and analyze it, pose challenges to current software ecosystem. In this paper, we describe the approach we used in the Materials Project to overcome these challenges and create and disseminate a high quality database of materials properties computed by solving the basic laws of physics. Our infrastructure requires a novel combination of highthroughput approaches with broadly applicable and scalable approaches to data storage and dissemination.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"23 1","pages":"1244-1251"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90792496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-11-10 | DOI: 10.1109/SC.Companion.2012.358
B. Walkup
This article consists of a collection of slides from the author's conference presentation. The author concludes that the Blue Gene/Q design (simple, low-power cores with four hardware threads per core) results in high instruction throughput, and thus exceptional power efficiency for applications: the hardware threads can effectively fill pipeline stalls and hide latencies in the memory subsystem. The consequence is low performance per thread, so a high degree of parallelization is required for high application performance. Traditional programming methods (MPI, OpenMP, Pthreads) hold up at very large scales. Memory costs can limit scaling when data structures grow linearly with the number of processes; threading helps by keeping the number of processes manageable. Detailed performance analysis is viable at more than 10^6 processes but requires care, and on-the-fly performance data reduction has merits.
{"title":"Application performance characterization and analysis on Blue Gene/Q","authors":"B. Walkup","doi":"10.1109/SC.Companion.2012.358","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.358","url":null,"abstract":"This article consists of a collection of slides from the author's conference presentation. The author concludes that The Blue Gene/Q design, low-power simple cores, four hardware threads per core, resu lts in high instruction throughput, and thus exceptional power efficiency for applications. Can effectively fill in pipeline stalls and hide latencies in the memory subsystem. The consequence is low performance per thread, so a high degree of parallelization is required for high application performance. Traditional programming methods (MPI, OpenMP, Pthreads) hold up at very large scales. Memory costs can limit scaling when there are data-structures with size linear in the number of processes, threading helps by keeping the number of processes manageable. Detailed performance analysis is viable at > 10^6 processes but requires care. On-the-fly performance data reduction has merits.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"77 1","pages":"2247-2280"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80791774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-11-10 | DOI: 10.1109/SC.Companion.2012.212
C. Kessler, Usman Dastgeer, M. Majeed, N. Furmento, Samuel Thibault, R. Namyst, S. Benkner, Sabri Pllana, J. Träff, Martin Wimmer
PEPPHER is a three-year EU FP7 project that develops a novel approach and framework to enhance the performance portability and programmability of heterogeneous multi-core systems. Its primary target is single-node heterogeneous systems, where several CPU cores are supported by accelerators such as GPUs. This poster briefly surveys the PEPPHER framework for single-node systems and elaborates on the prospects for leveraging the PEPPHER approach to generate performance-portable code for heterogeneous multi-node systems.
{"title":"Abstract: Leveraging PEPPHER Technology for Performance Portable Supercomputing","authors":"C. Kessler, Usman Dastgeer, M. Majeed, N. Furmento, Samuel Thibault, R. Namyst, S. Benkner, Sabri Pllana, J. Träff, Martin Wimmer","doi":"10.1109/SC.Companion.2012.212","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.212","url":null,"abstract":"PEPPHER is a 3-year EU FP7 project that develops a novel approach and framework to enhance performance portability and programmability of heterogeneous multi-core systems. Its primary target is single-node heterogeneous systems, where several CPU cores are supported by accelerators such as GPUs. This poster briefly surveys the PEPPHER framework for single-node systems, and elaborates on the prospectives for leveraging the PEPPHER approach to generate performance-portable code for heterogeneous multi-node systems.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"90 1","pages":"1395-1396"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86603215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-11-10 | DOI: 10.1109/SC.Companion.2012.232
Michael O. Lam, B. Supinski, M. LeGendre, J. Hollingsworth
As scientific computation continues to scale, efficient use of floating-point arithmetic processors is critical. Lower precision allows streaming architectures to perform more operations per second and can reduce memory bandwidth pressure on all architectures. However, using a precision that is too low for a given algorithm and data set leads to inaccurate results. We present a framework that uses binary instrumentation and modification to build mixed-precision configurations of existing binaries that were originally developed to use only double precision. Initial results with the Algebraic MultiGrid kernel demonstrate a nearly 2× speedup.
{"title":"Poster: Automatically Adapting Programs for Mixed-Precision Floating-Point Computation","authors":"Michael O. Lam, B. Supinski, M. LeGendre, J. Hollingsworth","doi":"10.1109/SC.Companion.2012.232","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.232","url":null,"abstract":"As scientific computation continues to scale, efficient use of floating-point arithmetic processors is critical. Lower precision allows streaming architectures to perform more operations per second and can reduce memory bandwidth pressure on all architectures. However, using a precision that is too low for a given algorithm and data set leads to inaccurate results. We present a framework that uses binary instrumentation and modification to build mixed-precision configurations of existing binaries that were originally developed to use only double-precision. Initial results with the Algebraic MultiGrid kernel demonstrate a nearly 2χ speedup.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"96 1","pages":"1424-1424"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88408077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-11-10 | DOI: 10.1109/SC.Companion.2012.132
D. Ghoshal, L. Ramakrishnan
Scientific applications are increasingly using cloud resources for their data analysis workflows. However, managing data effectively and efficiently on these cloud resources is challenging due to the myriad storage choices with different performance-cost trade-offs, complex application choices, the complexity associated with elasticity, and failure rates. The explosion in scientific data, coupled with the unique characteristics of cloud environments, requires a more flexible and robust distributed data management solution than the ones currently in existence. This paper describes the design and implementation of FRIEDA, a Flexible Robust Intelligent Elastic Data Management framework. FRIEDA coordinates data in a transient cloud environment while taking into account specific application characteristics. Additionally, we describe a range of data management strategies and show the benefit of flexible data management schemes in cloud environments. We study two distinct scientific applications, from bioinformatics and image analysis, to understand the effectiveness of such a framework.
{"title":"FRIEDA: Flexible Robust Intelligent Elastic Data Management in Cloud Environments","authors":"D. Ghoshal, L. Ramakrishnan","doi":"10.1109/SC.Companion.2012.132","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.132","url":null,"abstract":"Scientific applications are increasingly using cloud resources for their data analysis workflows. However, managing data effectively and efficiently over these cloud resources is challenging due to the myriad storage choices with different performance-cost trade-offs, complex application choices, complexity associated with elasticity and, failure rates. The explosion in scientific data coupled with unique characteristics of cloud environments require a more flexible and robust distributed data management solution than the ones currently in existence. This paper describes the design and implementation of FRIEDA - a Flexible Robust Intelligent Elastic Data Management framework. FRIEDA coordinates data in a transient cloud environment taking into account specific application characteristics. Additionally, we describe a range of data management strategies and show the benefit of flexible data management schemes in cloud environments. We study two distinct scientific applications from bioinformatics and image analysis to understand the effectiveness of such a framework.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"116 1","pages":"1096-1105"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79367490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}