Breadth First Search on APEnet+
Pub Date: 2012-11-10 | DOI: 10.1109/SC.Companion.2012.41 | pp. 248-253
M. Bernaschi, M. Bisson, Enrico Mastrostefano, D. Rossetti
We present preliminary results of a multi-GPU code for exploring large graphs (hundreds of millions of vertices and billions of edges) using the Breadth First Search algorithm. The GPU hosts are connected by APEnet+, a custom interconnection network with full support for NVIDIA GPUDirect peer-to-peer communication, i.e., the technology that allows a third-party device to directly access GPU memory over the PCI Express bus.
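As an illustration of the level-synchronous traversal such a multi-GPU BFS performs, the following Python sketch (not the authors' code) partitions vertices across "parts" that stand in for GPUs and exchanges newly discovered frontier vertices between owners at each level; the actual implementation runs on CUDA and moves these frontiers over APEnet+ via GPUDirect.

```python
# Minimal sketch: level-synchronous BFS over a vertex-partitioned graph,
# mimicking the per-level frontier exchange a multi-GPU BFS performs.
from collections import defaultdict

def partitioned_bfs(edges, num_vertices, num_parts, root):
    # Each "part" stands in for one GPU; vertex v is owned by part v % num_parts.
    owner = lambda v: v % num_parts
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    dist = [-1] * num_vertices
    dist[root] = 0
    frontiers = {p: set() for p in range(num_parts)}
    frontiers[owner(root)].add(root)
    level = 0
    while any(frontiers.values()):
        # Each part expands its local frontier and buckets discovered
        # vertices by owner (the data that would cross the network).
        outgoing = {p: defaultdict(set) for p in range(num_parts)}
        for p, frontier in frontiers.items():
            for u in frontier:
                for v in adj[u]:
                    outgoing[p][owner(v)].add(v)
        # "Exchange" phase: each owner merges incoming candidates, keeps unvisited ones.
        level += 1
        new_frontiers = {p: set() for p in range(num_parts)}
        for p in range(num_parts):
            for q in range(num_parts):
                for v in outgoing[q][p]:
                    if dist[v] == -1:
                        dist[v] = level
                        new_frontiers[p].add(v)
        frontiers = new_frontiers
    return dist

if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5)]
    print(partitioned_bfs(edges, 6, 2, root=0))   # -> [0, 1, 2, 3, 1, 2]
```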
{"title":"Breadth First Search on APEnet+","authors":"M. Bernaschi, M. Bisson, Enrico Mastrostefano, D. Rossetti","doi":"10.1109/SC.Companion.2012.41","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.41","url":null,"abstract":"We present preliminary results of a multi-GPU code for exploring large graphs (hundreds of millions vertices and billions of edges) by using the Breadth First Search algorithm. The GPU hosts are connected by APEnet+, a custom interconnection network that has full support for NVIDIA GPUDirect peer-topeer communication, i.e. the technology allowing a third party device to directly access the GPU memory over the PCI express bus.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"11 1","pages":"248-253"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73717034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimus: A Parallel Optimization Framework with Topology Aware PSO and Applications
Pub Date: 2012-11-10 | DOI: 10.1109/SC.Companion.2012.303 | pp. 1524-1525
S. Sreepathi
This research presents a parallel metaheuristic optimization framework, Optimus (Optimization Methods for Universal Simulators), for integrating a desired population-based search method with a target scientific application. Optimus includes a parallel middleware component, PRIME (Parallel Reconfigurable Iterative Middleware Engine), for scalable deployment on emergent supercomputing architectures. Additionally, we designed TAPSO (Topology Aware Particle Swarm Optimization) for network-based optimization problems and applied it to achieve better convergence for water distribution system (WDS) applications. The framework supports concurrent optimization instances, for instance multiple swarms in the case of PSO. PRIME provides a lightweight communication layer to facilitate periodic inter-optimizer data exchanges. We performed a scalability analysis of Optimus on the Cray XK6 (Jaguar) at the Oak Ridge Leadership Computing Facility for the leak-detection problem in WDS. In a weak-scaling scenario, we achieved 84.82% of baseline performance at 200,000 cores relative to 1,000 cores, and 72.84% relative to the single-core case.
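For readers unfamiliar with the population-based search that Optimus orchestrates, the Python sketch below shows a plain global-best PSO loop; it is an illustration only, and neither TAPSO's topology awareness nor the PRIME middleware is reproduced here.

```python
# Illustrative global-best PSO, the kind of search method a framework like
# Optimus could drive (not the TAPSO variant described in the paper).
import random

def pso_minimize(f, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                       # personal bests
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]      # global best

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)          # simple test objective
    print(pso_minimize(sphere, dim=3))
```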
{"title":"Optimus: A Parallel Optimization Framework with Topology Aware PSO and Applications","authors":"S. Sreepathi","doi":"10.1109/SC.Companion.2012.303","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.303","url":null,"abstract":"This research presents a parallel metaheuristic optimization framework, Optimus (Optimization Methods for Universal Simulators) for integration of a desired population-based search method with a target scientific application. Optimus includes a parallel middleware component, PRIME (Parallel Reconfigurable Iterative Middleware Engine) for scalable deployment on emergent supercomputing architectures. Additionally, we designed TAPSO (Topology Aware Particle Swarm Optimization) for network based optimization problems and applied it to achieve better convergence for water distribution system (WDS) applications. The framework supports concurrent optimization instances, for instance multiple swarms in the case of PSO. PRIME provides a lightweight communication layer to facilitate periodic inter-optimizer data exchanges. We performed scalability analysis of Optimus on Cray XK6(Jaguar) at Oak Ridge Leadership Computing Facility for the leak detection problem in WDS. For a weak scaling scenario, we achieved 84.82% of baseline at 200,000 cores relative to performance at 1000 cores and 72.84% relative to one core scenario.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"6 1","pages":"1524-1525"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75051942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Poster: Evaluating Topology Mapping via Graph Partitioning
Pub Date: 2012-11-10 | DOI: 10.1109/SC.Companion.2012.197 | p. 1372
Anshu Arya, T. Gamblin, B. Supinski, L. Kalé
Intelligently mapping applications to machine network topologies has been shown to improve performance, but considerable developer effort is required to find good mappings. Techniques from graph partitioning have the potential to automate topology mapping and relieve this developer burden. Graph partitioning is already used for load balancing parallel applications, but it can be applied to topology mapping as well. We show performance gains from using a topology-targeting graph partitioner to map sparse matrix-vector multiplication and volumetric 3-D FFT kernels onto a 3-D torus network.
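The Python sketch below illustrates the kind of objective a topology-targeting partitioner tries to minimize: the hop-bytes cost of a task-to-torus-coordinate mapping. It is a hedged illustration of the metric only, not the partitioner evaluated in the poster.

```python
# Hop-bytes cost of mapping communicating tasks onto a 3-D torus.
def torus_hops(a, b, dims):
    # Shortest per-dimension distance, accounting for wrap-around links.
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def hop_bytes(comm_graph, mapping, dims):
    # comm_graph: {(task_u, task_v): bytes}; mapping: task -> (x, y, z) coordinate.
    return sum(vol * torus_hops(mapping[u], mapping[v], dims)
               for (u, v), vol in comm_graph.items())

if __name__ == "__main__":
    dims = (4, 4, 4)
    comm = {(0, 1): 100, (1, 2): 100, (0, 2): 10}
    scattered = {0: (0, 0, 0), 1: (2, 0, 0), 2: (0, 3, 1)}
    compact = {0: (0, 0, 0), 1: (1, 0, 0), 2: (0, 1, 0)}
    # The compact placement moves far fewer hop-bytes than the scattered one.
    print(hop_bytes(comm, scattered, dims), hop_bytes(comm, compact, dims))
```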
{"title":"Poster: Evaluation Topology Mapping via Graph Partitioning","authors":"Anshu Arya, T. Gamblin, B. Supinski, L. Kalé","doi":"10.1109/SC.Companion.2012.197","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.197","url":null,"abstract":"Intelligently mapping applications to machine network topologies has been shown to improve performance, but considerable developer effort is required to find good mappings. Techniques from graph partitioning have the potential to automate topology mapping and relieve the developer burden. Graph partitioning is already used for load balancing parallel applications, but can be applied to topology mapping as well. We show performance gains by using a topology-targeting graph partitioner to map sparse matrix-vector and volumetric 3-D FFT kernels onto a 3-D torus network.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"35 1","pages":"1372-1372"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75088626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Poster: Planewave-Based First-Principles MD Calculation on 80,000-node K-Computer
Pub Date: 2012-11-10 | DOI: 10.1109/SC.Companion.2012.281 | pp. 1491-1492
A. Kuroda, K. Minami, T. Yamasaki, J. Nara, J. Koga, T. Uda, T. Ohno
We show the efficiency of a first-principles electronic structure calculation code, PHASE, on the massively parallel supercomputer K, which has 80,000 nodes. The code is based on a plane-wave basis set and therefore relies on FFT routines. We parallelized the FFT routines needed in our code by localizing each FFT calculation within a small number of nodes, which reduces the communication time the FFTs require. We also introduce multi-axis parallelization over bands and plane waves, with which PHASE shows very high parallel efficiency. Using this code, we have investigated the structural stability of screw dislocations in silicon carbide, a topic that has attracted much attention because of its importance to the semiconductor industry.
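The sketch below (assuming mpi4py and NumPy; this is not the PHASE code) illustrates the localization idea: ranks are split into small sub-communicators, bands are assigned round-robin to the groups, and each 3-D FFT stays confined to its group rather than spanning all nodes. The intra-group distribution of each FFT grid is omitted.

```python
# Conceptual sketch of per-band FFT localization across small rank groups.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
group_size = 2                                    # ranks per localized FFT group
n_groups = max(comm.size // group_size, 1)
group = (comm.rank // group_size) % n_groups
fft_comm = comm.Split(color=group, key=comm.rank)  # small communicator per group

n_bands, grid = 8, (16, 16, 16)
my_bands = [b for b in range(n_bands) if b % n_groups == group]

for b in my_bands:
    # Stand-in for one band's real-space wavefunction; in a real code the grid
    # itself would additionally be distributed across fft_comm.
    psi = np.random.rand(*grid)
    psi_g = np.fft.fftn(psi)                      # FFT confined to this small group
    norm = np.vdot(psi_g, psi_g).real

fft_comm.Barrier()
if comm.rank == 0:
    print(n_groups, "groups, each FFT confined to at most", group_size, "ranks")
```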
{"title":"Poster: Planewave-Based First-Principles MD Calculation on 80,000-node K-Computer","authors":"A. Kuroda, K. Minami, T. Yamasaki, J. Nara, J. Koga, T. Uda, T. Ohno","doi":"10.1109/SC.Companion.2012.281","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.281","url":null,"abstract":"We show the efficiency of a first-principles electronic structure calculation code, PHASE on the massive-parallel super computer, K, which has 80,000 nodes. This code is based on plane-wave basis set, thus FFT routines are included. We succeeded in parallelization of FFT routines needed in our code by localizing each FFT calculation in small number of nodes, resulting in decreasing communication time required for FFT calculation. We also introduce multi-axis parallelization for bands and plane waves and then PHASE shows very high parallel efficiency. By using this code, we have investigated the structural stability of screw dislocations in silicon carbide, which has attracted much attention due to the semiconductor industry importance.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"616 1","pages":"1491-1492"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77650863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quality-Aware Data Management for Large Scale Scientific Applications
Pub Date: 2012-11-10 | DOI: 10.1109/SC.Companion.2012.114 | pp. 816-820
Hongbo Zou, F. Zheng, M. Wolf, G. Eisenhauer, K. Schwan, H. Abbasi, Qing Liu, N. Podhorszki, S. Klasky
Increasingly large-scale simulations are generating an unprecedented amount of output data, leading researchers to explore new `data staging' methods that buffer, use, and/or reduce such data online rather than simply pushing it to disk. Leveraging the capabilities of data staging, this study explores the potential for data reduction via online data compression, first using general compression techniques and then proposing use-specific methods that let users define simple data queries so that only the data identified by those queries is emitted. Using online methods for code generation and deployment with such dynamic data queries, end users can precisely control the quality of information (QoI) of their output data by explicitly determining what data may be lost versus retained, in contrast to general-purpose lossy compression methods that do not provide this level of control. The paper also describes the key elements of a quality-aware data management system (QADMS) for high-end machines enabled by this approach. Initial experimental results demonstrate that QADMS can effectively reduce data-movement cost and improve QoS while meeting the QoI constraint stated by users.
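A minimal sketch of the query-driven reduction idea, assuming NumPy (this is not the QADMS implementation): a user-supplied predicate selects which values leave the staging area, so the information loss is explicit rather than hidden inside a generic lossy compressor.

```python
# User-defined query applied to staged output before it is emitted.
import numpy as np

def reduce_by_query(field, query):
    """Keep only the entries the user's query selects; drop the rest."""
    mask = query(field)
    indices = np.flatnonzero(mask)
    return indices, field.flat[indices]          # compact representation to emit

if __name__ == "__main__":
    temperature = np.random.uniform(200.0, 2000.0, size=(64, 64))
    # Example query: the user only cares about hot-spot cells above 1500 K.
    idx, values = reduce_by_query(temperature, lambda f: f > 1500.0)
    print("kept", idx.size, "of", temperature.size, "values")
```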
{"title":"Quality-Aware Data Management for Large Scale Scientific Applications","authors":"Hongbo Zou, F. Zheng, M. Wolf, G. Eisenhauer, K. Schwan, H. Abbasi, Qing Liu, N. Podhorszki, S. Klasky","doi":"10.1109/SC.Companion.2012.114","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.114","url":null,"abstract":"Increasingly larger scale simulations are generating an unprecedented amount of output data, causing researchers to explore new `data staging' methods that buffer, use, and/or reduce such data online rather than simply pushing it to disk. Leveraging the capabilities of data staging, this study explores the potential for data reduction via online data compression, first using general compression techniques and then proposing use-specific methods that permit users to define simple data queries that cause only the data identified by those queries to be emitted. Using online methods for code generation and deployment, with such dynamic data queries, end users can precisely identify the quality of information (QoI) of their output data, by explicitly determining what data may be lost vs. retained, in contrast to general-purpose lossy compression methods that do not provide such levels of control. The paper also describes the key elements of a quality-aware data management system (QADMS) for high-end machines enabled by this approach. Initial experimental results demonstrate that QADMS can effectively reduce data movement cost and improve the QoS while meeting the QoI constraint stated by users.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"27 1","pages":"816-820"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81897312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating Workflow Tools with SDAG
Pub Date: 2012-11-10 | DOI: 10.1109/SC.Companion.2012.20 | pp. 54-63
Muhammad Ali Amer, Robert Lucas
Workflow management systems (WMS) are typically composed of, or make use of, multiple independent software components. The design and development of those components is typically driven by the functional requirements of the scientific applications that use the corresponding WMS. Consequently, the WMS design reflects those core functional requirements in the applications it supports in the future. Whereas most design criteria are engineered to be as generic as possible, some design trade-offs may prove sub-optimal for certain new workflow applications. We argue that WMS design trade-offs that emerge from a limited set of real-world applications can be minimized by using larger, more varied synthetic application datasets. We present SDAG, a tool for generating synthetic well-formed workflows (WFWs) that span a varied space of synthetic WFWs around any reference workflow. These synthetic WFWs enable developers to test and evaluate WMS or their constituent software components on a broad range of workflows, enabling more generic design criteria for WMS.
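The Python sketch below shows one simple way to generate a well-formed (acyclic) synthetic workflow by layering tasks and only drawing edges from earlier to later layers; it illustrates the concept only and is not SDAG's generator.

```python
# Generate a random layered DAG as a toy "well-formed workflow".
import random

def synthetic_workflow(n_layers=4, width=3, edge_prob=0.5, seed=0):
    rng = random.Random(seed)
    layers = [[f"t{l}_{i}" for i in range(rng.randint(1, width))]
              for l in range(n_layers)]
    edges = []
    for l in range(1, n_layers):
        for task in layers[l]:
            parents = [p for p in layers[l - 1] if rng.random() < edge_prob]
            # Ensure every task has at least one parent so the DAG stays connected.
            edges += [(p, task) for p in (parents or [rng.choice(layers[l - 1])])]
    return [t for layer in layers for t in layer], edges

if __name__ == "__main__":
    tasks, edges = synthetic_workflow()
    print(len(tasks), "tasks,", len(edges), "edges")
```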
{"title":"Evaluating Workflow Tools with SDAG","authors":"Muhammad Ali Amer, Robert Lucas","doi":"10.1109/SC.Companion.2012.20","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.20","url":null,"abstract":"Workflow management systems (WMS) are typically comprised of or make use of multiple independent software components. The design and development of those components is typically drawn from functional requirements of scientific applications that utilize the corresponding WMS. Consequently, the WMS design reflects those core functional requirements in applications that it supports in the future. Whereas most design criteria are engineered to be as generic as possible, some design trade-offs may prove sub-optimal for certain new workflow applications. We argue that WMS design tradeoffs that emerge from a limited set of real-world applications can be minimized by the use of larger, more varied synthetic application datasets. We present SDAG, a tool for generating synthetic well formed workflows (WFWs) that span a varied space of synthetic WFWs around any reference workflow. These synthetic WFWs enable developers to test and evaluate WMS or their constituent software components on a broad range of workflows and enable more generic design criteria for WMS.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"19 1","pages":"54-63"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86006048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The K computer - Toward its productive applications to our life
Pub Date: 2012-11-10 | DOI: 10.1109/SC.Companion.2012.344 | pp. 1673-1701
M. Yokokawa
This article consists of a collection of slides from the author's conference presentation. The author concludes that HPC technology is essential for sustainable human life in the future and that HPC activities must be promoted further. With the K computer's powerful and stable computing capability, we expect useful results in science and engineering and breakthroughs in research and development. We should pursue more realistic simulations with future systems. Japan will continue to contribute to the HPC community.
{"title":"The K computer - Toward its productive applications to our life","authors":"M. Yokokawa","doi":"10.1109/SC.Companion.2012.344","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.344","url":null,"abstract":"This article consists of a collection of slides from the author's conference presentation. The author concludes that HPC technology is essential for sustainable human life in the future and we have to promote HPC activities more and more. By K's powerful and stable computing capability, we expect useful results in science and engineering and break-through in research and development. We should pursue more realistic simulations by the future system. Japan will continue to contribute to HPC community.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"28 1","pages":"1673-1701"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86042863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HOG: Distributed Hadoop MapReduce on the Grid
Pub Date: 2012-11-10 | DOI: 10.1109/SC.Companion.2012.154 | pp. 1276-1283
Chen He, D. Weitzel, D. Swanson, Ying Lu
MapReduce is a powerful data processing platform for commercial and academic applications. In this paper, we build a novel Hadoop MapReduce framework, Hadoop On the Grid (HOG), that executes on the Open Science Grid, which spans multiple institutions across the United States. It differs from previous MapReduce platforms that run in dedicated environments such as clusters or clouds. HOG provides a free, elastic, and dynamic MapReduce environment on the opportunistic resources of the grid. In HOG, we improve Hadoop's fault tolerance for wide-area data analysis by mapping data centers across the U.S. to virtual racks and creating multi-institution failure domains. Our modifications to the Hadoop framework are transparent to existing Hadoop MapReduce applications. In the evaluation, we successfully scale HOG to 1,100 nodes on the grid. Additionally, we evaluate HOG with a simulated Facebook Hadoop MapReduce workload. We conclude that HOG's rapid scalability can provide performance comparable to a dedicated Hadoop cluster.
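The virtual-rack idea can be pictured with Hadoop's standard rack-awareness hook: a topology script that Hadoop calls with worker host names and that prints one rack path per host. The sketch below is hypothetical (the site-to-rack table is invented for illustration) and is not HOG's actual script.

```python
#!/usr/bin/env python
# Toy Hadoop rack-awareness topology script: map each worker's hostname to a
# "virtual rack" named after its grid site, so HDFS spreads replicas across
# institutions. Hadoop invokes the configured topology script with host names
# or IPs as arguments and reads one rack path per host from stdout.
import sys

# Hypothetical site suffixes; a real deployment would derive these from the grid.
SITE_OF = {
    "unl.edu": "/rack-nebraska",
    "ucsd.edu": "/rack-ucsd",
    "fnal.gov": "/rack-fermilab",
}

def rack_for(host):
    for suffix, rack in SITE_OF.items():
        if host.endswith(suffix):
            return rack
    return "/rack-default"

if __name__ == "__main__":
    for host in sys.argv[1:]:
        print(rack_for(host))
```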
{"title":"HOG: Distributed Hadoop MapReduce on the Grid","authors":"Chen He, D. Weitzel, D. Swanson, Ying Lu","doi":"10.1109/SC.Companion.2012.154","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.154","url":null,"abstract":"MapReduce is a powerful data processing platform for commercial and academic applications. In this paper, we build a novel Hadoop MapReduce framework executed on the Open Science Grid which spans multiple institutions across the United States - Hadoop On the Grid (HOG). It is different from previous MapReduce platforms that run on dedicated environments like clusters or clouds. HOG provides a free, elastic, and dynamic MapReduce environment on the opportunistic resources of the grid. In HOG, we improve Hadoop's fault tolerance for wide area data analysis by mapping data centers across the U.S. to virtual racks and creating multi-institution failure domains. Our modifications to the Hadoop framework are transparent to existing Hadoop MapReduce applications. In the evaluation, we successfully extend HOG to 1100 nodes on the grid. Additionally, we evaluate HOG with a simulated Facebook Hadoop MapReduce workload. We conclude that HOG's rapid scalability can provide comparable performance to a dedicated Hadoop cluster.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"33 1","pages":"1276-1283"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84028458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tight Coupling of R and Distributed Linear Algebra for High-Level Programming with Big Data
Pub Date: 2012-11-10 | DOI: 10.1109/SC.Companion.2012.113 | pp. 811-815
Drew Schmidt, G. Ostrouchov, Wei-Chen Chen, Pragneshkumar B. Patel
We present a new distributed programming extension of the R programming language. By tightly coupling R to the well-known ScaLAPACK and MPI libraries, we achieve highly scalable implementations of common statistical methods, allowing users to analyze bigger datasets with R than ever before. Early benchmarks are very promising for the project and its future.
{"title":"Tight Coupling of R and Distributed Linear Algebra for High-Level Programming with Big Data","authors":"Drew Schmidt, G. Ostrouchov, Wei-Chen Chen, Pragneshkumar B. Patel","doi":"10.1109/SC.Companion.2012.113","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.113","url":null,"abstract":"We present a new distributed programming extension of the R programming language. By tightly coupling R to the well-known ScaLAPACK and MPI libraries, we are able to achieve highly scalable implementations of common statistical methods, allowing the user to analyze bigger datasets with R than ever before. Early benchmarks show great optimism for the project and its future.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"14 1","pages":"811-815"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77025442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}