Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.) -- Latest Publications
Best Practices for Administering a Medium Sized Cluster with Intel® Xeon Phi™ Coprocessors
Paul Peltz, Troy Baer. DOI: 10.1145/2616498.2616538. Pages 34:1-34:8.
This work describes best practices for configuring and managing an Intel® Xeon Phi™ cluster. The Xeon Phi presents a unique environment to the user, and preparing this environment requires unique procedures. This work outlines these procedures and provides examples that HPC administrators can adopt and then customize for their own systems. Considerable effort has gone into helping researchers maximize their performance on the Xeon Phi, but little has been done for the administrators of these systems. Now that Xeon Phis are being deployed on larger systems, there is a need for information on how to manage and deploy them. The information provided here serves as a supplement to the documentation Intel provides, bridging the gap between workstation and cluster deployments. This work is based on the authors' experiences deploying and maintaining the Beacon cluster at the University of Tennessee's Application Acceleration Center of Excellence (AACE).

XSEDE OpenACC workshop enables Blue Waters Researchers to Accelerate Key Algorithms
G. W. Arnold, M. Gajbe, S. Koric, J. Urbanic. DOI: 10.1145/2616498.2616530. Pages 28:1-28:6.
The Blue Waters system at the National Center for Supercomputing Applications (NCSA) is the largest GPU-accelerated system in the NSF's portfolio, with more than 4,200 Nvidia K20x accelerators and more than 22,500 compute nodes overall. Using the accelerator nodes effectively is paramount to the system's success, as they represent approximately 1/7 of system peak performance. As an XSEDE level 2 service provider, the system is also available to education allocations proposed by XSEDE educators and trainers. The training staff at the Pittsburgh Supercomputing Center (PSC), along with their XSEDE and Nvidia partners, have offered multiple OpenACC workshops since 2012. The most recent workshop, which held its hands-on sessions on Blue Waters, was very successful. As a direct result of working with PSC on these workshops, NCSA researchers have been able to obtain significant speedups on real-world algorithms using OpenACC in the Cray environment. In this work we look at two key kernel codes (a 3D FFT kernel and a Laplace 2D MPI benchmark) and the path to obtaining the observed performance gains.

Workload Aware Utilization Optimization for a Petaflop Supercomputer: Evidence Based Assessment Using Statistical Methods
Fei Xing, Haihang You. DOI: 10.1145/2616498.2616536. Pages 50:1-50:8.
Computing resources such as supercomputers are shared by many users, and most systems use batch systems as their resource managers. From a user's perspective, the overall turnaround of each submitted job is measured by time-to-solution, which is the sum of batch queuing time and execution time. On a busy machine, most jobs spend more time waiting in the batch queue than executing, yet this waiting time is rarely a target of performance tuning and optimization in parallel computing. We propose a workload-aware method to systematically predict jobs' batch-queue waiting-time patterns and thereby help users optimize utilization and improve productivity. Using workload data gathered from a supercomputer, we apply a Bayesian framework to predict the temporal trend of the probability of long batch-queue waits. As a result, not only can the machine's workload be predicted, but we can also provide users with a monthly updated reference chart that suggests job submissions with better-chosen CPU counts and run-time requests, avoiding long waits in the batch queue. Our experiments show that the model makes over 89% correct predictions for all cases we have tested.

FeatureSelector: an XSEDE-Enabled Tool for Massive Game Log Analysis
Y. D. Cai, B. Riedl, R. Ratan, Cuihua Shen, A. Picot. DOI: 10.1145/2616498.2616511. Pages 17:1-17:7.
Due to the huge volume and extreme complexity of online game data collections, selecting essential features for the analysis of massive game logs is not only necessary but also challenging. This study develops and implements a new XSEDE-enabled tool, FeatureSelector, which uses parallel processing techniques on high-performance computers to perform feature selection. By calculating probability distance measures based on K-L divergence, the tool quantifies the distance between variables in data sets and provides guidance for feature selection in massive game log analysis. It has helped researchers choose high-quality, discriminative features from over 300 variables and select the top pairs of countries with the greatest differences from 231 country pairs in a 500 GB game log data set. Our study shows that (1) K-L divergence is a good measure for correctly and efficiently selecting important features, and (2) the high-performance computing platform supported by XSEDE accelerated the feature selection process by more than 30 times. Besides demonstrating the effectiveness of FeatureSelector in a cross-country analysis using high performance computing, this study also highlights lessons learned for feature selection in social science research and experience in applying parallel processing techniques to intensive data analysis.

Detailed computational modeling of laminar and turbulent sooting flames
A. Dasgupta, Somesh P. Roy, D. Haworth. DOI: 10.1145/2616498.2616509. Pages 12:1-12:7.
This study reports the development and validation of two parallel flame solvers with soot models based on the open-source computational fluid dynamics (CFD) toolbox OpenFOAM. First, a laminar flame solver is developed and validated against experimental data. A semi-empirical two-equation soot model and a detailed soot model using the method of moments with interpolative closure (MOMIC) are implemented in the laminar flame solver, along with an optically thin radiation model that includes gray soot radiation. Preliminary results using these models show good agreement with experimental data for the laminar axisymmetric diffusion flame studied. Second, a turbulent flame solver is developed using Reynolds-averaged equations and a transported probability density function (tPDF) method. The MOMIC soot model is implemented in this turbulent solver, together with a photon Monte Carlo (PMC) radiation model that uses a line-by-line spectral database. Validation of the turbulent solver is in progress. Both solvers show good scalability for a moderate-sized chemical mechanism and can be expected to scale even better when larger chemical mechanisms are used.

Runtime Pipeline Scheduling System for Heterogeneous Architectures
Julio C. Olaya, R. Romero. DOI: 10.1145/2616498.2616547. Pages 45:1-45:7.
Heterogeneous architectures can improve the performance of applications with computationally intensive, data-parallel operations. Even when these architectures reduce application execution time, there are opportunities for additional performance improvement because the memory hierarchies of the central processor cores and the graphics processor cores are separate. Applications executing on heterogeneous architectures must allocate space in GPU global memory, copy input data, invoke kernels, and copy results back to CPU memory. This scheme does not overlap inter-memory data transfers with GPU computations, which increases application execution time. This research presents a software architecture with a runtime pipeline system for GPU input/output scheduling that acts as a bidirectional interface between the GPU computing application and the physical device. The main aim of the system is to reduce the impact of the processor-memory performance gap by exploiting overlap between device I/O and computation. Evaluation using application benchmarks shows speedups above 2x with respect to baseline, non-streamed GPU execution.

Academic Torrents: A Community-Maintained Distributed Repository
Joseph Paul Cohen, Henry Z. Lo. DOI: 10.1145/2616498.2616528. Pages 2:1-2:2.
Fostering the free and open sharing of scientific knowledge between the scientific community and the general public is the goal of Academic Torrents. At its core it is a distributed network for efficient content dissemination, connecting scientists, academic journals, readers, research groups, and many others. Leveraging the power of its peer-to-peer architecture, Academic Torrents makes science more accessible through two initiatives. The open data initiative allows researchers to share their datasets at high speeds with low bandwidth costs through the peer-to-peer network. The cooperative nature of scientific research demands access to data, but researchers face significant hurdles in making their data available. The technical benefits of the Academic Torrents network allow researchers to distribute content scalably and globally, leading to its adoption by labs around the world to disseminate and share scientific data. Academic Torrents' open access initiative uses the same technology to share open access papers between institutions and individuals. We design a connector to our network that acts as an onsite digital stack to complement the already existing physical stack curated in the same manner. By utilizing the collective resources of the academic community, we eliminate the biases of the closed subscription model and the pay-to-publish model.

The hybrid Quantum Trajectory/Electronic Structure DFTB-based approach to Molecular Dynamics
Lei Wang, James W. Mazzuca, Sophya Garashchuk, J. Jakowski. DOI: 10.1145/2616498.2616503. Pages 24:1-24:8.
This paper describes a quantum trajectory (QT) approach to molecular dynamics with quantum corrections to the behavior of the nuclei, interfaced with on-the-fly evaluation of the electronic structure (ES). The nuclear wavefunction is represented by an ensemble of trajectories propagated concurrently in time under the influence of the quantum and classical forces. For scalability to high-dimensional systems (hundreds of degrees of freedom), the quantum force is computed within the Linearized Quantum Force (LQF) approximation. The classical force is determined from the ES calculations, performed at the Density Functional Tight Binding (DFTB) level. The high-throughput DFTB version is implemented in a massively parallel environment using OpenMP/MPI. The dynamics has also been extended to describe the Boltzmann (imaginary-time) evolution defining the temperature of a molecular system. The combined QTES-DFTB code has been used to study the reaction dynamics of systems consisting of up to 111 atoms.

The OneOklahoma Friction Free Network: Towards a Multi-Institutional Science DMZ in an EPSCoR State
Henry Neeman, David Akin, Joshua Alexander, D. Brunson, S. P. Calhoun, James Deaton, Franklin Fondjo Fotou, Brandon George, Debi Gentis, Zane Gray, Eddie Huebsch, George Louthan, Matt Runion, J. Snow, Brett Zimmerman. DOI: 10.1145/2616498.2616542. Pages 49:1-49:8.
The OneOklahoma Friction Free Network (OFFN) is a dedicated, multi-institutional, research-only "Science DMZ" network that connects the state's academic cyberinfrastructure resources -- including all four high performance computing centers -- and is available for use by all Oklahoma academics and their collaborators. A project of the OneOklahoma Cyberinfrastructure Initiative (OneOCII), OFFN is based on a collaboration among three universities, a nonprofit, and Oklahoma's research, education, and government Regional Optical Network. OFFN consists of common configurations of Software Defined Networking infrastructure connected across a new set of optical links at a minimum of 10 Gbps, foreshadowing the state's transition to widespread 100 Gbps research connectivity. OneOCII, the parent initiative of OFFN, is a statewide collaboration that offers shared access to resources, both technological and human, to enable the use of advanced computing in research and education statewide. To date, OneOCII has served 52 academic institutions and 48 non-academic organizations.

Large-scale Sequencing and Assembly of Cereal Genomes Using Blacklight
Philip D. Blood, Shoshana Marcus, M. Schatz. DOI: 10.1145/2616498.2616502. Pages 20:1-20:6.
Wheat, corn, and rice provide 60 percent of the world's daily food intake, and just 15 plant species make up 90 percent of the world's food intake. As such, there is tremendous agricultural and scientific interest in sequencing and studying plant genomes, especially to develop reference sequences to direct plant breeding or to identify functional elements. DNA sequencing technologies can now generate sequence data for large genomes at low cost; however, it remains a substantial computational challenge to assemble the short sequencing reads into complete genome sequences. Even one of the simpler ancestral species of wheat, Aegilops tauschii, has a genome size of 4.36 gigabasepairs (Gbp), nearly fifty percent larger than the human genome. Assembling a genome of this size requires computational resources, especially RAM to store the large assembly graph, that are out of reach for most institutions. In this paper, we describe a collaborative effort between Cold Spring Harbor Laboratory and the Pittsburgh Supercomputing Center to assemble large, complex cereal genomes, starting with Ae. tauschii, using the XSEDE shared-memory supercomputer Blacklight. We expect these experiences with Blacklight to provide a case study and computational protocol for other genomics communities to leverage this or similar resources for the assembly of other significant genomes of interest.
