Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.) -- Latest Publications
Best Practices for Administering a Medium Sized Cluster with Intel® Xeon Phi™ Coprocessors
Paul Peltz, Troy Baer. DOI: 10.1145/2616498.2616538. Pages 34:1-34:8.
This work describes best practices for configuring and managing an Intel® Xeon Phi™ cluster. The Xeon Phi presents a unique environment to the user, and preparing this environment requires unique procedures. This work outlines these procedures and provides examples that HPC administrators can adopt and then customize for their own systems. Considerable effort has gone into helping researchers maximize their performance on the Xeon Phi, but little has been done for the administrators of these systems. Now that Xeon Phis are being deployed on larger systems, there is a need for information on how to manage and deploy them. The information provided here serves as a supplement to the documentation Intel provides, bridging the gap between workstation and cluster deployments. This work is based on the authors' experiences deploying and maintaining the Beacon cluster at the University of Tennessee's Application Acceleration Center of Excellence (AACE).

XSEDE OpenACC workshop enables Blue Waters Researchers to Accelerate Key Algorithms
G. W. Arnold, M. Gajbe, S. Koric, J. Urbanic. DOI: 10.1145/2616498.2616530. Pages 28:1-28:6.
The Blue Waters system at the National Center for Supercomputing Applications (NCSA) is the largest GPU-accelerated system in the NSF's portfolio, with more than 4,200 Nvidia K20x accelerators and more than 22,500 compute nodes overall. Using the accelerator nodes effectively is paramount to the system's success, as they represent approximately 1/7 of system peak performance. As an XSEDE level 2 service provider, the system is also available to education allocations proposed by XSEDE educators and trainers. The training staff at the Pittsburgh Supercomputing Center (PSC), along with their XSEDE and Nvidia partners, have offered multiple OpenACC workshops since 2012. The most recent workshop, which held its hands-on sessions on Blue Waters, was very successful. As a direct result of working with PSC on these workshops, NCSA researchers have been able to obtain significant speedups on real-world algorithms using OpenACC in the Cray environment. In this work we look at two key kernel codes (a 3D FFT kernel and a Laplace 2D MPI benchmark) and the path to obtaining the observed performance gains.

Workload Aware Utilization Optimization for a Petaflop Supercomputer: Evidence Based Assessment Using Statistical Methods
Fei Xing, Haihang You. DOI: 10.1145/2616498.2616536. Pages 50:1-50:8.
Computing resources such as supercomputers are shared by many users, and most systems use batch systems as their resource managers. From a user's perspective, the overall turnaround of each submitted job is measured by time-to-solution, which is the sum of batch queuing time and execution time. On a busy machine, most jobs spend more time waiting in the batch queue than executing, yet this waiting time is rarely a target of performance tuning and optimization in parallel computing. We propose a workload-aware method to systematically predict jobs' batch-queue waiting-time patterns and thereby help users optimize utilization and improve productivity. Using workload data gathered from a supercomputer, we apply a Bayesian framework to predict the temporal trend of the probability of long batch-queue waits. As a result, not only can the machine's workload be predicted, but we can also provide users with a monthly updated reference chart that suggests job submissions with better-chosen CPU counts and run-time requests, avoiding long waits in the batch queue. Our experiments show that the model makes over 89% correct predictions for all cases we have tested.

FeatureSelector: an XSEDE-Enabled Tool for Massive Game Log Analysis
Y. D. Cai, B. Riedl, R. Ratan, Cuihua Shen, A. Picot. DOI: 10.1145/2616498.2616511. Pages 17:1-17:7.
Due to the huge volume and extreme complexity of online game data collections, selecting essential features for the analysis of massive game logs is not only necessary but also challenging. This study develops and implements a new XSEDE-enabled tool, FeatureSelector, which uses parallel processing techniques on high-performance computers to perform feature selection. By calculating probability distance measures based on K-L divergence, the tool quantifies the distance between variables in data sets and provides guidance for feature selection in massive game log analysis. It has helped researchers choose high-quality, discriminative features from over 300 variables and select the top pairs of countries with the greatest differences from 231 country pairs in a 500 GB game log data set. Our study shows that (1) K-L divergence is a good measure for correctly and efficiently selecting important features, and (2) the high-performance computing platform supported by XSEDE accelerated the feature selection process by more than 30 times. Besides demonstrating the effectiveness of FeatureSelector in a cross-country analysis using high performance computing, this study also highlights lessons learned for feature selection in social science research and experience in applying parallel processing techniques to intensive data analysis.

Detailed computational modeling of laminar and turbulent sooting flames
A. Dasgupta, Somesh P. Roy, D. Haworth. DOI: 10.1145/2616498.2616509. Pages 12:1-12:7.
This study reports the development and validation of two parallel flame solvers with soot models based on the open-source computational fluid dynamics (CFD) toolbox OpenFOAM. First, a laminar flame solver is developed and validated against experimental data. A semi-empirical two-equation soot model and a detailed soot model using the method of moments with interpolative closure (MOMIC) are implemented in the laminar flame solver, along with an optically thin radiation model that includes gray soot radiation. Preliminary results using these models show good agreement with experimental data for the laminar axisymmetric diffusion flame studied. Second, a turbulent flame solver is developed using Reynolds-averaged equations and a transported probability density function (tPDF) method. The MOMIC soot model is implemented in this turbulent solver, together with a photon Monte Carlo (PMC) radiation model that uses a line-by-line spectral database. Validation of the turbulent solver is in progress. Both solvers show good scalability for a moderate-sized chemical mechanism and can be expected to scale even better when larger chemical mechanisms are used.

Runtime Pipeline Scheduling System for Heterogeneous Architectures
Julio C. Olaya, R. Romero. DOI: 10.1145/2616498.2616547. Pages 45:1-45:7.
Heterogeneous architectures can improve the performance of applications with computationally intensive, data-parallel operations. Even when these architectures reduce application execution time, there are opportunities for additional performance improvement because the memory hierarchies of the central processor cores and the graphics processor cores are separate. Applications executing on heterogeneous architectures must allocate space in GPU global memory, copy input data, invoke kernels, and copy results back to CPU memory. This scheme does not overlap inter-memory data transfers with GPU computations, which increases application execution time. This research presents a software architecture with a runtime pipeline system for GPU input/output scheduling that acts as a bidirectional interface between the GPU computing application and the physical device. The main aim of the system is to reduce the impact of the processor-memory performance gap by exploiting overlap between device I/O and computation. Evaluation using application benchmarks shows speedups above 2x with respect to baseline, non-streamed GPU execution.

Academic Torrents: A Community-Maintained Distributed Repository
Joseph Paul Cohen, Henry Z. Lo. DOI: 10.1145/2616498.2616528. Pages 2:1-2:2.
Fostering the free and open sharing of scientific knowledge between the scientific community and the general public is the goal of Academic Torrents. At its core it is a distributed network for efficient content dissemination, connecting scientists, academic journals, readers, research groups, and many others. Leveraging the power of its peer-to-peer architecture, Academic Torrents makes science more accessible through two initiatives. The open data initiative allows researchers to share their datasets at high speeds with low bandwidth costs through the peer-to-peer network. The cooperative nature of scientific research demands access to data, but researchers face significant hurdles in making their data available. The technical benefits of the Academic Torrents network allow researchers to distribute content scalably and globally, leading to its adoption by labs around the world to disseminate and share scientific data. Academic Torrents' open access initiative uses the same technology to share open access papers between institutions and individuals. We design a connector to our network that acts as an onsite digital stack to complement the already existing physical stack curated in the same manner. By utilizing the collective resources of the academic community, we eliminate the biases of the closed subscription model and the pay-to-publish model.

The hybrid Quantum Trajectory/Electronic Structure DFTB-based approach to Molecular Dynamics
Lei Wang, James W. Mazzuca, Sophya Garashchuk, J. Jakowski. DOI: 10.1145/2616498.2616503. Pages 24:1-24:8.
This paper describes a quantum trajectory (QT) approach to molecular dynamics with quantum corrections to the behavior of the nuclei, interfaced with on-the-fly evaluation of the electronic structure (ES). The nuclear wavefunction is represented by an ensemble of trajectories propagated concurrently in time under the influence of the quantum and classical forces. For scalability to high-dimensional systems (hundreds of degrees of freedom), the quantum force is computed within the Linearized Quantum Force (LQF) approximation. The classical force is determined from the ES calculations, performed at the Density Functional Tight Binding (DFTB) level. The high-throughput DFTB version is implemented in a massively parallel environment using OpenMP/MPI. The dynamics has also been extended to describe the Boltzmann (imaginary-time) evolution defining the temperature of a molecular system. The combined QTES-DFTB code has been used to study the reaction dynamics of systems consisting of up to 111 atoms.

The OneOklahoma Friction Free Network: Towards a Multi-Institutional Science DMZ in an EPSCoR State
Henry Neeman, David Akin, Joshua Alexander, D. Brunson, S. P. Calhoun, James Deaton, Franklin Fondjo Fotou, Brandon George, Debi Gentis, Zane Gray, Eddie Huebsch, George Louthan, Matt Runion, J. Snow, Brett Zimmerman. DOI: 10.1145/2616498.2616542. Pages 49:1-49:8.
The OneOklahoma Friction Free Network (OFFN) is a dedicated, multi-institutional, research-only "Science DMZ" network that connects the state's academic cyberinfrastructure resources -- including all four high performance computing centers -- and is available for use by all Oklahoma academics and their collaborators. A project of the OneOklahoma Cyberinfrastructure Initiative (OneOCII), OFFN is based on a collaboration among three universities, a nonprofit, and Oklahoma's research, education, and government Regional Optical Network. OFFN consists of common configurations of Software Defined Networking infrastructure connected across a new set of optical links at a minimum of 10 Gbps, foreshadowing the state's transition to widespread 100 Gbps research connectivity. OneOCII, the parent initiative of OFFN, is a statewide collaboration that offers shared access to resources, both technological and human, to enable the use of advanced computing in research and education statewide. To date, OneOCII has served 52 academic institutions and 48 non-academic organizations.

Large-scale Sequencing and Assembly of Cereal Genomes Using Blacklight
Philip D. Blood, Shoshana Marcus, M. Schatz. DOI: 10.1145/2616498.2616502. Pages 20:1-20:6.
Wheat, corn, and rice provide 60 percent of the world's daily food intake, and just 15 plant species make up 90 percent of the world's food intake. As such, there is tremendous agricultural and scientific interest in sequencing and studying plant genomes, especially to develop reference sequences to direct plant breeding or to identify functional elements. DNA sequencing technologies can now generate sequence data for large genomes at low cost; however, it remains a substantial computational challenge to assemble the short sequencing reads into complete genome sequences. Even one of the simpler ancestral species of wheat, Aegilops tauschii, has a genome size of 4.36 gigabasepairs (Gbp), nearly fifty percent larger than the human genome. Assembling a genome of this size requires computational resources, especially RAM to store the large assembly graph, that are out of reach for most institutions. In this paper, we describe a collaborative effort between Cold Spring Harbor Laboratory and the Pittsburgh Supercomputing Center to assemble large, complex cereal genomes, starting with Ae. tauschii, using the XSEDE shared-memory supercomputer Blacklight. We expect these experiences with Blacklight to provide a case study and computational protocol for other genomics communities to leverage this or similar resources for the assembly of other significant genomes of interest.
