
2012 SC Companion: High Performance Computing, Networking Storage and Analysis (Latest Publications)

DS-CUDA: A Middleware to Use Many GPUs in the Cloud Environment
Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.146
Minoru Oikawa, A. Kawai, K. Nomura, K. Yasuoka, Kazuyuki Yoshikawa, T. Narumi
GPGPU (general-purpose computing on graphics processing units) faces several difficulties when used in a cloud environment, such as narrow bandwidth, higher cost, and lower security, compared with computation using only CPUs. Most high-performance computing applications require heavy communication between nodes and do not fit a cloud environment, since the network topology and its bandwidth are not fixed and they affect the performance of the application program. However, some applications need little communication, such as molecular dynamics (MD) simulation with the replica exchange method (REM). For such applications, we propose DS-CUDA (Distributed-Shared Compute Unified Device Architecture), a middleware for using many GPUs in a cloud environment with lower cost and higher security. It virtualizes GPUs in the cloud so that they appear to be locally installed GPUs on a client machine. Its redundancy mechanism ensures reliable calculation with consumer GPUs, which greatly reduces cost. It also raises the security level, since nothing except GPU commands and data is stored on the cloud side. REM-MD simulation with 64 GPUs ran 58 and 36 times faster than with a locally installed GPU when connected via InfiniBand and the Internet, respectively.
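The redundant-execution idea behind the reliability mechanism can be sketched as follows; `run_kernel`, the fault model, and the retry policy are hypothetical simplifications for illustration, not the DS-CUDA API:

```python
import random

def run_kernel(data, fault_rate=0.0):
    """Simulate a GPU kernel (elementwise square); a fault silently flips one result,
    as can happen on non-ECC consumer GPUs."""
    result = [x * x for x in data]
    if random.random() < fault_rate:
        i = random.randrange(len(result))
        result[i] += 1  # silent memory error
    return result

def redundant_run(data, fault_rate=0.05, max_retries=10):
    """Run the same kernel on two 'GPUs' and accept the result only when
    both outputs match, retrying otherwise."""
    for _ in range(max_retries):
        a = run_kernel(data, fault_rate)
        b = run_kernel(data, fault_rate)
        if a == b:
            return a
    raise RuntimeError("no matching pair within retry budget")

print(redundant_run([1, 2, 3, 4]))
```

With two independent runs, an undetected error requires both replicas to fault identically, which is far less likely than a single fault; this is how cheap consumer GPUs can still yield trustworthy results.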
Citations: 73
Explosive Charge Blowing a Hole in a Steel Plate Animation
Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.364
Bradley Carvey, Nathan Fabian, D. Rogers
The animation shows a simulation of an explosive charge blowing a hole in a steel plate. The simulation data was generated on Sandia National Laboratories' Red Sky supercomputer. ParaView was used to export polygonal data, which was then textured and rendered using a commercial 3D rendering package.
Citations: 0
Poster: Numeric Based Ordering for Preconditioned Conjugate Gradient
Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.309
J. Booth
The ordering of a matrix vastly impacts the convergence rate of the preconditioned conjugate gradient method. Past ordering methods focus solely on a graph representation of the sparse matrix and give no insight into the convergence rate, which is linked to the preconditioned eigenspectrum. This work attempts to investigate how numeric-based ordering may produce a better preconditioned system in terms of faster convergence.
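A quick numpy check of the point that motivates this work: a symmetric permutation PAPᵀ, which is all a graph-based reordering produces, leaves the eigenspectrum unchanged, so ordering can affect convergence only through the preconditioner it induces. This is an illustrative aside, not code from the poster:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)       # symmetric positive definite test matrix

perm = rng.permutation(5)
P = np.eye(5)[perm]               # permutation matrix
B = P @ A @ P.T                   # graph-style symmetric reordering of A

# The spectrum is invariant under symmetric permutation...
assert np.allclose(np.sort(np.linalg.eigvalsh(A)),
                   np.sort(np.linalg.eigvalsh(B)))
# ...so reordering alone cannot change unpreconditioned CG convergence;
# it matters only through the preconditioned eigenspectrum.
```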
Citations: 1
Load Balanced Parallel GPU Out-of-Core for Continuous LOD Model Visualization
Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.37
Chao Peng, Peng Mi, Yong Cao
Rendering massive 3D models has been recognized as a challenging task. Due to the limited size of GPU memory, a massive model with hundreds of millions of primitives cannot fit into most modern GPUs. By applying parallel level-of-detail (LOD), as proposed in [1], transferring only a portion of the primitives rather than the whole model to the GPU is sufficient for generating a desired simplified version of the model. However, the low bandwidth of CPU-GPU communication makes data transfer a very time-consuming process that prevents users from achieving high-performance rendering of massive 3D models on a single-GPU system. This paper explores a device-level parallel design that distributes the workload in a multi-GPU, multi-display system. Our multi-GPU out-of-core scheme uses a load-balancing method and integrates seamlessly with the parallel LOD algorithm. Our experiments show highly interactive frame rates for the "Boeing 777" airplane model, which consists of over 332 million triangles and over 223 million vertices.
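The budget-driven simplification idea can be sketched as follows: given full-detail primitive counts per object and a per-GPU memory budget, each GPU renders a proportionally simplified share of every object. This is a hypothetical, much simpler scheme than the paper's load-balanced design:

```python
def split_lod_budget(triangle_counts, gpu_budgets):
    """For each GPU, scale every object's triangle count by a global
    simplification ratio so the total fits the GPU's memory budget
    while all objects remain visible."""
    total = sum(triangle_counts)
    plans = []
    for budget in gpu_budgets:
        scale = min(1.0, budget / total)   # never up-sample past full detail
        plans.append([max(1, int(c * scale)) for c in triangle_counts])
    return plans

# Two GPUs with different memory budgets rendering three objects:
print(split_lod_budget([1000, 3000, 6000], [5000, 12000]))
# [[500, 1500, 3000], [1000, 3000, 6000]]
```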
Citations: 5
Poster: Matrix Decomposition Based Conjugate Gradient Solver for Poisson Equation
Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.287
Hang Liu, J. Seo, R. Mittal
Finding a fast solver for the Poisson equation is important for many scientific applications. In this work, we design and develop a matrix decomposition based Conjugate Gradient (CG) solver, which leverages Graphics Processing Unit (GPU) clusters to accelerate the calculation of the Poisson equation. Our experiments show that the new CG solver is highly scalable and achieves significant speedup over a CPU-based Multi-Grid (MG) solver.
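For reference, a plain sequential conjugate gradient applied to a 1D Poisson discretization; this is a minimal CPU baseline for the kind of solve the poster accelerates, not the authors' decomposed GPU code:

```python
import numpy as np

def cg(A, b, tol=1e-10, max_iter=1000):
    """Textbook conjugate gradient for a symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

n = 64                                # interior grid points
h = 1.0 / (n + 1)
A = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2   # 1D Poisson stencil matrix
b = np.ones(n)                        # constant source term
x = cg(A, b)
print(np.linalg.norm(A @ x - b))      # residual norm of the solve
```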
Citations: 3
High Performance Implementation of an Econometrics and Financial Application on GPUs
Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.138
M. Creel, M. Zubair
In this paper, we describe a GPU-based implementation of an estimator based on an indirect likelihood inference method. This method relies on simulations from a model and on nonparametric density or regression function computations. The estimation problem arises in various domains, such as econometrics and finance, when the model is fully specified but too complex for estimation by maximum likelihood. We implemented the estimator on a machine with two 2.67 GHz Intel Xeon X5650 processors and four NVIDIA M2090 GPU devices. We optimized the GPU code through efficient use of the shared memory and registers available on the GPU devices. We compared the optimized GPU code's performance with a C-based sequential version of the code executed on the host machine. We observed a speedup factor of up to 242 with four GPU devices.
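One building block the abstract names, nonparametric regression on simulated draws, can be sketched in a few lines of numpy. The Gaussian kernel, the bandwidth, and the toy model are illustrative choices, not the paper's exact estimator:

```python
import numpy as np

def kernel_regression(x_train, y_train, x_eval, bandwidth=0.1):
    """Nadaraya-Watson estimator with a Gaussian kernel: a weighted
    average of y_train, weighted by kernel distance to each eval point."""
    d = (x_eval[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * d**2)              # kernel weights
    return (w @ y_train) / w.sum(axis=1)

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 2000)             # 'simulated' draws from a model
y = np.sin(3 * x) + 0.1 * rng.standard_normal(x.size)
grid = np.linspace(-0.5, 0.5, 5)
print(kernel_regression(x, y, grid))     # approximates sin(3 * grid)
```

Each evaluation point is independent, which is exactly why this computation maps well onto thousands of GPU threads.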
Citations: 14
A Python HPC Framework: PyTrilinos, ODIN, and Seamless
Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.83
K. W. Smith, W. Spotz, S. Ross-Ross
We present three Python software projects: PyTrilinos, for calling Trilinos distributed memory HPC solvers from Python; Optimized Distributed NumPy (ODIN), for distributed array computing; and Seamless, for automatic, Just-in-time compilation of Python source code. We argue that these three projects in combination provide a framework for high-performance computing in Python. They provide this framework by supplying necessary features (in the case of ODIN and Seamless) and algorithms (in the case of ODIN and PyTrilinos) for a user to develop HPC applications. Together they address the principal limitations (real or imagined) ascribed to Python when applied to high-performance computing. A high-level overview of each project is given, including brief explanations as to how these projects work in conjunction to the benefit of end users.
Citations: 1
Towards Improving the Communication Performance of CRESTA's Co-Design Application NEK5000
Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.92
Michael Schliephake, E. Laure
In order to achieve exascale performance, all aspects of applications and system software need to be analysed and potentially improved. The EU FP7 project "Collaborative Research into Exascale Systemware, Tools & Applications" (CRESTA) uses co-design of advanced simulation applications, system software, and related development tools as a key element of its approach towards exascale. In this paper we present the first results of a co-design activity using the highly scalable application NEK5000. We have analysed the communication structure of NEK5000 and propose new, optimised collective communication operations that will improve the performance of NEK5000 and prepare it for use on the several million cores expected in future HPC systems. The latency-optimised communication operations can also be beneficial in other contexts; for instance, we expect them to become an important building block for a runtime system providing dynamic load balancing, also under development within CRESTA.
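The kind of tradeoff latency-optimised collectives exploit can be illustrated with a textbook latency-bandwidth (alpha-beta) cost model; the constants and the model are illustrative, not measurements of NEK5000 or CRESTA's implementation:

```python
import math

ALPHA = 2e-6    # per-message latency in seconds (illustrative)
BETA = 1e-9     # per-byte transfer time in seconds (illustrative)

def linear_broadcast(p, nbytes):
    """Root sends to each of the p-1 other ranks in turn: O(p) messages."""
    return (p - 1) * (ALPHA + BETA * nbytes)

def tree_broadcast(p, nbytes):
    """Binomial tree: ceil(log2 p) rounds of concurrent pairwise sends."""
    return math.ceil(math.log2(p)) * (ALPHA + BETA * nbytes)

# For small messages at scale, the latency term dominates, so the
# O(log p) tree wins by orders of magnitude as rank counts grow:
for p in (16, 1024, 10**6):
    print(p, linear_broadcast(p, 8), tree_broadcast(p, 8))
```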
Citations: 3
The PEPPHER Composition Tool: Performance-Aware Dynamic Composition of Applications for GPU-Based Systems
Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.97
Usman Dastgeer, Lu Li, C. Kessler
The PEPPHER component model defines an environment for annotating native C/C++ based components for homogeneous and heterogeneous multicore and manycore systems, including GPU and multi-GPU based systems. For the same computational functionality, captured as a component, different sequential and explicitly parallel implementation variants using various types of execution units may be provided, together with metadata such as explicitly exposed tunable parameters. The goal is to compose an application from its components and variants such that, depending on the run-time context, the most suitable implementation variant is chosen automatically for each invocation. We describe and evaluate the PEPPHER composition tool, which explores the application's components and their implementation variants, generates the necessary low-level code that interacts with the runtime system, and coordinates the native compilation and linking of the various code units to compose the overall application code. With several applications, we demonstrate how the composition tool provides a high-level programming front-end while effectively utilizing the task-based PEPPHER runtime system (StarPU) underneath.
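The core idea, selecting an implementation variant per invocation from runtime context, can be sketched as a small dispatcher. The registry, the context fields, and the variants here are hypothetical and far simpler than PEPPHER's annotated components and StarPU scheduling:

```python
variants = []

def variant(predicate):
    """Register an implementation variant with an applicability predicate."""
    def register(fn):
        variants.append((predicate, fn))
        return fn
    return register

@variant(lambda ctx: ctx["n"] < 10_000 or not ctx["gpu_free"])
def vector_sum_cpu(xs):
    return sum(xs)

@variant(lambda ctx: True)             # fallback 'GPU' variant
def vector_sum_gpu(xs):
    return sum(xs)                     # stand-in for a real kernel launch

def compose_call(xs, ctx):
    """Pick the first registered variant whose predicate accepts the
    run-time context, mimicking performance-aware composition."""
    for predicate, fn in variants:
        if predicate(ctx):
            return fn(xs)

print(compose_call(list(range(100)), {"n": 100, "gpu_free": False}))  # 4950
```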
Citations: 20
Towards Energy Efficient Data Intensive Computing Using IEEE 802.3az
Pub Date : 2012-11-10 DOI: 10.1109/SC.Companion.2012.112
Dimitar Pavlov, Joris Soeurt, P. Grosso, Zhiming Zhao, K. V. D. Veldt, Hao Zhu, C. D. Laat
Energy efficiency is an increasingly important requirement for computing and communication systems, especially with their increasing pervasiveness. The IEEE 802.3az protocol reduces network energy consumption by putting active copper Ethernet links into a low-power mode when no traffic exists. However, the effect of 802.3az depends heavily on network traffic patterns, which makes system-level energy optimization challenging. In clusters, distributed data-intensive applications that generate heavy network traffic are common, and in turn the required network devices can consume large amounts of energy. In this research, we examined 802.3az technology with the goal of applying it in clusters. We defined an energy budget calculator that takes energy-efficient Ethernet into account by including energy models derived from tests of 802.3az-enabled devices. The calculator is an integral tool in a global strategy to optimize the energy usage of applications in a high-performance computing environment. We show a few practical examples of how real applications can better plan their execution by integrating this knowledge in their decision strategies.
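The energy-budget idea can be sketched with a two-state link model: a link burns active power while traffic flows and Low Power Idle power otherwise, plus a penalty per wake-up. The power and timing figures are illustrative placeholders, not the measured device models from the paper:

```python
P_ACTIVE = 0.7   # W while the link is active (illustrative)
P_LPI = 0.1      # W in 802.3az Low Power Idle (illustrative)
T_WAKE = 4.5e-6  # s to leave LPI, charged at active power (illustrative)

def link_energy(busy_fraction, duration_s, wakeups=0):
    """Energy in joules for one EEE link over a run: active while traffic
    flows, LPI otherwise, with a wake penalty per LPI-to-active transition."""
    busy = busy_fraction * duration_s + wakeups * T_WAKE
    idle = duration_s - busy
    return P_ACTIVE * busy + P_LPI * idle

# A bursty data-intensive phase vs. a link held always active, one hour:
print(link_energy(0.2, 3600, wakeups=1000), P_ACTIVE * 3600)
```

This is the comparison an application-level planner can make: batching traffic raises the busy fraction during bursts but lengthens LPI stretches, lowering the total budget.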
Citations: 3