
2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing — Latest Publications

Scaling NWChem with Efficient and Portable Asynchronous Communication in MPI RMA
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.48
Min Si, Antonio J. Peña, J. Hammond, P. Balaji, Y. Ishikawa
NWChem is one of the most widely used computational chemistry application suites for chemical and biological systems. Despite its vast success, the computational efficiency of NWChem is still low. This is especially true in higher accuracy methods such as the CCSD(T) coupled cluster method, where it currently achieves a mere 50% computational efficiency when run at large scales. In this paper, we demonstrate the most computationally efficient scaling of NWChem CCSD(T) to date, and use it to solve large water clusters. We use our recently proposed process-based asynchronous progress framework for MPI RMA, called Casper, to scale the computation on water clusters at near-100% computational efficiency on up to 12288 cores.
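For readers unfamiliar with the communication pattern at issue, the sketch below shows a minimal MPI RMA passive-target put in C. It is an illustration of the programming model only, not NWChem or Casper code; Casper's point is that passive-target operations like this make progress without any change to the application.

```c
/* Minimal MPI RMA passive-target example (illustrative sketch only; the
 * actual NWChem/Casper code paths are far more involved). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *buf;
    MPI_Win win;
    /* Each process exposes one double through an RMA window. */
    MPI_Win_allocate(sizeof(double), sizeof(double), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &buf, &win);
    *buf = -1.0;
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 0) {
        /* Passive-target access: the target takes no explicit action,
         * which is exactly where asynchronous progress matters. */
        double val = 42.0;
        int target = size - 1;
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, target, 0, win);
        MPI_Put(&val, 1, MPI_DOUBLE, target, 0, 1, MPI_DOUBLE, win);
        MPI_Win_unlock(target, win);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == size - 1)
        printf("rank %d received %.1f\n", rank, *buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```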
{"title":"Scaling NWChem with Efficient and Portable Asynchronous Communication in MPI RMA","authors":"Min Si, Antonio J. Peña, J. Hammond, P. Balaji, Y. Ishikawa","doi":"10.1109/CCGrid.2015.48","DOIUrl":"https://doi.org/10.1109/CCGrid.2015.48","url":null,"abstract":"NWChem is one of the most widely used computational chemistry application suites for chemical and biological systems. Despite its vast success, the computational efficiency of NWChem is still low. This is especially true in higher accuracy methods such as the CCSD(T) coupled cluster method, where it currently achieves a mere 50% computational efficiency when run at large scales. In this paper, we demonstrate the most computationally efficient scaling of NWChem CCSD(T) to date, and use it to solve large water clusters. We use our recently proposed process-based asynchronous progress framework for MPI RMA, called Casper, to scale the computation on water clusters at near-100% computational efficiency on up to 12288 cores.","PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"3 1","pages":"811-816"},"PeriodicalIF":0.0,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90186568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Taming Latency in Data Center Networking with Erasure Coded Files
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.142
Yu Xiang, V. Aggarwal, Y. Chen, Tian Lan
This paper proposes an approach to minimize service latency in a data center network where erasure-coded files are stored on distributed disks/racks and access requests are scattered across the network. Due to limited bandwidth available at both top-of-the-rack and aggregation switches, network bandwidth must be apportioned among different intra- and inter-rack data flows in line with their traffic statistics. We formulate this problem as weighted queuing and employ a class of probabilistic request scheduling policies to derive a closed-form outer bound of service latency for erasure-coded storage with arbitrary file access patterns and service time distributions. The result enables us to propose a joint latency optimization over three entangled "control knobs": the bandwidth allocation at top-of-the-rack and aggregation switches, the probabilities for scheduling file requests, and the placement of encoded file chunks, which affects data locality. The joint optimization is shown to be a mixed-integer problem. We develop an iterative algorithm that decouples the joint optimization into three sub-problems, each either convex or solvable via bipartite matching in polynomial time. The proposed algorithm is prototyped in an open-source distributed file system, Tahoe, and evaluated on a cloud testbed with 16 separate physical hosts in an OpenStack cluster. Experiments validate our theoretical latency analysis and show significant latency reduction for diverse file access patterns. The results provide valuable insight on designing low-latency data center networks with erasure-coded storage.
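As a rough illustration of the "probabilities for scheduling file requests" knob, the C sketch below draws k of n storage nodes for an erasure-coded read according to configurable weights. The weights and the (7,4) code are hypothetical placeholders; the paper's closed-form latency bound and optimization are not reproduced here.

```c
/* Hedged sketch of probabilistic request scheduling for (n,k)
 * erasure-coded reads: pick k distinct nodes with given probabilities. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Draw k distinct node indices from n, proportional to weights w[]. */
void sample_nodes(int n, int k, const double *w, int *chosen) {
    double *tmp = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) tmp[i] = w[i];
    for (int j = 0; j < k; j++) {
        double total = 0.0;
        for (int i = 0; i < n; i++) total += tmp[i];
        double r = (double)rand() / RAND_MAX * total;
        int pick = 0;
        for (double acc = tmp[0]; acc < r && pick < n - 1; acc += tmp[++pick])
            ;
        chosen[j] = pick;
        tmp[pick] = 0.0;  /* sample without replacement */
    }
    free(tmp);
}

int main(void) {
    srand((unsigned)time(NULL));
    /* Hypothetical (7,4) code: weights favor lightly loaded racks. */
    double pi[7] = {0.2, 0.1, 0.2, 0.1, 0.2, 0.1, 0.1};
    int chosen[4];
    sample_nodes(7, 4, pi, chosen);
    printf("read chunks from nodes:");
    for (int j = 0; j < 4; j++) printf(" %d", chosen[j]);
    printf("\n");
    return 0;
}
```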
{"title":"Taming Latency in Data Center Networking with Erasure Coded Files","authors":"Yu Xiang, V. Aggarwal, Y. Chen, Tian Lan","doi":"10.1109/CCGrid.2015.142","DOIUrl":"https://doi.org/10.1109/CCGrid.2015.142","url":null,"abstract":"This paper proposes an approach to minimize service latency in a data center network where erasure-coded files are stored on distributed disks/racks and access requests are scattered across the network. Due to limited bandwidth available at both top-of-the-rack and aggregation switches, network bandwidth must be apportioned among different intra-and inter-rack data flows in line with their traffic statistics. We formulate this problem as weighted queuing and employ a class of probabilistic request scheduling policies to derive a closed-form outer-bound of service latency for erasure-coded storage with arbitrary file access patterns and service time distributions. The result enables us to propose a joint latency optimization over three entangled \"control knobs\": the bandwidth allocation at top-of-the-rack and aggregation switches, the probabilities for scheduling file requests, and the placement of encoded file chunks, which affects data locality. The joint optimization is shown to be a mixed-integer problem. We develop an iterative algorithm which decouples and solves the joint optimization as three sub-problems, which are either convex or solvable via bipartite matching in polynomial time. The proposed algorithm is prototyped in an open-source, distributed file system, Tahoe, and evaluated on a cloud tested with 16 separate physical hosts in an Open Stack cluster. Experiments validate our theoretical latency analysis and show significant latency reduction for diverse file access patterns. The results provide valuable insight on designing low-latency data center networks with erasure-coded storage.","PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"45 1","pages":"241-250"},"PeriodicalIF":0.0,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88296218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Cloud-Based Machine Learning Tools for Enhanced Big Data Applications
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.170
A. Cuzzocrea, E. Mumolo, P. Corona
We propose Cloud-based machine learning tools for enhanced Big Data applications. The main idea is to predict the "next" workload arriving at the target Cloud infrastructure via an innovative ensemble-based approach that combines the effectiveness of several well-known classifiers in order to enhance the overall accuracy of the final classification, a concern that is highly relevant in the specific context of Big Data. The so-called workload categorization problem plays a critical role in improving the efficiency and reliability of Cloud-based big data applications. Implementation-wise, our method deploys the Cloud entities that participate in the distributed classification approach on top of virtual machines, which represent the classical "commodity" setting for Cloud-based big data applications. Preliminary experimental assessment and analysis clearly confirm the benefits deriving from our classification framework.
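The combination step of such an ensemble can be as simple as majority voting. Below is a minimal C sketch under that assumption; the three base classifiers are invented stubs standing in for the paper's well-known classifiers.

```c
/* Minimal majority-vote ensemble, sketching only the combination step. */
#include <stdio.h>

typedef int (*classifier_fn)(const double *features);

/* Hypothetical base classifiers: each maps a feature vector to a
 * workload class (0 = batch, 1 = interactive, 2 = streaming). */
static int clf_threshold(const double *f) { return f[0] > 0.5 ? 1 : 0; }
static int clf_cpu_bound(const double *f) { return f[1] > 0.7 ? 0 : 2; }
static int clf_io_bound(const double *f)  { return f[2] > 0.6 ? 2 : 1; }

int ensemble_predict(classifier_fn *clfs, int n_clf, const double *f,
                     int n_classes) {
    int votes[16] = {0};  /* assumes n_classes <= 16 */
    for (int i = 0; i < n_clf; i++) votes[clfs[i](f)]++;
    int best = 0;
    for (int c = 1; c < n_classes; c++)
        if (votes[c] > votes[best]) best = c;
    return best;
}

int main(void) {
    classifier_fn clfs[] = {clf_threshold, clf_cpu_bound, clf_io_bound};
    double features[] = {0.8, 0.3, 0.9}; /* e.g., normalized CPU/IO stats */
    printf("predicted workload class: %d\n",
           ensemble_predict(clfs, 3, features, 3));
    return 0;
}
```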
{"title":"Cloud-Based Machine Learning Tools for Enhanced Big Data Applications","authors":"A. Cuzzocrea, E. Mumolo, P. Corona","doi":"10.1109/CCGrid.2015.170","DOIUrl":"https://doi.org/10.1109/CCGrid.2015.170","url":null,"abstract":"We propose Cloud-based machine learning tools for enhanced Big Data applications, where the main idea is that of predicting the \"next\" workload occurring against the target Cloud infrastructure via an innovative ensemble-based approach that combine the effectiveness of different well-known classifiers in order to enhance the whole accuracy of the final classification, which is very relevant at now in the specific context of Big Data. So-called workload categorization problem plays a critical role towards improving the efficiency and the reliability of Cloud-based big data applications. Implementation-wise, our method proposes deploying Cloud entities that participate to the distributed classification approach on top of virtual machines, which represent classical \"commodity\" settings for Cloud-based big data applications. Preliminary experimental assessment and analysis clearly confirm the benefits deriving from our classification framework.","PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"60 1","pages":"908-914"},"PeriodicalIF":0.0,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78854926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Study of the KVM CPU Performance of Open-Source Cloud Management Platforms
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.103
F. Gomez-Folgar, A. García-Loureiro, T. F. Pena, J. I. Zablah, N. Seoane
Nowadays, there are several open-source solutions for building private, public, and even hybrid clouds, such as Eucalyptus, Apache CloudStack, and OpenStack. KVM is one of the hypervisors supported by these cloud platforms. The platforms supply different KVM configurations and, in some cases, present only a subset of CPU features to guest systems, providing a basic abstraction of the underlying CPU. One reason for limiting the features of the virtual CPU is to guarantee guest compatibility with different hardware in heterogeneous environments. However, in a large number of situations the cloud is deployed on a homogeneous set of hosts; in these cases, this limitation can affect the performance of applications executed in guest systems. In this paper, we analyze the architecture, the KVM setup, and the performance of the virtual machines deployed by three popular cloud management platforms, Eucalyptus, Apache CloudStack, and OpenStack, employing a representative set of applications.
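One way to observe this effect from inside a guest is to query CPUID directly. The C probe below (using GCC's <cpuid.h>; a generic x86 check, not specific to any of the platforms studied) reports a few feature bits whose absence often indicates a restricted virtual CPU model.

```c
/* Inspect which CPU features a KVM guest is actually given. */
#include <stdio.h>
#include <cpuid.h>

int main(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        fprintf(stderr, "CPUID leaf 1 unsupported\n");
        return 1;
    }
    /* Feature bits per the Intel SDM for CPUID.1:ECX. */
    printf("SSE4.2: %s\n", (ecx & (1u << 20)) ? "yes" : "no");
    printf("AES-NI: %s\n", (ecx & (1u << 25)) ? "yes" : "no");
    printf("AVX:    %s\n", (ecx & (1u << 28)) ? "yes" : "no");
    return 0;
}
```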
{"title":"Study of the KVM CPU Performance of Open-Source Cloud Management Platforms","authors":"F. Gomez-Folgar, A. García-Loureiro, T. F. Pena, J. I. Zablah, N. Seoane","doi":"10.1109/CCGrid.2015.103","DOIUrl":"https://doi.org/10.1109/CCGrid.2015.103","url":null,"abstract":"Nowadays, there are several open-source solutions for building private, public and even hybrid clouds such as Eucalyptus, Apache Cloud Stack and Open Stack. KVM is one of the supported hypervisors for these cloud platforms. Different KVM configurations are being supplied by these platforms and, in some cases, a subset of CPU features are being presented to guest systems, providing a basic abstraction of the underlying CPU. One of the reasons for limiting the features of the Virtual CPU is to guarantee the guest compatibility with different hardware in heterogeneous environments. However, in a large number of situations, the cloud is deployed on an homogeneous set of hosts. In these cases, this limitation can affect the performance of applications being executed in guest systems. In this paper, we have analyzed the architecture, the KVM setup, and the performance of the Virtual Machines deployed by three popular cloud management platforms: Eucalyptus, Apache Cloud Stack and Open Stack, employing a representative set of applications.","PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"12 2","pages":"1225-1228"},"PeriodicalIF":0.0,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72614819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Analyzing MPI-3.0 Process-Level Shared Memory: A Case Study with Stencil Computations
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.131
Xiaomin Zhu, Junchao Zhang, Kazutomo Yoshii, Shigang Li, Yunquan Zhang, P. Balaji
The recently released MPI-3.0 standard introduced a process-level shared-memory interface which enables processes within the same node to have direct load/store access to each other's memory. Such an interface allows applications to declare data structures that are shared by multiple MPI processes on the node. In this paper, we study the capabilities and performance implications of using MPI-3.0 shared memory in the context of a five-point stencil computation. Our analysis reveals that the use of MPI-3.0 shared memory has several unforeseen performance implications, including disrupting certain compiler optimizations and incorrectly using suboptimal page sizes inside the OS. Based on this analysis, we propose several methodologies for working around these issues, improving communication performance by 40-85% compared to the current MPI-1.0 based approach.
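The interface in question is small: split off a per-node communicator, allocate a shared window, and load/store directly. A minimal C sketch follows; it shows the standard MPI-3.0 calls, not the paper's stencil code.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Communicator containing only the ranks on this node. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int nrank, nsize;
    MPI_Comm_rank(node_comm, &nrank);
    MPI_Comm_size(node_comm, &nsize);

    /* Each rank contributes one double to a node-wide shared window. */
    double *mine;
    MPI_Win win;
    MPI_Win_allocate_shared(sizeof(double), sizeof(double), MPI_INFO_NULL,
                            node_comm, &mine, &win);

    MPI_Win_lock_all(0, win);
    *mine = (double)nrank;   /* plain store into the shared segment */
    MPI_Win_sync(win);       /* make the store visible to peers */
    MPI_Barrier(node_comm);

    if (nrank == 0 && nsize > 1) {
        /* Query rank 1's segment and read it with an ordinary load. */
        MPI_Aint sz;
        int disp;
        double *peer;
        MPI_Win_shared_query(win, 1, &sz, &disp, &peer);
        printf("rank 0 reads rank 1's value: %.1f\n", *peer);
    }
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```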
{"title":"Analyzing MPI-3.0 Process-Level Shared Memory: A Case Study with Stencil Computations","authors":"Xiaomin Zhu, Junchao Zhang, Kazutomo Yoshii, Shigang Li, Yunquan Zhang, P. Balaji","doi":"10.1109/CCGrid.2015.131","DOIUrl":"https://doi.org/10.1109/CCGrid.2015.131","url":null,"abstract":"The recently released MPI-3.0 standard introduced a process-level shared-memory interface which enables processes within the same node to have direct load/store access to each others' memory. Such an interface allows applications to declare data structures that are shared by multiple MPI processes on the node. In this paper, we study the capabilities and performance implications of using MPI-3.0 shared memory, in the context of a five-point stencil computation. Our analysis reveals that the use of MPI-3.0 shared memory has several unforeseen performance implications including disrupting certain compiler optimizations and incorrectly using suboptimal page sizes inside the OS. Based on this analysis, we propose several methodologies for working around these issues and improving communication performance by 40-85% compared to the current MPI-1.0 based approach.","PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"15 1","pages":"1099-1106"},"PeriodicalIF":0.0,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90791599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Parallel DC3 Algorithm for Suffix Array Construction on Many-Core Accelerators
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.56
Gang Liao, Longfei Ma, Guangming Zang, L. Tang
In bioinformatics applications, suffix arrays are widely used for DNA sequence alignment in the initial exact-match phase of heuristic algorithms. With the exponential growth and availability of data, using many-core accelerators, like GPUs, to optimize existing algorithms is very common. We present a new implementation of suffix array construction on the GPU. As a result, suffix array construction on the GPU achieves around a 10x speedup on standard large data sets containing more than 100 million characters. The idea is simple, fast, and scalable, and can easily be scaled to multi-core processors and even heterogeneous architectures.
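To make the data structure concrete, here is a deliberately naive serial suffix-array construction in C. It sorts suffixes with qsort and is nothing like the parallel DC3 algorithm the paper implements on GPUs; it only shows what the output array is.

```c
/* Naive O(n^2 log n) suffix-array construction -- a demo, not DC3. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const char *text;  /* shared by the comparator */

static int cmp_suffix(const void *a, const void *b) {
    return strcmp(text + *(const int *)a, text + *(const int *)b);
}

int main(void) {
    text = "banana";
    int n = (int)strlen(text);
    int *sa = malloc(n * sizeof(int));
    for (int i = 0; i < n; i++) sa[i] = i;  /* suffix start offsets */

    /* Sort offsets by the lexicographic order of their suffixes. */
    qsort(sa, n, sizeof(int), cmp_suffix);

    for (int i = 0; i < n; i++)
        printf("sa[%d] = %d  %s\n", i, sa[i], text + sa[i]);
    free(sa);
    return 0;
}
```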
{"title":"Parallel DC3 Algorithm for Suffix Array Construction on Many-Core Accelerators","authors":"Gang Liao, Longfei Ma, Guangming Zang, L. Tang","doi":"10.1109/CCGrid.2015.56","DOIUrl":"https://doi.org/10.1109/CCGrid.2015.56","url":null,"abstract":"In bioinformatics applications, suffix arrays are widely used to DNA sequence alignments in the initial exact match phase of heuristic algorithms. With the exponential growth and availability of data, using many-core accelerators, like GPUs, to optimize existing algorithms is very common. We present a new implementation of suffix array on GPU. As a result, suffix array construction on GPU achieves around 10x speedup on standard large data sets, which contain more than 100 million characters. The idea is simple, fast and scalable that can be easily scale to multi-core processors and even heterogeneous architectures.","PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"1 1","pages":"1155-1158"},"PeriodicalIF":0.0,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89703154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
MVAPICH2 over OpenStack with SR-IOV: An Efficient Approach to Build HPC Clouds
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.166
Jie Zhang, Xiaoyi Lu, Mark Daniel Arnold, D. Panda
Cloud Computing with Virtualization offers attractive flexibility and elasticity to deliver resources by providing a platform for consolidating complex IT resources in a scalable manner. However, efficiently running HPC applications on Cloud Computing systems is still full of challenges. One of the biggest hurdles in building efficient HPC clouds is the unsatisfactory performance offered by underlying virtualized environments, more specifically, virtualized I/O devices. Recently, Single Root I/O Virtualization (SR-IOV) technology has been steadily gaining momentum for high-performance interconnects such as InfiniBand and 10GigE. Due to its near-native performance for inter-node communication, many cloud systems such as Amazon EC2 have been using SR-IOV in their production environments. Nevertheless, recent studies have shown that the SR-IOV scheme lacks locality-aware communication support, which leads to performance overheads for inter-VM communication within the same physical node. In this paper, we propose an efficient approach to build HPC clouds based on MVAPICH2 over OpenStack with SR-IOV. We first propose an extension to the OpenStack Nova system to enable the IVShmem channel in deployed virtual machines. We further present and discuss our high-performance design of a virtual-machine-aware MVAPICH2 library over OpenStack-based HPC Clouds. Our design can fully take advantage of high-performance SR-IOV communication for inter-node communication as well as Inter-VM Shmem (IVShmem) for intra-node communication. A comprehensive performance evaluation with micro-benchmarks and HPC applications has been conducted on an experimental OpenStack-based HPC cloud and Amazon EC2. The evaluation results on the experimental HPC cloud show that our design and extension can deliver near bare-metal performance for implementing SR-IOV-based HPC clouds with virtualization. Further, compared with the performance on EC2, our experimental HPC cloud exhibits up to 160X, 65X, and 12X improvement potential in terms of point-to-point, collective, and application performance for future HPC clouds.
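The locality detection underlying the intra-/inter-node channel split can be expressed in a few lines of MPI. The C sketch below merely discovers which peers share a node; the actual IVShmem and SR-IOV channels are of course not implemented here.

```c
/* Locality discovery behind "shared memory intra-node, network
 * inter-node": each rank learns which peers share its physical node. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int wrank, wsize;
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
    MPI_Comm_size(MPI_COMM_WORLD, &wsize);

    /* Sub-communicator of ranks that can share memory (same node). */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int nsize;
    MPI_Comm_size(node_comm, &nsize);

    if (wrank == 0)
        printf("%d ranks total: %d on my node (shared-memory channel), "
               "%d remote (network channel, e.g. SR-IOV)\n",
               wsize, nsize, wsize - nsize);

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```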
{"title":"MVAPICH2 over OpenStack with SR-IOV: An Efficient Approach to Build HPC Clouds","authors":"Jie Zhang, Xiaoyi Lu, Mark Daniel Arnold, D. Panda","doi":"10.1109/CCGrid.2015.166","DOIUrl":"https://doi.org/10.1109/CCGrid.2015.166","url":null,"abstract":"Cloud Computing with Virtualization offers attractive flexibility and elasticity to deliver resources by providing a platform for consolidating complex IT resources in a scalable manner. However, efficiently running HPC applications on Cloud Computing systems is still full of challenges. One of the biggest hurdles in building efficient HPC clouds is the unsatisfactory performance offered by underlying virtualized environments, more specifically, virtualized I/O devices. Recently, Single Root I/O Virtualization (SR-IOV) technology has been steadily gaining momentum for high-performance interconnects such as InfiniBand and 10GigE. Due to its near native performance for inter-node communication, many cloud systems such as Amazon EC2 have been using SR-IOV in their production environments. Nevertheless, recent studies have shown that the SR-IOV scheme lacks locality aware communication support, which leads to performance overheads for inter-VM communication within the same physical node. In this paper, we propose an efficient approach to build HPC clouds based on MVAPICH2 over Open Stack with SR-IOV. We first propose an extension for Open Stack Nova system to enable the IV Shmem channel in deployed virtual machines. We further present and discuss our high-performance design of virtual machine aware MVAPICH2 library over Open Stack-based HPC Clouds. Our design can fully take advantage of high-performance SR-IOV communication for inter-node communication as well as Inter-VM Shmem (IVShmem) for intra-node communication. A comprehensive performance evaluation with micro-benchmarks and HPC applications has been conducted on an experimental Open Stack-based HPC cloud and Amazon EC2. The evaluation results on the experimental HPC cloud show that our design and extension can deliver near bare-metal performance for implementing SR-IOV-based HPC clouds with virtualization. Further, compared with the performance on EC2, our experimental HPC cloud can exhibit up to 160X, 65X, 12X improvement potential in terms of point-to-point, collective and application for future HPC clouds.","PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"31 1","pages":"71-80"},"PeriodicalIF":0.0,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79333798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
Eliminating the Redundancy in MapReduce-Based Entity Resolution
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.24
Cairong Yan, Yalong Song, Jian Wang, Wenjing Guo
Entity resolution is a basic operation in data quality management and a key step in extracting value from data. A parallel data-processing framework based on MapReduce can address the challenges brought by big data. However, two important issues remain: avoiding the redundant pairs produced by multi-pass blocking, and optimizing candidate pairs based on the transitive relations of similarity. In this paper, we propose a multi-signature-based parallel entity resolution method, called multi-sig-er, which supports both unstructured and structured data. Two redundancy-elimination strategies are adopted to prune the candidate pairs and reduce the number of similarity computations without affecting resolution accuracy. Experimental results on real-world datasets show that our method scales to large datasets and is better suited to complex similarity computation than to simple object matching.
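One classic redundancy-elimination rule for multi-pass blocking is to compare a pair only under the first signature the two records share. The C sketch below illustrates that rule with invented signatures; the paper's actual strategies may differ.

```c
/* A pair sharing k blocking signatures would otherwise be compared k
 * times; emit it only in the pass of the first shared signature. */
#include <stdio.h>
#include <string.h>

#define NSIG 3

typedef struct {
    int id;
    const char *sig[NSIG];  /* e.g., phonetic key, prefix, sorted tokens */
} record_t;

/* True iff this pass owns the first signature shared by a and b. */
int should_compare(const record_t *a, const record_t *b, int pass) {
    for (int s = 0; s < NSIG; s++)
        if (strcmp(a->sig[s], b->sig[s]) == 0)
            return s == pass;  /* first shared signature wins */
    return 0;
}

int main(void) {
    record_t r1 = {1, {"SMT", "joh", "john smith"}};
    record_t r2 = {2, {"SMT", "joh", "john smyth"}};
    /* r1 and r2 collide in passes 0 and 1, but are compared only once. */
    for (int pass = 0; pass < NSIG; pass++)
        printf("pass %d: compare(1,2) = %d\n", pass,
               should_compare(&r1, &r2, pass));
    return 0;
}
```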
{"title":"Eliminating the Redundancy in MapReduce-Based Entity Resolution","authors":"Cairong Yan, Yalong Song, Jian Wang, Wenjing Guo","doi":"10.1109/CCGrid.2015.24","DOIUrl":"https://doi.org/10.1109/CCGrid.2015.24","url":null,"abstract":"Entity resolution is the basic operation of data quality management, and the key step to find the value of data. The parallel data processing framework based on MapReduce can deal with the challenge brought by big data. However, there exist two important issues, avoiding redundant pairs led by the multi-pass blocking method and optimizing candidate pairs based on the transitive relations of similarity. In this paper, we propose a multi-signature based parallel entity resolution method, called multi-sig-er, which supports unstructured data and structured data. Two redundancy elimination strategies are adopted to prune the candidate pairs and reduce the number of similarity computation without affecting the resolution accuracy. Experimental results on real-world datasets show that our method tends to handle large datasets and it is more suitable for complex similarity computation than simple object matching.","PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"6 1","pages":"1233-1236"},"PeriodicalIF":0.0,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90429897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Toward Implementing Robust Support for Portals 4 Networks in MPICH
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.79
Kenneth Raffenetti, Antonio J. Peña, P. Balaji
The Portals 4 network specification is a low-level API for high-performance networks developed by Sandia National Laboratories, Intel Corporation, and the University of New Mexico. Portals 4 is specifically designed to support both the MPI and PGAS programming models efficiently by providing building blocks upon which to implement their particular features. In this paper we discuss our ongoing efforts to add efficient and robust support for Portals 4 networks inside MPICH, and we describe how the API semantics influenced our design. In particular, we found the lack of reliability guarantees from the Portals 4 layer challenging to address. To tackle this situation, we implemented an intermediate layer, Rportals (reliable Portals), which modularizes the reliability functionality within our Portals network module for MPICH. In this paper we present the Rportals design and its performance impact.
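The Portals 4 API itself is not shown here; instead, the following C sketch illustrates the generic ack/retransmit bookkeeping that any reliability layer such as Rportals must maintain over an unreliable transport. All names and the fixed window size are illustrative assumptions, not Rportals internals.

```c
/* Generic reliability bookkeeping: a window of in-flight messages with
 * acknowledgments and timeout-driven retransmission. */
#include <stdio.h>
#include <string.h>

#define WINDOW 8

typedef struct {
    unsigned seq;
    int in_flight;
    double send_time;     /* for retransmit timeouts */
    char payload[64];
} pending_t;

static pending_t pending[WINDOW];
static unsigned next_seq = 0;

int reliable_send(const char *msg, double now) {
    pending_t *slot = &pending[next_seq % WINDOW];
    if (slot->in_flight) return -1;      /* window full: caller must wait */
    slot->seq = next_seq++;
    slot->in_flight = 1;
    slot->send_time = now;
    strncpy(slot->payload, msg, sizeof slot->payload - 1);
    /* ...hand payload to the unreliable transport here... */
    return 0;
}

void on_ack(unsigned seq) {
    pending_t *slot = &pending[seq % WINDOW];
    if (slot->in_flight && slot->seq == seq)
        slot->in_flight = 0;             /* delivery confirmed */
}

void check_timeouts(double now, double rto) {
    for (int i = 0; i < WINDOW; i++)
        if (pending[i].in_flight && now - pending[i].send_time > rto) {
            pending[i].send_time = now;  /* rearm timer and resend */
            /* ...resend pending[i].payload... */
            printf("retransmit seq %u\n", pending[i].seq);
        }
}

int main(void) {
    reliable_send("hello", 0.0);
    reliable_send("world", 0.1);
    on_ack(0);                 /* first message acknowledged */
    check_timeouts(1.0, 0.5);  /* second one times out and is resent */
    return 0;
}
```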
{"title":"Toward Implementing Robust Support for Portals 4 Networks in MPICH","authors":"Kenneth Raffenetti, Antonio J. Peña, P. Balaji","doi":"10.1109/CCGrid.2015.79","DOIUrl":"https://doi.org/10.1109/CCGrid.2015.79","url":null,"abstract":"The Portals 4 network specification is a low-levelAPI for high-performance networks developed by Sandia National Laboratories, Intel Corporation, and the University of NewMexico. Portals 4 is specifically designed to support both the MPIand PGAS programming models efficiently by providing building blocks upon which to implement their particular features. In this paper we discuss our ongoing efforts to add efficient and robust support for Portals 4 networks inside MPICH, and we describe how the API semantics influenced our design. In particular, we found the lack of reliability guarantees from the Portals4 layer challenging to address. To tackle this situation, we implemented an intermediate layer - Rportals (reliable Portals), which modularizes the reliability functionality within our Portals network module for MPICH. In this paper we present theRportals design and its performance impact.","PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"76 1","pages":"1173-1176"},"PeriodicalIF":0.0,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90587044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Assessing Memory Access Performance of Chapel through Synthetic Benchmarks
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.157
Engin Kayraklioglu, T. El-Ghazawi
The Partitioned Global Address Space (PGAS) programming model strikes a balance between high performance and locality awareness. As a PGAS language, Chapel relieves programmers from handling the details of data movement in a distributed-memory environment by presenting a flat memory space that is logically partitioned among executing entities. Traversing such a space requires address mapping to the system virtual address space, and this abstraction inevitably causes major overheads during memory accesses. In this paper, we analyze the extent of this overhead by implementing a microbenchmark that tests the different types of memory accesses observable in Chapel. We show that, as locality is exploited, speedup gains of up to 35x can be achieved. This was demonstrated through hand tuning, however; more productive means should be provided to deliver such performance improvements without excessively burdening programmers. Therefore, we also discuss possibilities for increasing Chapel's performance through standard libraries, compiler, runtime, and/or hardware support to handle different types of memory accesses more efficiently.
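The cost being measured is essentially an extra address translation on every access. The C microbenchmark below mimics it in a single address space by routing one traversal through an index-mapping table; it is an analogy to, not a reproduction of, the paper's Chapel benchmark.

```c
/* Same traversal twice: direct indexing vs. an index-mapping table
 * standing in for PGAS address translation. Timings are machine-dependent. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 22)

int main(void) {
    double *data = malloc(N * sizeof(double));
    size_t *map = malloc(N * sizeof(size_t));
    if (!data || !map) return 1;
    for (size_t i = 0; i < N; i++) { data[i] = 1.0; map[i] = i; }

    clock_t t0 = clock();
    double s1 = 0.0;
    for (size_t i = 0; i < N; i++) s1 += data[i];       /* direct */
    clock_t t1 = clock();
    double s2 = 0.0;
    for (size_t i = 0; i < N; i++) s2 += data[map[i]];  /* translated */
    clock_t t2 = clock();

    printf("direct:     %.3fs (sum %.0f)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC, s1);
    printf("translated: %.3fs (sum %.0f)\n",
           (double)(t2 - t1) / CLOCKS_PER_SEC, s2);
    free(data);
    free(map);
    return 0;
}
```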
{"title":"Assessing Memory Access Performance of Chapel through Synthetic Benchmarks","authors":"Engin Kayraklioglu, T. El-Ghazawi","doi":"10.1109/CCGrid.2015.157","DOIUrl":"https://doi.org/10.1109/CCGrid.2015.157","url":null,"abstract":"The Partitioned Global Address Space(PGAS) programming model strikes a balance between high performance and locality awareness. As a PGAS language, Chapel relieves programmers from handling details of data movement in a distributed memory environment, by presenting a flat memory space that is logically partitioned among executing entities. Traversing such a space requires address mapping to the system virtual address space, and as such, this abstraction inevitably causes major overheads during memory accesses. In this paper, we analyzed the extent of this overhead by implementing a micro benchmark to test different types of memory accesses that can be observed in Chapel. We showed that, as the locality gets exploited speedup gains up to 35x can be achieved. This was demonstrated through hand tuning, however. More productive means should be provided to deliver such performance improvement without excessively burdening programmers. Therefore, we also discuss possibilities to increase Chapel's performance through standard libraries, compiler, runtime and/or hardware support to handle different types of memory accesses more efficiently.","PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"7 1","pages":"1147-1150"},"PeriodicalIF":0.0,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78436529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3