Proceedings. IEEE International Conference on Cluster Computing: latest articles

MPI in 2002: has it been ten years already?

Proceedings. IEEE International Conference on Cluster Computing

Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137776

E. Lusk

Summary form only given. In April of 1992, a group of parallel computing vendors, computer science researchers, and application scientists met at a one-day workshop and agreed to cooperate on the development of a community standard for the message-passing model of parallel computing. The MPI Forum that eventually emerged from that workshop became a model of how a broad community could work together to improve an important component of the high performance computing environment. The Message Passing Interface (MPI) definition that resulted from this effort has been widely adopted and implemented, and is now virtually synonymous with the message-passing model itself. MPI not only standardized existing practice in the service of making applications portable in the rapidly changing world of parallel computing, but also consolidated research advances into novel features that extended existing practice and have proven useful in developing a new generation of applications. This talk will discuss some of the procedures and approaches of the MPI Forum that led to MPI's early adoption, and then describe some of the features that have led to its persistence as a reference model for parallel computing. Although clusters were only just emerging as a significant parallel computing production platform as MPI was being defined, MPI has proven to be a useful way of programming them for high performance, and we will discuss the current situation in MPI implementations for clusters. MPI was deliberately designed to grant considerable flexibility to implementors, and thus provides a useful framework for implementation research. Successful implementation techniques within the MPI standard can be utilized immediately by applications already using MPI, thus providing an unusually fast path from research results to their application.
At Argonne National Laboratory we have been developing and distributing MPICH, a portable, high performance implementation of MPI, from the very beginning of the MPI effort. We will describe MPICH-2, a completely new version of MPICH just being released. We will present some of its novel design features that we hope will stimulate both further research and a new generation of complete MPI-2 implementations, along with some early performance results. We will conclude with a speculative look at the future of MPI, including its role in other programming approaches, fault tolerance, and its applicability to advanced architectures.

Citations: 9

Memory mapped networks: a new deal for distributed shared memories? The SciFS experience

Proceedings. IEEE International Conference on Cluster Computing

Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137751

E. Cecchet

The performance of distributed shared memory (DSM) systems has always suffered from high network latencies and from software communication layers with large overheads. Memory mapped networks such as the Scalable Coherent Interface (SCI) allow remote memory to be accessed reliably without involving the operating system. To show how DSM systems can benefit from this technology, we have developed SciFS, a DSM tightly integrated with the operating system, that exploits the high performance and the remote memory access capabilities of SCI. We first show the respective advantages of two communication techniques with SCI: programmed I/O (PIO) and remote DMA (RDMA). Then, we describe how to build a scalable page transfer mechanism by mixing PIO and RDMA. Despite the lack of a broadcast mechanism in SCI, we demonstrate that it is possible to build scalable synchronization primitives using PIO. Finally, we evaluate various consistency models with scientific computing applications from the Splash benchmark. We observe that, even if the raw network performance is good, it is not sufficient to obtain acceptable results with applications that require fine-grain parallelism. However, we show that memory mapped networks provide efficient hardware support for implementing software DSM systems without requiring complex relaxed consistency models. In this way, DSM design can be greatly simplified using this technology.
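The abstract contrasts PIO and RDMA and mixes the two for page transfers. A common way to reason about such a mix is a linear cost model. In that model, PIO has no setup cost but lower bandwidth. RDMA pays a fixed descriptor-posting cost and then streams data faster. The crossover size decides which mechanism wins. The sketch below assumes such a model. `LinkModel`, `choose_mechanism`, and every number here are hypothetical; they are not SCI or SciFS measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkModel:
    """Hypothetical linear cost model for one remote transfer; parameters are
    illustrative, not measured SCI figures."""
    pio_bw: float        # bytes per microsecond sustained with programmed I/O
    dma_bw: float        # bytes per microsecond sustained with remote DMA
    dma_setup_us: float  # fixed cost (microseconds) of posting a DMA descriptor

    def pio_time(self, nbytes: int) -> float:
        return nbytes / self.pio_bw

    def dma_time(self, nbytes: int) -> float:
        return self.dma_setup_us + nbytes / self.dma_bw

    def crossover(self) -> float:
        """Transfer size above which RDMA is cheaper than PIO."""
        if self.dma_bw <= self.pio_bw:
            raise ValueError("RDMA never wins unless its bandwidth is higher")
        return self.dma_setup_us / (1.0 / self.pio_bw - 1.0 / self.dma_bw)


def choose_mechanism(model: LinkModel, nbytes: int) -> str:
    """Pick PIO for short transfers and RDMA for long ones under the model."""
    return "PIO" if model.pio_time(nbytes) <= model.dma_time(nbytes) else "RDMA"
```

Take `LinkModel(pio_bw=100, dma_bw=200, dma_setup_us=10)`, with purely illustrative values in bytes per µs and µs. Its crossover falls at 2000 bytes. A 1000-byte control message would therefore go by PIO, and a 4096-byte page by RDMA.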

Citations: 1

Efficient barrier using remote memory operations on VIA-based clusters

Proceedings. IEEE International Conference on Cluster Computing

Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137732

Rinku Gupta, V. Tipparaju, J. Nieplocha, D. Panda

Most high performance scientific applications require efficient support for collective communication. Point-to-point message-passing communication in current-generation clusters is based on the Send/Recv communication model. Collective communication operations built on top of such point-to-point message-passing operations might achieve suboptimal performance. VIA and the emerging InfiniBand architecture support remote DMA operations, which allow data to be moved between nodes with low overhead; they also allow a logical shared memory address space to be created across the nodes. In this paper we focus on barrier, a frequently used collective operation. We demonstrate how RDMA write operations can be used to support an inter-node barrier in a cluster with SMP nodes. Combining this with a scheme that exploits shared memory within an SMP node, we develop a fast barrier algorithm for a cluster of SMP nodes with a cLAN VIA interconnect. Compared to current barrier algorithms using the Send/Recv communication model, the new approach is shown to reduce barrier latency on a 64-processor (32 dual-processor nodes) system by up to 66%. These results demonstrate that high performance and scalable barrier implementations can be delivered on current and next-generation VIA/InfiniBand-based clusters with RDMA support.
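The dissemination barrier is a well-known barrier whose signals map naturally onto one-sided RDMA writes. In round k, rank r writes a flag into the memory of rank (r + 2^k) mod n and then polls its own flag. The barrier finishes after ⌈log₂ n⌉ rounds. The abstract does not say which inter-node algorithm the authors use, so the following is only a sketch of the general idea:

* Python threads stand in for processes.
* Plain list assignments stand in for RDMA writes.
* The intra-node shared-memory stage for SMP nodes is omitted.
* `DisseminationBarrier` is a hypothetical name.

```python
import time


class DisseminationBarrier:
    """Dissemination barrier whose signals are one-sided writes into the
    partner's flag slot, a stand-in for RDMA writes (illustrative sketch)."""

    def __init__(self, nprocs: int) -> None:
        self.n = nprocs
        # ceil(log2(n)) rounds; 0 for a single process.
        self.rounds = (nprocs - 1).bit_length()
        # flags[r][k] models memory owned by rank r, written only by the
        # rank whose round-k partner is r.
        self.flags = [[0] * self.rounds for _ in range(nprocs)]
        # Per-rank count of barrier episodes, so flags never need resetting.
        self.episode = [0] * nprocs

    def wait(self, rank: int) -> None:
        self.episode[rank] += 1
        e = self.episode[rank]
        for k in range(self.rounds):
            partner = (rank + (1 << k)) % self.n
            # "RDMA write": store our episode number in the partner's memory.
            # Real hardware would also need this write ordered before polling;
            # CPython's global interpreter lock makes that implicit here.
            self.flags[partner][k] = e
            # Poll local memory (no receive call) until the round-k signal
            # for this episode, or a later one, has landed.
            while self.flags[rank][k] < e:
                time.sleep(0)
```

The episode counters are monotonic, and each flag slot has exactly one writer. A fast rank that has already entered the next barrier can therefore overwrite a slot without confusing a slower rank.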

Citations: 22

MyVIA: a design and implementation of the high performance Virtual Interface Architecture

Proceedings. IEEE International Conference on Cluster Computing

Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137741

Yu Chen, Xiaoge Wang, Z. Jiao, Jun Xie, Zhihui Du, Sanli Li

The Virtual Interface Architecture (VIA) established a communication model with low latency and high bandwidth, and defined the standard for user-level high-performance communication in cluster systems. This paper analyzes the current development, principles, and implementations of VIA, and presents MyVIA, user-level high-performance communication software based on Myrinet that conforms to the VIA specification. The paper first describes the design principles and framework of MyVIA. It then proposes new techniques used in MyVIA: a User TLB, contiguous host physical memory and variable NIC buffers, pipelined communication based on resource chains and DMA chains, and a physical descriptor ring. Performance comparisons and analysis are presented. The one-way bandwidth of MyVIA for a 4 KB message is 250 MB/s, and the lowest one-way latency is 8.46 μs, which shows that the performance of MyVIA surpasses that of other VIA implementations.
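The abstract lists a User TLB among MyVIA's techniques without detailing it. The general idea behind such a table is to cache translations for registered (pinned) buffers in user space. Building a send descriptor then needs no kernel call per message. The sketch below assumes 4 KB pages and a dictionary from virtual page numbers to physical frame numbers. `UserTLB`, `register`, `translate`, and `gather_list` are hypothetical names. `gather_list` shows how physically contiguous pages can be merged into fewer DMA segments.

```python
from __future__ import annotations

PAGE_SIZE = 4096  # assumed page size


class UserTLB:
    """Hypothetical user-level translation table for registered buffers."""

    def __init__(self) -> None:
        self.entries: dict[int, int] = {}  # virtual page number -> physical frame

    def register(self, vaddr: int, length: int, frames: list[int]) -> None:
        """Record the (pinned) physical frame backing each page of a buffer."""
        first = vaddr // PAGE_SIZE
        last = (vaddr + length - 1) // PAGE_SIZE
        if len(frames) != last - first + 1:
            raise ValueError("need exactly one physical frame per virtual page")
        for i, frame in enumerate(frames):
            self.entries[first + i] = frame

    def translate(self, vaddr: int) -> int:
        """Virtual to physical address, entirely in user space."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        try:
            return self.entries[vpn] * PAGE_SIZE + offset
        except KeyError:
            raise LookupError(f"address {vaddr:#x} is not registered") from None

    def gather_list(self, vaddr: int, length: int) -> list[tuple[int, int]]:
        """Split a buffer into (physical address, size) DMA segments, merging
        pages whose frames happen to be physically contiguous."""
        segments: list[tuple[int, int]] = []
        pos, end = vaddr, vaddr + length
        while pos < end:
            size = min(PAGE_SIZE - pos % PAGE_SIZE, end - pos)
            paddr = self.translate(pos)
            if segments and segments[-1][0] + segments[-1][1] == paddr:
                segments[-1] = (segments[-1][0], segments[-1][1] + size)
            else:
                segments.append((paddr, size))
            pos += size
        return segments
```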

Citations: 8

The Bladed Beowulf: a cost-effective alternative to traditional Beowulfs

Proceedings. IEEE International Conference on Cluster Computing

Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137753

Wu-chun Feng, Michael S. Warren, E. Weigle

We present a new twist on the Beowulf cluster: the Bladed Beowulf. Traditional Beowulfs typically use Intel or AMD processors. In contrast, our Bladed Beowulf uses Transmeta processors in order to keep thermal power dissipation low and reliability and density high, while still achieving performance comparable to Intel- and AMD-based clusters. Given the ever-increasing complexity of traditional supercomputers and Beowulf clusters, the issues of size, reliability, power consumption, and ease of administration and use will be "the" issues of this decade for high-performance computing. Bigger and faster machines are simply not good enough anymore. To illustrate, we present the results of performance benchmarks on our Bladed Beowulf. We also introduce two performance metrics that contribute to the total cost of ownership (TCO) of a computing system: performance/power and performance/space.
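The two metrics named in the abstract are ratios of delivered performance to power drawn and to floor space. The minimal sketch below assumes performance in Mflops, power in watts, and footprint in square feet; these units are an assumption here. The `ClusterSpec` type, `rank_by`, and all numbers are hypothetical and are not the paper's benchmark results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical description of a cluster for TCO-style comparisons."""
    name: str
    mflops: float  # sustained performance on some benchmark, in Mflops
    watts: float   # power drawn while running it
    sq_ft: float   # floor space occupied

    def perf_per_power(self) -> float:
        """Performance/power in Mflops per watt."""
        return self.mflops / self.watts

    def perf_per_space(self) -> float:
        """Performance/space in Mflops per square foot."""
        return self.mflops / self.sq_ft


def rank_by(specs, metric: str):
    """Sort clusters best-first by 'perf_per_power' or 'perf_per_space'."""
    return sorted(specs, key=lambda s: getattr(s, metric)(), reverse=True)
```

Consider a hypothetical system delivering 100,000 Mflops on 5,000 W in 10 ft². For it, the sketch gives 20 Mflops/W and 10,000 Mflops/ft². With these metrics, a machine with higher raw Mflops can still rank lower.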

Citations: 39

A data parallel programming model based on distributed objects

Proceedings. IEEE International Conference on Cluster Computing

Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137782

R. Diaconescu, R. Conradi

This paper proposes a data parallel programming model suitable for loosely synchronous, irregular applications. At the core of the model are distributed objects that express non-trivial data parallelism. Sequential objects express independent computations. The goal is to use objects to fold synchronization into data accesses and thus, free the user from concurrency aspects. Distributed objects encapsulate large data partitioned across multiple address spaces. The system classifies accesses to distributed objects as read and write. Furthermore, it uses the access patterns to maintain information about dependences across partitions. The system guarantees inter-object consistency using a relaxed update scheme. Typical access patterns uncover dependences for data on the border between partitions. Experimental results show that this approach is highly usable and efficient.
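The abstract describes distributed objects whose partitions depend on each other only at their borders. The system refreshes that border data, so the user never writes explicit synchronization. A generic way to picture this is a one-dimensional array split into blocks with ghost (halo) cells. The sketch below shows that generic halo pattern, not the authors' system. `DistributedArray` and its methods are hypothetical, and each block stands in for one address space. The ghost refresh happens inside `apply_stencil`, so synchronization is folded into the data access.

```python
class DistributedArray:
    """Hypothetical 1-D distributed object: a global array split into
    contiguous blocks (one per simulated address space), each padded with a
    ghost cell on either side that mirrors the neighbouring border element."""

    def __init__(self, data, nparts: int, boundary: float = 0.0) -> None:
        if not 1 <= nparts <= len(data):
            raise ValueError("need 1 <= nparts <= len(data)")
        self.boundary = boundary
        base, extra = divmod(len(data), nparts)
        self.parts = []
        start = 0
        for p in range(nparts):
            size = base + (1 if p < extra else 0)
            block = list(data[start:start + size])
            self.parts.append([boundary] + block + [boundary])
            start += size

    def _refresh_ghosts(self) -> None:
        # Cross-partition dependences exist only at block borders, so only the
        # ghost cells are copied (stands in for messages between address spaces).
        last = len(self.parts) - 1
        for i, part in enumerate(self.parts):
            part[0] = self.parts[i - 1][-2] if i > 0 else self.boundary
            part[-1] = self.parts[i + 1][1] if i < last else self.boundary

    def apply_stencil(self, f) -> None:
        """Replace every element x[j] with f(x[j-1], x[j], x[j+1])."""
        self._refresh_ghosts()  # synchronization folded into the data access
        new_parts = []
        for part in self.parts:
            inner = [f(part[j - 1], part[j], part[j + 1]) for j in range(1, len(part) - 1)]
            new_parts.append([part[0]] + inner + [part[-1]])
        self.parts = new_parts

    def gather(self) -> list:
        """Reassemble the global array, dropping ghost cells."""
        return [x for part in self.parts for x in part[1:-1]]
```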

Citations: 6

ZENTURIO: an experiment management system for cluster and Grid computing

Proceedings. IEEE International Conference on Cluster Computing

Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137723

R. Prodan, T. Fahringer

The need to conduct and manage large sets of experiments for scientific applications has increased dramatically over the last decade. However, there is still very little tool support for this complex and tedious process. We introduce the ZENTURIO experiment management system for parameter studies, performance analysis, and software testing on cluster and Grid architectures. ZENTURIO uses the directive-based ZEN language to specify arbitrarily complex program executions. ZENTURIO is designed as a collection of Grid services that comprise: (1) a registry service that supports registering and locating Grid services; (2) an experiment generator that parses files with ZEN directives and instruments applications for performance analysis and parameter studies; (3) an experiment executor that compiles and controls the execution of experiments on the target machine. A graphical user portal allows the user to control and monitor the experiments and to automatically visualise performance and output data across multiple experiments. ZENTURIO has been implemented based on Java/Jini distributed technology. It supports experiment management on cluster architectures via PBS and on Grid infrastructures through GRAM. We report results of using ZENTURIO for performance analysis of an ocean simulation application and a parameter study of a computational finance code.
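ZENTURIO's experiment generator turns annotated input files into a set of experiments for a parameter study. The abstract does not show the ZEN directive syntax, so the sketch below makes no attempt to reproduce it. It only illustrates the underlying step: expanding each parameter's value set into the full cross product and rendering one command line per experiment. `expand_experiments`, `render`, and the `./app` command are hypothetical.

```python
from itertools import product


def expand_experiments(params: dict) -> list:
    """Cross product of parameter value sets; the last parameter varies fastest."""
    names = list(params)
    return [dict(zip(names, values)) for values in product(*(params[n] for n in names))]


def render(template: str, experiment: dict) -> str:
    """Fill a hypothetical command-line template with one experiment's values."""
    return template.format(**experiment)
```

For example, two values for each of two parameters yield four experiments, and each can then be handed to a batch system as its own job.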

Citations: 43

I/O analysis and optimization for an AMR cosmology application

Proceedings. IEEE International Conference on Cluster Computing

Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137736

Jianwei Li, W. Liao, A. Choudhary, V. Taylor

In this paper we investigate the data access patterns and file I/O behaviors of a production cosmology application that uses the adaptive mesh refinement (AMR) technique for its domain decomposition. This application was originally developed using the Hierarchical Data Format (HDF version 4) I/O library. Since HDF4 does not provide parallel I/O facilities, the global file I/O operations were carried out by one of the allocated processors. When the number of processors becomes large, the I/O performance of this design degrades significantly due to the high communication cost and sequential file access. In this work, we present two additional I/O implementations, using MPI-IO and parallel HDF version 5, and analyze their impact on the I/O performance of this typical AMR application. Based on the I/O patterns discovered in this application, we also discuss the interaction between user-level parallel I/O operations and different parallel file systems and point out their advantages and disadvantages. The performance results presented in this work were obtained on an SGI Origin2000 using XFS, an IBM SP using GPFS, and a Linux cluster using PVFS.
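The abstract contrasts funnelling all output through one processor with parallel I/O through MPI-IO and HDF5. Writing one shared file in parallel relies on each process computing its own file offset. That offset is an exclusive prefix sum of the block sizes of the processes before it. The writes then target disjoint regions and can proceed concurrently. The sketch below simulates this in plain Python. In MPI-IO the same offsets would typically come from `MPI_Exscan`, followed by a collective such as `MPI_File_write_at_all`. The function names and the optional header are hypothetical, and this is not the paper's implementation.

```python
from itertools import accumulate


def write_offsets(block_sizes: list, header_bytes: int = 0) -> list:
    """File offset for each rank: header size plus the exclusive prefix sum of
    the block sizes of lower-ranked processes (what MPI_Exscan would compute)."""
    return [header_bytes + s for s in list(accumulate(block_sizes, initial=0))[:-1]]


def simulate_parallel_write(blocks: list, header: bytes = b"") -> bytes:
    """Each simulated rank writes its block at its own offset into one file image."""
    offsets = write_offsets([len(b) for b in blocks], len(header))
    image = bytearray(len(header) + sum(len(b) for b in blocks))
    image[:len(header)] = header
    for offset, block in zip(offsets, blocks):
        # Disjoint regions: in MPI-IO these writes could proceed concurrently.
        image[offset:offset + len(block)] = block
    return bytes(image)
```

Contrast this with the original design, where every block travels to one processor before a single sequential write. Here, no rank needs data from any other rank except the prefix-sum scalar.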

Citations: 19

COMB: a portable benchmark suite for assessing MPI overlap

Proceedings. IEEE International Conference on Cluster Computing

Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137785

W. Lawry, Christopher Wilson, A. Maccabe, R. Brightwell

This paper describes a portable benchmark suite that assesses the ability of cluster networking hardware and software to overlap MPI communication and computation. The Communication Offload MPI-based Benchmark, or COMB, uses two methods to characterize the ability of messages to make progress concurrently with computational processing on the host processor(s). COMB measures the relationship between MPI communication bandwidth and host CPU availability.
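COMB relates achieved MPI bandwidth to how much of the host CPU remains available for computation. The following illustrates only the bookkeeping, as a hedged example. Suppose a test posts non-blocking transfers, runs a fixed work loop, and then waits for completion. Availability can then be taken as the work loop's duration in isolation divided by the elapsed time of the whole interval. Bandwidth is the bytes moved over that same interval. The `Sample` type, `overlap_curve`, and the exact formulas are assumptions for this sketch, not COMB's published definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    bytes_moved: int     # payload transferred during the interval
    work_alone_s: float  # time the work loop takes with no communication
    elapsed_s: float     # wall time of post + work + wait with communication

    def bandwidth(self) -> float:
        """Bytes per second achieved over the whole interval."""
        return self.bytes_moved / self.elapsed_s

    def availability(self) -> float:
        """Fraction of the interval spent on useful work (1.0 = full overlap)."""
        return min(1.0, self.work_alone_s / self.elapsed_s)


def overlap_curve(samples) -> list:
    """(bandwidth, availability) points sorted by bandwidth, for plotting."""
    return sorted((s.bandwidth(), s.availability()) for s in samples)
```

Sweeping the work-loop length produces one point per sample. A network that makes progress without the host would be expected to keep availability near 1.0 as bandwidth rises.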

Citations: 62

SilkRoad II: a multi-paradigm runtime system for cluster computing

Proceedings. IEEE International Conference on Cluster Computing

Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137779

Liang Peng, W. Wong, C. Yuen

A parallel programming paradigm dictates the way in which an application is to be expressed. It also restricts the algorithms that may be used in the application. Unfortunately, runtime systems for parallel computing often impose a particular programming paradigm. For a wider choice of algorithms, it is desirable to support more than one paradigm. In this paper we consider SilkRoad II, a variant of the Cilk runtime system for cluster computing. What is unique about SilkRoad II is its memory model, which supports multiple paradigms on top of the underlying software distributed shared memory. We introduce the RC-dag memory consistency model of SilkRoad II. Our experimental results show that the stronger RC-dag model can achieve performance comparable to that of Cilk's LC model while supporting a larger set of paradigms.

Citations: 3
