
Proceedings. IEEE International Conference on Cluster Computing - Latest Publications

Protocol-dependent message-passing performance on Linux clusters
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137746
D. Turner, Xuehua Chen
In a Linux cluster, as in any multiprocessor system, the inter-processor communication rate is the major limiting factor to its general usefulness. This research is geared toward improving the communication performance by identifying where the inefficiencies lie and trying to understand their cause. The NetPIPE utility is being used to compare the latency and throughput of all current message-passing libraries and the native software layers they run upon for a variety of hardware configurations.
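NetPIPE's core measurement is a ping-pong exchange: a message of a given size is bounced between two endpoints, and the round-trip time yields latency and throughput for that size. As a rough illustration of the idea (not NetPIPE itself; a local socket pair stands in for a cluster interconnect, and the function name is ours):

```python
import socket
import time

def pingpong(msg_size, trials=50):
    """NetPIPE-style ping-pong sketch: bounce a msg_size-byte message
    back and forth `trials` times, then derive one-way latency and
    throughput. Illustrative only -- a socketpair is not a real network."""
    a, b = socket.socketpair()          # stands in for the interconnect
    payload = b"x" * msg_size
    start = time.perf_counter()
    for _ in range(trials):
        a.sendall(payload)              # send side
        received = b""
        while len(received) < msg_size: # echo side reassembles the message
            received += b.recv(65536)
        b.sendall(received)             # bounce it back
        echoed = b""
        while len(echoed) < msg_size:
            echoed += a.recv(65536)
    elapsed = time.perf_counter() - start
    a.close()
    b.close()
    latency = elapsed / (2 * trials)    # one-way time per message
    throughput = msg_size / latency     # bytes/s in one direction
    return latency, throughput
```

Sweeping `msg_size` over powers of two reproduces the characteristic NetPIPE curve: latency-dominated for small messages, bandwidth-limited for large ones.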
Pages: 187-194
Citations: 54
Integrated admission and congestion control for QoS support in clusters
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137761
K. H. Yum, Eun Jung Kim, C. Das, Mazin S. Yousif, J. Duato
Admission and congestion control mechanisms are integral parts of any Quality of Service (QoS) design for networks that support integrated traffic. In this paper we propose an admission control algorithm and a congestion control algorithm for clusters, which are increasingly being used in a diverse set of applications that require QoS guarantees. The uniqueness of our approach is that we develop these algorithms for wormhole-switched networks. We use QoS-capable wormhole routers and QoS-capable network interface cards (NICs), referred to as Host Channel Adapters (HCAs) in the InfiniBand™ Architecture (IBA), to evaluate the effectiveness of these algorithms. The admission control is applied at the HCAs and the routers, while the congestion control is deployed only at the HCAs. Simulation results indicate that the admission and congestion control algorithms are quite effective in delivering the assured performance. The proposed credit-based congestion control algorithm is simple and practical in that it relies on hardware already available in the HCA to regulate traffic injection.
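The credit-based scheme in the last sentence can be pictured abstractly: a sender holds a pool of credits mirroring downstream buffer space, spends one credit per injected packet, stalls when the pool is empty, and regains a credit whenever the receiver drains a packet. A minimal sketch (class and method names are ours, not the authors'; the real mechanism runs in HCA hardware):

```python
class CreditLink:
    """Toy model of credit-based injection throttling: packets may enter
    the network only while the sender holds credits, so injection rate is
    bounded by how fast the receiver drains its buffers."""

    def __init__(self, credits):
        self.credits = credits      # free buffer slots downstream
        self.in_flight = []         # packets occupying those slots

    def try_send(self, packet):
        """Inject a packet if a credit is available; otherwise throttle."""
        if self.credits == 0:
            return False            # congestion control kicks in here
        self.credits -= 1
        self.in_flight.append(packet)
        return True

    def deliver(self):
        """Receiver consumes one packet and returns a credit upstream."""
        if self.in_flight:
            self.in_flight.pop(0)
            self.credits += 1
```

With two credits, a third send fails until the receiver delivers a packet, which is exactly the self-regulating behavior the abstract describes.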
Pages: 325-332
Citations: 17
Research directions in parallel I/O for clusters
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137777
W. Ligon
Parallel I/O remains a critical problem for cluster computing. A significant number of important applications need high-performance parallel I/O, and most cluster systems provide enough hardware to deliver the required performance. System software for achieving the desired goals remains in the research and development stage. A number of parallel file systems have achieved remarkable goals in one or more of several key areas related to parallel I/O, but there is still great reluctance to commit to any file system currently available. This is mostly because these file systems do not address enough issues at once in a package that is robust enough for widespread use. Critical goals in the development of an operational parallel file system for clusters include: high performance with scalability; reliability/fault tolerance; flexible and efficient integration with parallel codes; and portability. These issues give rise to problems with interfaces and semantics, in addition to specific technical problems such as distributed locking, caching, and redundancy. The next generation of parallel file systems must look beyond traditional interfaces, semantics, and implementation methods in order to achieve the desired goals. Of equal importance is knowing to what extent a given file system achieves these goals. Given that no file system is likely to address all of these goals equally well, it is important to be able to measure a given file system's utility in these areas through benchmarking or other evaluation methods. We explore a few of these issues and include specific examples and a case study of the PVFS V2 team's approach to these issues.
Pages: 436-
Citations: 7
MPI in 2002: has it been ten years already?
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137776
E. Lusk
Summary form only given. In April of 1992, a group of parallel computing vendors, computer science researchers, and application scientists met at a one-day workshop and agreed to cooperate on the development of a community standard for the message-passing model of parallel computing. The MPI Forum that eventually emerged from that workshop became a model of how a broad community could work together to improve an important component of the high-performance computing environment. The Message Passing Interface (MPI) definition that resulted from this effort has been widely adopted and implemented, and is now virtually synonymous with the message-passing model itself. MPI not only standardized existing practice in the service of making applications portable in the rapidly changing world of parallel computing, but also consolidated research advances into novel features that extended existing practice and have proven useful in developing a new generation of applications. This talk will discuss some of the procedures and approaches of the MPI Forum that led to MPI's early adoption, and then describe some of the features that have led to its persistence as a reference model for parallel computing. Although clusters were only just emerging as a significant parallel computing production platform as MPI was being defined, MPI has proven to be a useful way of programming them for high performance, and we will discuss the current situation in MPI implementations for clusters. MPI was deliberately designed to grant considerable flexibility to implementors, and thus provides a useful framework for implementation research. Successful implementation techniques within the MPI standard can be utilized immediately by applications already using MPI, thus providing an unusually fast path from research results to their application.
At Argonne National Laboratory we have been developing and distributing MPICH, a portable, high-performance implementation of MPI, since the very beginning of the MPI effort. We will describe MPICH-2, a completely new version of MPICH that is just being released. We will present some of its novel design features, which we hope will stimulate both further research and a new generation of complete MPI-2 implementations, along with some early performance results. We will conclude with a speculative look at the future of MPI, including its role in other programming approaches, fault tolerance, and its applicability to advanced architectures.
Pages: 435-
Citations: 9
The Bladed Beowulf: a cost-effective alternative to traditional Beowulfs
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137753
Wu-chun Feng, Michael S. Warren, E. Weigle
We present a new twist to the Beowulf cluster - the Bladed Beowulf. In contrast to traditional Beowulfs, which typically use Intel or AMD processors, our Bladed Beowulf uses Transmeta processors in order to keep thermal power dissipation low and reliability and density high while still achieving performance comparable to Intel- and AMD-based clusters. Given the ever-increasing complexity of traditional supercomputers and Beowulf clusters, the issues of size, reliability, power consumption, and ease of administration and use will be "the" issues of this decade for high-performance computing. Bigger and faster machines are simply not good enough anymore. To illustrate, we present the results of performance benchmarks on our Bladed Beowulf and introduce two performance metrics that contribute to the total cost of ownership (TCO) of a computing system - performance/power and performance/space.
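The two TCO-oriented metrics are simple ratios of delivered performance to the resource consumed. A sketch with made-up figures (the numbers below are illustrative, not measurements from the paper):

```python
def tco_metrics(gflops, watts, rack_units):
    """Compute the two metrics named in the abstract:
    performance/power  -> GFLOPS per watt
    performance/space  -> GFLOPS per rack unit
    All inputs are hypothetical example figures."""
    return gflops / watts, gflops / rack_units

# Hypothetical comparison: a dense low-power blade cluster can win on both
# ratios even if a conventional cluster posts a higher raw GFLOPS number.
blade_ppw, blade_pps = tco_metrics(gflops=80.0, watts=300.0, rack_units=3)
trad_ppw, trad_pps = tco_metrics(gflops=100.0, watts=500.0, rack_units=10)
```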
Pages: 245-254
Citations: 39
A data parallel programming model based on distributed objects
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137782
R. Diaconescu, R. Conradi
This paper proposes a data parallel programming model suitable for loosely synchronous, irregular applications. At the core of the model are distributed objects that express non-trivial data parallelism. Sequential objects express independent computations. The goal is to use objects to fold synchronization into data accesses and thus, free the user from concurrency aspects. Distributed objects encapsulate large data partitioned across multiple address spaces. The system classifies accesses to distributed objects as read and write. Furthermore, it uses the access patterns to maintain information about dependences across partitions. The system guarantees inter-object consistency using a relaxed update scheme. Typical access patterns uncover dependences for data on the border between partitions. Experimental results show that this approach is highly usable and efficient.
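The core mechanism - classifying each access to a distributed object as a read or a write and using the pattern to find dependences on partition borders - can be sketched on a one-dimensional array. Everything here (class name, methods, the neighbour heuristic) is our illustration, not the authors' API:

```python
class DistributedObject:
    """A large array partitioned into contiguous chunks (standing in for
    multiple address spaces). Every access is logged as read or write so
    the runtime can detect cross-partition dependences."""

    def __init__(self, data, nparts):
        step = len(data) // nparts
        self.parts = [data[i * step:(i + 1) * step] for i in range(nparts)]
        self.log = []               # (operation, partition) classification

    def _locate(self, idx):
        step = len(self.parts[0])
        return idx // step, idx % step

    def read(self, idx):
        part, off = self._locate(idx)
        self.log.append(("read", part))
        return self.parts[part][off]

    def write(self, idx, val):
        part, off = self._locate(idx)
        self.log.append(("write", part))
        self.parts[part][off] = val

    def border_dependences(self):
        """Report (writer, reader) pairs of neighbouring partitions --
        the typical border-exchange pattern the abstract mentions."""
        writers = {p for op, p in self.log if op == "write"}
        readers = {p for op, p in self.log if op == "read"}
        return {(p, q) for p in writers
                for q in (p - 1, p + 1) if q in readers}
```

In a real runtime, a detected `(writer, reader)` pair is what triggers the relaxed update of border data between the two address spaces.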
Pages: 455-460
Citations: 6
ZENTURIO: an experiment management system for cluster and Grid computing
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137723
R. Prodan, T. Fahringer
The need to conduct and manage large sets of experiments for scientific applications has increased dramatically over the last decade. However, there is still very little tool support for this complex and tedious process. We introduce the ZENTURIO experiment management system for parameter studies, performance analysis, and software testing on cluster and Grid architectures. ZENTURIO uses the ZEN directive-based language to specify arbitrarily complex program executions. ZENTURIO is designed as a collection of Grid services that comprise: (1) a registry service, which supports registering and locating Grid services; (2) an experiment generator, which parses files with ZEN directives and instruments applications for performance analysis and parameter studies; (3) an experiment executor, which compiles and controls the execution of experiments on the target machine. A graphical user portal allows the user to control and monitor the experiments and to automatically visualise performance and output data across multiple experiments. ZENTURIO has been implemented based on Java/Jini distributed technology. It supports experiment management on cluster architectures via PBS and on Grid infrastructures through GRAM. We report results of using ZENTURIO for performance analysis of an ocean simulation application and a parameter study of a computational finance code.
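The experiment generator's essential job - expanding directive-annotated source into the cross-product of all parameter settings - can be sketched as follows. The directive syntax below is invented for illustration; real ZEN directives differ:

```python
import itertools
import re

def generate_experiments(source):
    """Scan a source file for hypothetical '# ZEN name = {v1, v2, ...}'
    directives and emit one experiment (a dict of parameter bindings)
    per element of the cross-product of all value sets."""
    params = {}
    for m in re.finditer(r"#\s*ZEN\s+(\w+)\s*=\s*\{([^}]*)\}", source):
        name, values = m.group(1), m.group(2)
        params[name] = [v.strip() for v in values.split(",")]
    names = sorted(params)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(params[n] for n in names))]
```

Two directives with two values each yield four experiments; the executor would then compile and run each binding on the target machine.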
Pages: 9-18
Citations: 43
I/O analysis and optimization for an AMR cosmology application
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137736
Jianwei Li, W. Liao, A. Choudhary, V. Taylor
In this paper we investigate the data access patterns and file I/O behaviors of a production cosmology application that uses the adaptive mesh refinement (AMR) technique for its domain decomposition. This application was originally developed using the Hierarchical Data Format (HDF version 4) I/O library, and since HDF4 does not provide parallel I/O facilities, the global file I/O operations were carried out by one of the allocated processors. When the number of processors becomes large, the I/O performance of this design degrades significantly due to the high communication cost and sequential file access. In this work, we present two additional I/O implementations, using MPI-IO and parallel HDF version 5, and analyze their impact on the I/O performance of this typical AMR application. Based on the I/O patterns discovered in this application, we also discuss the interaction between user-level parallel I/O operations and different parallel file systems and point out their advantages and disadvantages. The performance results presented in this work are obtained from an SGI Origin2000 using XFS, an IBM SP using GPFS, and a Linux cluster using PVFS.
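The contrast between the original HDF4-era design (one processor funnels all global file I/O) and the MPI-IO/parallel-HDF5 designs (each process writes its own portion at its own file offset) can be shown in miniature. No real MPI here: `os.pwrite` on a shared file stands in for independent writers, and both function names are ours:

```python
import os

def serial_write(path, blocks):
    """HDF4-style pattern from the abstract: every block funnels through
    a single writer, which becomes a bottleneck at scale."""
    with open(path, "wb") as f:
        for block in blocks:
            f.write(block)

def parallel_write(path, blocks):
    """MPI-IO/HDF5-style pattern: each 'rank' writes its own block at a
    precomputed offset, with no gathering step. The loop below would run
    concurrently across processes in a real parallel file system."""
    total = sum(len(b) for b in blocks)
    with open(path, "wb") as f:
        f.truncate(total)           # preallocate the shared file
    fd = os.open(path, os.O_WRONLY)
    offset = 0
    for block in blocks:
        os.pwrite(fd, block, offset)  # positional write, no shared cursor
        offset += len(block)
    os.close(fd)
```

Both produce byte-identical files; the difference at scale is that the second spreads the work (and the communication cost) across processors.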
Pages: 119-126
Citations: 19
COMB: a portable benchmark suite for assessing MPI overlap
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137785
W. Lawry, Christopher Wilson, A. Maccabe, R. Brightwell
This paper describes a portable benchmark suite that assesses the ability of cluster networking hardware and software to overlap MPI communication and computation. The Communication Offload MPI-based Benchmark, or COMB, uses two methods to characterize the ability of messages to make progress concurrently with computational processing on the host processor(s). COMB measures the relationship between MPI communication bandwidth and host CPU availability.
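The "CPU availability" idea can be illustrated without MPI: start a background transfer, do units of host work until it completes, and report how much computation survived the overlap. In this pure-Python sketch a timer thread stands in for a pending nonblocking send (real COMB drives actual MPI traffic, and the function name is ours):

```python
import threading

def cpu_availability(transfer_seconds=0.2):
    """Count units of host work completed while a simulated message
    transfer is in flight, normalized per second of communication time.
    With hardware that offloads communication, more work survives the
    overlap and this number rises."""
    done = threading.Event()
    # Stand-in for an MPI_Isend whose completion we poll with MPI_Test.
    transfer = threading.Timer(transfer_seconds, done.set)
    work = 0
    transfer.start()
    while not done.is_set():
        work += 1                   # one unit of overlapped computation
    return work / transfer_seconds
```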
{"title":"COMB: a portable benchmark suite for assessing MPI overlap","authors":"W. Lawry, Christopher Wilson, A. Maccabe, R. Brightwell","doi":"10.1109/CLUSTR.2002.1137785","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137785","url":null,"abstract":"This paper describes a portable benchmark suite that assesses the ability of cluster networking hardware and software to overlap MPI communication and computation. The Communication Offload MPI-based Benchmark, or COMB, uses two methods to characterize the ability of messages to make progress concurrently, with computational processing on the host processor(s). COMB measures the relationship between MPI communication bandwidth and host CPU availability.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":"60 1","pages":"472-475"},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82343729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited: 62
SilkRoad II: a multi-paradigm runtime system for cluster computing
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137779
Liang Peng, W. Wong, C. Yuen
A parallel programming paradigm dictates the way in which an application is to be expressed. It also restricts the algorithms that may be used in the application. Unfortunately, runtime systems for parallel computing often impose a particular programming paradigm. For a wider choice of algorithms, it is desirable to support more than one paradigm. In this paper we consider SilkRoad II, a variant of the Cilk runtime system for cluster computing. What is unique about SilkRoad II is its memory model, which supports multiple paradigms via the underlying software distributed shared memory. The RC-dag memory consistency model of SilkRoad II is introduced. Our experimental results show that the stronger RC-dag can achieve performance comparable to the LC of Cilk while supporting a bigger set of paradigms with rather good performance.
{"title":"SilkRoad II: a multi-paradigm runtime system for cluster computing","authors":"Liang Peng, W. Wong, C. Yuen","doi":"10.1109/CLUSTR.2002.1137779","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137779","url":null,"abstract":"A parallel programming paradigm dictates the way in which an application is to be expressed. It also restricts the algorithms that may be used in the application. Unfortunately, runtime systems for parallel computing often impose a particular programming paradigm. For a wider choice of algorithms, it is desirable to support more than one paradigm. In this paper we consider SilkRoad II, a variant of the Cilk runtime system for cluster computing. What is unique about SilkRoad II is its memory model which supports multiple paradigms with the underlying software distributed shared memory. The RC-dag memory consistency model of SilkRoad II is introduced. Our experimental results show that the stronger RC-dag can achieve performance comparable to LC of Cilk while supporting a bigger set of paradigms with rather good performance.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":"19 1","pages":"443-444"},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81483437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited: 3