
Proceedings. IEEE International Conference on Cluster Computing: latest publications

Community services: a toolkit for rapid deployment of network services
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137781
Byoung-Dai Lee, J. Weissman
Advances in packaging and interface technologies have made it possible for software components to be shared across the network through encapsulation and offered as network services. They allow end-users to focus on their applications and obtain remote services when needed simply by invoking them across the network. Many groups have built significant infrastructures for providing domain-specific high performance services. However, transforming high performance applications into network services is labor intensive and time consuming because there is little existing infrastructure to utilize. In this paper, we propose a software toolkit and runtime infrastructure for rapid deployment of network services.
Citations: 0
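The abstract's premise, that a routine can be encapsulated and invoked across the network as if it were a local call, can be illustrated with a minimal sketch. This uses Python's standard-library XML-RPC rather than the authors' toolkit, and the `square` service is a hypothetical stand-in for a high performance routine:

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def remote_call_demo(n=8):
    """Expose a function as a network service, then invoke it remotely.

    The server binds an ephemeral port on loopback; a real deployment
    would register the service with some discovery infrastructure.
    """
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(lambda x: x * x, "square")
    threading.Thread(target=server.serve_forever, daemon=True).start()

    host, port = server.server_address
    # To the caller, the remote service looks like a local function call.
    result = ServerProxy(f"http://{host}:{port}").square(n)

    server.shutdown()
    server.server_close()
    return result


if __name__ == "__main__":
    print(remote_call_demo(8))
```

The point of a toolkit like the one proposed is to automate exactly this wrapping step, so that an existing high performance code needs no hand-written service scaffolding.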
Blades - an emerging system design model for economic delivery of high performance computing
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137722
Kirk M. Bresniker
Summary form only given. During the recent rise and subsequent fall of the Internet bubble, a new computer system design model emerged, primarily from venture capital start-ups. Bladed systems, dense arrays of single board computers housed in a common chassis, seemed a promising way for service providers to keep pace with the anticipated dot-com-inspired build-out. The blades were dense, low in bandwidth and low in computational power, but they were suited to rapid deployment of masses of content delivery. Along with other lessons learned as the 'irrational exuberance' faded and unviable business models and their edge applications were winnowed from data centers, the designers of bladed systems began to realize that blades had the potential to move from 'edge-only' applications into high performance enterprise, communication, and technical compute. All leading manufacturers now have high performance blade designs either in development or shipping. Key to the high performance blade are shifts in the processor, storage, networking and management technologies from those used in first generation blades. These shifts could enable bladed systems to deliver multi-system compute arrays at appreciably lower total cost of ownership.
Citations: 0
Interconnects: which one is right for you?
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137764
Brett M. Bode
Over the past few years cluster computers have become commonplace. During that time the interconnect choices have gotten more numerous. For the early clusters the obvious choice was Fast Ethernet. However, today there are several options including Gigabit Ethernet, Myrinet, SCI and others. Which one is right for your cluster will depend on many factors including cost, cluster size, latency and bandwidth needs. We have examined the currently available interconnects and will present performance results based on both raw bandwidth measurements and application scalability. Finally we will examine the pros and cons of each interconnect and make recommendations as to what type of cluster/application for which each interconnect is best suited.
Citations: 0
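To give a flavor of the raw bandwidth measurements the talk describes, here is a small loopback microbenchmark. It is an illustrative sketch only, not the author's methodology, and loopback numbers say nothing about a real interconnect; measuring between two cluster nodes would use the same structure with a real peer address:

```python
import socket
import threading
import time


def tcp_throughput(total_bytes=16 * 1024 * 1024, chunk=64 * 1024):
    """Stream total_bytes over a loopback TCP connection; return Mbit/s."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    def sink():
        # Receiver side: drain the connection until all bytes arrive.
        conn, _ = listener.accept()
        remaining = total_bytes
        while remaining > 0:
            data = conn.recv(min(chunk, remaining))
            if not data:
                break
            remaining -= len(data)
        conn.close()

    t = threading.Thread(target=sink)
    t.start()

    client = socket.create_connection(listener.getsockname())
    buf = b"\x00" * chunk
    start = time.perf_counter()
    sent = 0
    while sent < total_bytes:
        client.sendall(buf)
        sent += len(buf)
    client.close()
    t.join()  # stop the clock only once the receiver has everything
    elapsed = time.perf_counter() - start

    listener.close()
    return total_bytes * 8 / elapsed / 1e6


if __name__ == "__main__":
    print(f"loopback TCP bandwidth: {tcp_throughput():.0f} Mbit/s")
```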
Goals guiding design: PVM and MPI
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137754
W. Gropp, E. Lusk
PVM and MPI, two systems for programming clusters, are often compared. The comparisons usually start with the unspoken assumption that PVM and MPI represent different solutions to the same problem. In this paper we show that, in fact, the two systems often are solving different problems. In cases where the problems do match but the solutions chosen by PVM and MPI are different, we explain the reasons for the differences. Usually such differences can be traced to explicit differences in the goals of the two systems, their origins, or the relationship between their specifications and their implementations. For example, we show that the requirement for portability and performance across many platforms caused MPI to choose approaches different from those made by PVM, which is able to exploit the similarities of network-connected systems.
Citations: 44
An extensible, portable, scalable cluster management software architecture
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137757
J. Laros, L. Ward, Nathan W. Dauchy, R. Brightwell, Trammell Hudson, Ruth Klundt
This paper describes an object-oriented software architecture for cluster integration and management that enables extensibility, portability, and scalability. This architecture has been successfully implemented and deployed on several large-scale production clusters at Sandia National Laboratories, the largest of which is currently 1861 nodes. This paper discusses the key features of the architecture that allow for easily extending the range of supported hardware devices and network topologies. We also describe in detail how the object-oriented structure that represents the hardware components can be used to implement scalable and portable cluster management tools.
Citations: 5
High performance user level sockets over Gigabit Ethernet
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137745
P. Balaji, Piyush Shivam, P. Wyckoff, D. Panda
While a number of user-level protocols have been developed to reduce the gap between the performance capabilities of the physical network and the performance actually available, applications that have already been developed on kernel based protocols such as TCP have largely been ignored. There is a need to make these existing TCP applications take advantage of the modern user-level protocols such as EMP or VIA which feature both low-latency and high bandwidth. We have designed, implemented and evaluated a scheme to support such applications written using the sockets API to run over EMP without any changes to the application itself. Using this scheme, we are able to achieve a latency of 28.5 /spl mu/s for the Datagram sockets and 37 /spl mu/s for Data Streaming sockets compared to a latency of 120 /spl mu/s obtained by TCP for 4-byte messages. This scheme attains a peak bandwidth of around 840 Mbps. Both the latency and the throughput numbers are close to those achievable by EMP. The ftp application shows twice as much benefit on our sockets interface while the Web server application shows up to six times performance enhancement as compared to TCP. To the best of our knowledge, this is the first such design and implementation for Gigabit Ethernet.
Citations: 57
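Latency figures like the 28.5 us quoted above typically come from a ping-pong microbenchmark: send a small message, wait for the echo, and halve the mean round-trip time. A loopback sketch of that measurement, using ordinary kernel UDP sockets rather than the paper's EMP-based sockets layer:

```python
import socket
import threading
import time


def echo_server(sock, n_msgs):
    """Echo each datagram back to its sender."""
    for _ in range(n_msgs):
        data, addr = sock.recvfrom(64)
        sock.sendto(data, addr)


def measure_udp_latency(n_msgs=1000, payload=b"ping"):
    """Estimate one-way datagram latency in microseconds."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    t = threading.Thread(target=echo_server, args=(server, n_msgs))
    t.start()

    start = time.perf_counter()
    for _ in range(n_msgs):
        client.sendto(payload, server.getsockname())
        client.recvfrom(64)  # wait for the echo before the next send
    elapsed = time.perf_counter() - start

    t.join()
    server.close()
    client.close()
    # Half the mean round-trip time approximates one-way latency.
    return elapsed / n_msgs / 2 * 1e6


if __name__ == "__main__":
    print(f"estimated one-way latency: {measure_udp_latency():.1f} us")
```

The paper's contribution is keeping this sockets API unchanged while replacing the kernel TCP/UDP path underneath with the user-level EMP protocol.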
Shell over a cluster (SHOC): towards achieving single system image via the shell
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137725
C. M. Tan, C. Tan, W. Wong
With dramatic improvements in cost-performance, the use of clusters of personal computers is becoming widespread. For ease of use and management, a single system image (SSI) is highly desirable. There are several approaches that one can take to achieve SSI. In this paper, we discuss the achievement of SSI via the use of the user login shell. To this end, we describe shoc (shell over a cluster)-an implementation of the standard Linux-GNU bash shell that permits the user to utilize a cluster as a single resource. In addition, shoc provides for transparent pre-emptive load balancing without requiring the user to rewrite, recompile, or even relink existing applications. Running at user-level, shoc does not require any kernel modification and currently runs on any Linux cluster fulfilling a minimal set of requirements. We also present results on the performance of shoc and show that the load balancing feature gives rise to better overall cluster utilization as well as improvement in response time for individual processes.
Citations: 8
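The placement half of the load balancing a cluster shell performs can be sketched very simply: consult per-node load, pick the least-loaded node, and account for the new process. This is a toy illustration with a hypothetical load table, not shoc's actual mechanism (which also supports pre-emptive migration of running processes):

```python
def pick_least_loaded(loads):
    """Return the node whose reported load average is lowest."""
    return min(loads, key=loads.get)


def place_job(loads, cost=1.0):
    """Place one new process the way a cluster-aware shell might:
    choose the least-loaded node, then record the added work so the
    next placement decision sees the updated load."""
    node = pick_least_loaded(loads)
    loads[node] += cost
    return node


# Hypothetical load table; a real shell would sample each node.
cluster = {"node0": 0.5, "node1": 0.1, "node2": 2.0}
print(place_job(cluster))  # the nearly idle node1 receives the first job
```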
Reliable Blast UDP: predictable high performance bulk data transfer
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137760
E. He, J. Leigh, O. Yu, T. DeFanti
High speed bulk data transfer is an important part of many data-intensive scientific applications. This paper describes an aggressive bulk data transfer scheme, called Reliable Blast UDP (RBUDP), intended for extremely high bandwidth, dedicated- or Quality-of-Service-enabled networks, such as optically switched networks. This paper also provides an analytical model to predict RBUDP's performance and compares the results of our model against our implementation of RBUDP. Our results show that RBUDP performs extremely efficiently over high speed dedicated networks and our model is able to provide good estimates of its performance.
Citations: 262
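The core RBUDP idea is blast-and-fill: send every payload datagram without per-packet acknowledgments, have the receiver report the missing sequence numbers over a reliable side channel, and re-blast only those until the transfer is complete. A minimal in-memory simulation of that loop, with a seeded lossy channel standing in for UDP and a Python set standing in for the TCP error report:

```python
import random


def rbudp_transfer(payload_chunks, loss_rate=0.3, seed=0):
    """Simulate the RBUDP blast-and-fill loop.

    Each round blasts every still-missing chunk over an unreliable
    channel (simulated by dropping datagrams with probability
    loss_rate), then the receiver reports what is still missing over
    a reliable side channel.  Returns the data and the round count.
    """
    rng = random.Random(seed)
    received = {}
    missing = set(range(len(payload_chunks)))
    rounds = 0
    while missing:
        rounds += 1
        # Blast phase: no per-packet acks; datagrams may be dropped.
        for seq in sorted(missing):
            if rng.random() >= loss_rate:
                received[seq] = payload_chunks[seq]
        # Reliable side channel: report the remaining holes.
        missing = {s for s in range(len(payload_chunks)) if s not in received}
    data = b"".join(received[i] for i in range(len(payload_chunks)))
    return data, rounds


if __name__ == "__main__":
    chunks = [bytes([i]) * 4 for i in range(100)]
    data, rounds = rbudp_transfer(chunks)
    assert data == b"".join(chunks)
    print(f"transfer completed in {rounds} blast rounds")
```

Because the blast phase is ack-free, throughput is governed by the sending rate and the loss rate rather than by round-trip time, which is what makes the scheme's performance predictable on dedicated high-bandwidth links.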
First light of the Earth Simulator and its PC cluster applications
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137744
K. Tani, T. Aoki, S. Matsuoka, Satoru Ohkura, Hitoshi Uehara, Tetsuo Aoyagi
The Earth Simulator (ES) is the largest parallel vector processor in the world that is mainly dedicated to large-scale simulation studies of global change. Development of the ES system started in 1997 and was completed at the end of February, 2002. The system consists of 640 processor nodes that are connected via a very fast single-stage crossbar network (12.3 GB/s). The total peak performance and main memory of the system are 40 TFLOPS and 10 TB, respectively. Studies to evaluate the performance of the ES were made using an atmospheric circulation model Afes (Atmospheric General Circulation Model for ES) and LINPACK benchmark test. The sustained performance of Afes for T1279L96 (the equivalent horizontal resolution given by T1279 is about 10 km and the total number of layers is 96) was as high as 14.5 TFLOPS on a half system of the ES with 2,560 PEs (320 nodes). The sustained-to-peak performance ratio was 70.8%. The ES also achieved a LINPACK world record of 35.86 TFLOPS. This rating exceeded the previous record, set by the ASCI White, by about 5 times. The Earth Simulator is now running. Huge amounts of output data will arise from the huge computer system. For example, the data volume of simulation results from the Afes is of the order of 10-100 TB. In the phase of operation, management of huge output datafiles and interactive visual monitoring of many terabytes of simulation results are extremely important for the ES. The ES has introduced a prototype PC cluster to seek the best solution to these problems. The PC cluster comprises 64 PCs that are interconnected with a Myrinet2000 switch. Each PC has a Pentium III (1 GHz), 1 GB of main memory and 120 GB of disk space. An outline of the Earth Simulator system, recent results on performance evaluation using real applications and the LINPACK benchmark test, and an outline of the PC cluster system are presented.
Citations: 0
Selective buddy allocation for scheduling parallel jobs on clusters
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137735
Vijay Subramani, R. Kettimuthu, Srividya Srinivasan, J. Johnston, P. Sadayappan
In this paper we evaluate the performance implications of using a buddy scheme for contiguous node allocation, in conjunction with a backfilling job scheduler for clusters. When a contiguous node allocation strategy is used, there is a trade-off between improved run-time of jobs (due to reduced link contention and lower communication overhead) and increased wait-time of jobs (due to external fragmentation of the processor system). Using trace-based simulation, a buddy strategy for contiguous node allocation is shown to be unattractive compared to the standard noncontiguous allocation strategy used in all production job schedulers. A simple but effective scheme for selective buddy allocation is then proposed, that is shown to perform better than non-contiguous allocation.
Citations: 53
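A buddy scheme serves a request for n nodes by rounding n up to the next power of two and carving a contiguous block out of the node line, splitting larger free blocks in half as needed; on release, a block merges with its equal-sized "buddy" whenever both are free. A minimal sketch of that allocator (an illustration of the technique, not the authors' scheduler, which applies it selectively alongside backfilling):

```python
class BuddyAllocator:
    """Power-of-two buddy allocator over a contiguous line of nodes."""

    def __init__(self, total_nodes):
        assert total_nodes & (total_nodes - 1) == 0, "need a power of two"
        self.total = total_nodes
        self.free_lists = {total_nodes: [0]}  # block size -> start offsets

    def alloc(self, n_nodes):
        """Return the start of a contiguous block, or None if none fits."""
        size = 1
        while size < n_nodes:          # round request up to a power of two
            size *= 2
        s = size
        while s <= self.total and not self.free_lists.get(s):
            s *= 2                     # find the smallest free block that fits
        if s > self.total:
            return None
        start = self.free_lists[s].pop(0)
        while s > size:                # split down, freeing the upper halves
            s //= 2
            self.free_lists.setdefault(s, []).append(start + s)
        return start

    def release(self, start, n_nodes):
        """Return a block; coalesce with its buddy while possible."""
        size = 1
        while size < n_nodes:
            size *= 2
        while size < self.total:
            buddy = start ^ size       # buddy differs in exactly one bit
            lst = self.free_lists.get(size, [])
            if buddy not in lst:
                break
            lst.remove(buddy)
            start = min(start, buddy)
            size *= 2
        self.free_lists.setdefault(size, []).append(start)
```

The rounding-up is the source of the internal fragmentation the paper weighs against the communication benefits of contiguity: on a 16-node line, `alloc(3)` consumes a 4-node block.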