Community services: a toolkit for rapid deployment of network services
Byoung-Dai Lee, J. Weissman
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137781
Advances in packaging and interface technologies have made it possible for software components to be shared across the network through encapsulation and offered as network services. They allow end-users to focus on their applications and obtain remote services when needed simply by invoking them across the network. Many groups have built significant infrastructures for providing domain-specific high performance services. However, transforming high performance applications into network services is labor intensive and time consuming because there is little existing infrastructure to utilize. In this paper, we propose a software toolkit and runtime infrastructure for rapid deployment of network services.
Blades - an emerging system design model for economic delivery of high performance computing
Kirk M. Bresniker
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137722
Summary form only given. During the recent rise and subsequent fall of the Internet bubble, a new computer system design model emerged, primarily from venture-capital start-ups. Bladed systems, dense arrays of single-board computers housed in a common chassis, seemed a promising way for service providers to keep pace with the anticipated dot-com-inspired build-out. The blades were dense, low in bandwidth, and low in computational power, but they were suited to rapid deployment of mass content delivery. Along with other lessons learned as the 'irrational exuberance' faded and unviable business models and their edge applications were winnowed from data centers, the designers of bladed systems began to realize that blades had the potential to move from 'edge-only' applications into high performance enterprise, communication, and technical computing. All leading manufacturers now have high performance blade designs either in development or shipping. Key to the high performance blade are shifts in the processor, storage, networking, and management technologies from those used in first-generation blades. These shifts could enable bladed systems to deliver multi-system compute arrays at appreciably lower total cost of ownership.
Interconnects: which one is right for you?
Brett M. Bode
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137764
Over the past few years cluster computers have become commonplace, and during that time the interconnect choices have grown more numerous. For early clusters the obvious choice was Fast Ethernet; today, however, there are several options, including Gigabit Ethernet, Myrinet, SCI, and others. Which one is right for your cluster depends on many factors, including cost, cluster size, and latency and bandwidth needs. We have examined the currently available interconnects and will present performance results based on both raw bandwidth measurements and application scalability. Finally, we examine the pros and cons of each interconnect and recommend the types of clusters and applications for which each is best suited.
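The "raw bandwidth measurements" such surveys rely on typically reduce to a ping-pong test: bounce a large message between two endpoints and divide bytes moved by elapsed time. The following is a minimal, self-contained sketch of that idea over loopback TCP sockets; it is illustrative only (the paper's actual measurements on Myrinet or SCI would use vendor-specific APIs, and the constants here are arbitrary).

```python
# Toy ping-pong bandwidth test over loopback TCP.
# MSG_SIZE and ROUNDS are arbitrary illustrative parameters.
import socket
import threading
import time

MSG_SIZE = 1 << 20   # 1 MiB per message
ROUNDS = 16

def echo_server(srv):
    # Accept one connection and echo fixed-size messages until EOF.
    conn, _ = srv.accept()
    with conn:
        while True:
            buf = bytearray()
            while len(buf) < MSG_SIZE:
                chunk = conn.recv(MSG_SIZE - len(buf))
                if not chunk:
                    return
                buf += chunk
            conn.sendall(buf)

def run_benchmark():
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    threading.Thread(target=echo_server, args=(srv,), daemon=True).start()

    cli = socket.create_connection(srv.getsockname())
    payload = b"x" * MSG_SIZE
    start = time.perf_counter()
    for _ in range(ROUNDS):
        cli.sendall(payload)
        got = 0
        while got < MSG_SIZE:
            got += len(cli.recv(MSG_SIZE - got))
    elapsed = time.perf_counter() - start
    cli.close()
    # Each round moves the message out and back: two transfers.
    return 2 * ROUNDS * MSG_SIZE / elapsed / 1e6  # MB/s

if __name__ == "__main__":
    print(f"{run_benchmark():.1f} MB/s over loopback")
```

Real interconnect comparisons would also sweep the message size, since small-message latency and large-message bandwidth favor different networks.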
Goals guiding design: PVM and MPI
W. Gropp, E. Lusk
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137754
PVM and MPI, two systems for programming clusters, are often compared. The comparisons usually start with the unspoken assumption that PVM and MPI represent different solutions to the same problem. In this paper we show that, in fact, the two systems often are solving different problems. In cases where the problems do match but the solutions chosen by PVM and MPI are different, we explain the reasons for the differences. Usually such differences can be traced to explicit differences in the goals of the two systems, their origins, or the relationship between their specifications and their implementations. For example, we show that the requirement for portability and performance across many platforms caused MPI to choose approaches different from those made by PVM, which is able to exploit the similarities of network-connected systems.
An extensible, portable, scalable cluster management software architecture
J. Laros, L. Ward, Nathan W. Dauchy, R. Brightwell, Trammell Hudson, Ruth Klundt
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137757
This paper describes an object-oriented software architecture for cluster integration and management that enables extensibility, portability, and scalability. This architecture has been successfully implemented and deployed on several large-scale production clusters at Sandia National Laboratories, the largest of which is currently 1861 nodes. This paper discusses the key features of the architecture that allow for easily extending the range of supported hardware devices and network topologies. We also describe in detail how the object-oriented structure that represents the hardware components can be used to implement scalable and portable cluster management tools.
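The core idea (hardware components as an object hierarchy so that management tools operate on one uniform interface) can be sketched as follows. All class and method names here are invented for illustration; the abstract does not describe Sandia's actual class design.

```python
# Hypothetical sketch: a component tree where management actions recurse
# over the topology, so supporting a new device type only requires a new
# Component subclass. Names are illustrative, not from the paper.
class Component:
    def __init__(self, name):
        self.name = name
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def power_cycle(self):
        # Default behavior: delegate to children (racks, chassis, ...).
        return [a for c in self.children for a in c.power_cycle()]

class Rack(Component):
    pass  # purely structural; inherits the recursive behavior

class Node(Component):
    def power_cycle(self):
        # Leaf devices perform the actual management action.
        return [f"power-cycle {self.name}"]

cluster = Component("cluster")
rack = cluster.add(Rack("rack0"))
rack.add(Node("n0"))
rack.add(Node("n1"))
actions = cluster.power_cycle()  # ["power-cycle n0", "power-cycle n1"]
```

The design pays off at scale: a tool written against `Component` is unchanged when a new rack layout or device type is added underneath it.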
High performance user level sockets over Gigabit Ethernet
P. Balaji, Piyush Shivam, P. Wyckoff, D. Panda
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137745
While a number of user-level protocols have been developed to reduce the gap between the performance capabilities of the physical network and the performance actually available, applications that have already been developed on kernel based protocols such as TCP have largely been ignored. There is a need to make these existing TCP applications take advantage of the modern user-level protocols such as EMP or VIA which feature both low latency and high bandwidth. We have designed, implemented and evaluated a scheme to support such applications written using the sockets API to run over EMP without any changes to the application itself. Using this scheme, we are able to achieve a latency of 28.5 μs for Datagram sockets and 37 μs for Data Streaming sockets, compared to a latency of 120 μs obtained by TCP for 4-byte messages. This scheme attains a peak bandwidth of around 840 Mbps. Both the latency and the throughput numbers are close to those achievable by EMP. The ftp application shows twice as much benefit on our sockets interface, while the Web server application shows up to six times performance enhancement as compared to TCP. To the best of our knowledge, this is the first such design and implementation for Gigabit Ethernet.
Shell over a cluster (SHOC): towards achieving single system image via the shell
C. M. Tan, C. Tan, W. Wong
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137725
With dramatic improvements in cost-performance, the use of clusters of personal computers is becoming widespread. For ease of use and management, a single system image (SSI) is highly desirable. There are several approaches that one can take to achieve SSI. In this paper, we discuss the achievement of SSI via the user login shell. To this end, we describe shoc (shell over a cluster), an implementation of the standard Linux-GNU bash shell that permits the user to utilize a cluster as a single resource. In addition, shoc provides transparent pre-emptive load balancing without requiring the user to rewrite, recompile, or even relink existing applications. Running at user level, shoc does not require any kernel modification and currently runs on any Linux cluster fulfilling a minimal set of requirements. We also present results on the performance of shoc and show that the load balancing feature improves both overall cluster utilization and response time for individual processes.
Reliable Blast UDP: predictable high performance bulk data transfer
E. He, J. Leigh, O. Yu, T. DeFanti
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137760
High speed bulk data transfer is an important part of many data-intensive scientific applications. This paper describes an aggressive bulk data transfer scheme, called Reliable Blast UDP (RBUDP), intended for extremely high bandwidth, dedicated- or Quality-of-Service-enabled networks, such as optically switched networks. This paper also provides an analytical model to predict RBUDP's performance and compares the results of our model against our implementation of RBUDP. Our results show that RBUDP performs extremely efficiently over high speed dedicated networks and our model is able to provide good estimates of its performance.
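As generally described, RBUDP blasts the payload as UDP datagrams at a fixed rate, then uses a reliable back-channel (TCP) through which the receiver reports which datagrams were lost; only those are re-blasted, and the cycle repeats until the transfer completes. The following toy simulation shows that round structure. It uses a uniform random loss model and in-memory sets instead of real sockets; all parameters are illustrative assumptions, not values from the paper.

```python
# Toy round-structure simulation of an RBUDP-style transfer.
# Uniform random loss is an illustrative assumption.
import random

def rbudp_rounds(n_packets, loss_rate, seed=0):
    """Count blast rounds until every packet is delivered.

    Each round mirrors the RBUDP cycle:
      1. blast all still-missing packets over UDP,
      2. receiver reports the loss set over the reliable TCP back-channel,
      3. repeat with only the missing packets.
    """
    rng = random.Random(seed)
    missing = set(range(n_packets))
    rounds = 0
    while missing:
        rounds += 1
        # A packet stays missing if it is dropped this round.
        missing = {p for p in missing if rng.random() < loss_rate}
    return rounds
```

On a lossless dedicated network the transfer finishes in one blast plus one acknowledgment round-trip, which is why the scheme's completion time is so predictable there.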
First light of the Earth Simulator and its PC cluster applications
K. Tani, T. Aoki, S. Matsuoka, Satoru Ohkura, Hitoshi Uehara, Tetsuo Aoyagi
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137744
The Earth Simulator (ES) is the largest parallel vector processor in the world that is mainly dedicated to large-scale simulation studies of global change. Development of the ES system started in 1997 and was completed at the end of February, 2002. The system consists of 640 processor nodes that are connected via a very fast single-stage crossbar network (12.3 GB/s). The total peak performance and main memory of the system are 40 TFLOPS and 10 TB, respectively. Studies to evaluate the performance of the ES were made using an atmospheric circulation model Afes (Atmospheric General Circulation Model for ES) and the LINPACK benchmark test. The sustained performance of Afes for T1279L96 (the equivalent horizontal resolution given by T1279 is about 10 km and the total number of layers is 96) was as high as 14.5 TFLOPS on a half system of the ES with 2,560 PEs (320 nodes). The sustained-to-peak performance ratio was 70.8%. The ES also achieved a LINPACK world record of 35.86 TFLOPS. This rating exceeded the previous record, set by the ASCI White, by about 5 times. The Earth Simulator is now running. Huge amounts of output data will arise from the huge computer system. For example, the data volume of simulation results from the Afes is of the order of 10-100 TB. In the phase of operation, management of huge output data files and interactive visual monitoring of many terabytes of simulation results are extremely important for the ES. The ES has introduced a prototype PC cluster to seek the best solution to these problems. The PC cluster comprises 64 PCs that are interconnected with a Myrinet2000 switch. Each PC has a Pentium III (1 GHz), 1 GB of main memory and 120 GB of disk space. An outline of the Earth Simulator system, recent results on performance evaluation using real applications and the LINPACK benchmark test, and an outline of the PC cluster system are presented.
Selective buddy allocation for scheduling parallel jobs on clusters
Vijay Subramani, R. Kettimuthu, Srividya Srinivasan, J. Johnston, P. Sadayappan
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137735
In this paper we evaluate the performance implications of using a buddy scheme for contiguous node allocation, in conjunction with a backfilling job scheduler for clusters. When a contiguous node allocation strategy is used, there is a trade-off between improved run-time of jobs (due to reduced link contention and lower communication overhead) and increased wait-time of jobs (due to external fragmentation of the processor system). Using trace-based simulation, a buddy strategy for contiguous node allocation is shown to be unattractive compared to the standard non-contiguous allocation strategy used in all production job schedulers. A simple but effective scheme for selective buddy allocation is then proposed and shown to perform better than non-contiguous allocation.
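A buddy scheme rounds each request up to a power of two and satisfies it by recursively splitting a larger free block in half; the split-off halves ("buddies") return to the free lists, which keeps allocations contiguous at the cost of internal and external fragmentation. Below is a minimal sketch of the allocation side. The free-list representation and function names are assumptions for illustration; the paper's selective variant would, per the abstract, apply such contiguous allocation only selectively, falling back to ordinary non-contiguous allocation otherwise.

```python
# Hypothetical buddy-allocation sketch for contiguous node allocation.
# free_blocks maps block size (power of two) -> list of start offsets.
def buddy_alloc(free_blocks, request):
    """Return (start, size) for a contiguous block, or None if none fits."""
    size = 1
    while size < request:          # round the request up to a power of two
        size <<= 1

    # Smallest available free block at least as large as the request.
    candidates = [s for s in free_blocks if s >= size and free_blocks[s]]
    if not candidates:
        return None                # caller could fall back to non-contiguous
    s = min(candidates)
    start = free_blocks[s].pop()

    # Split down to the requested size; each upper half goes back as a buddy.
    while s > size:
        s >>= 1
        free_blocks.setdefault(s, []).append(start + s)
    return (start, size)

free = {8: [0]}                    # one free 8-node block starting at node 0
print(buddy_alloc(free, 3))        # allocates a 4-node block
```

The external fragmentation the abstract mentions shows up exactly when `candidates` is empty even though enough scattered nodes exist, which is what motivates allocating contiguously only for selected jobs.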