
Latest publications in Proceedings. IEEE International Conference on Cluster Computing

Scalable resource management in high performance computers
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137759
E. Frachtenberg, F. Petrini, Juan Fernández Peinador, S. Coll
Clusters of workstations have emerged as an important platform for building cost-effective, scalable, and highly-available computers. Although many hardware solutions are available today, the largest challenge in making large-scale clusters usable lies in the system software. In this paper we present STORM, a resource management tool designed to provide scalability, low overhead, and the flexibility necessary to efficiently support and analyze a wide range of job-scheduling algorithms. STORM achieves these feats by using a small set of primitive mechanisms that are common in modern high-performance interconnects. The architecture of STORM is based on three main technical innovations. First, a part of the scheduler runs in the thread processor located on the network interface. Second, we use hardware collectives that are highly scalable both for implementing control heartbeats and for distributing the binary of a parallel job in near-constant time. Third, we use an I/O bypass protocol that allows fast data movement from the file system to the communication buffers in the network interface and vice versa. The experimental results show that STORM can launch a job with a binary of 12 MB on a 64-processor, 32-node cluster in less than 250 ms. This paper provides experimental and analytical evidence that these results scale to a much larger number of nodes. To the best of our knowledge, STORM significantly outperforms existing production schedulers in launching jobs, performing resource management tasks, and gang-scheduling tasks.
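The near-constant-time binary distribution that the abstract credits to hardware collectives can be pictured with an ordinary software broadcast. The sketch below, in Python with mpi4py, uses the library's tree-based `bcast` as a stand-in for STORM's NIC-level collective; the simulated 12 MB payload and the timing print are illustrative, not STORM code.

```python
# Illustrative only: deliver a job "binary" to every rank with a single collective
# broadcast, in the spirit of STORM's hardware-collective launch path.
# mpi4py's bcast is an ordinary software tree broadcast, not a NIC-level collective.
from mpi4py import MPI
import os

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Rank 0 plays the role of the launch node holding the executable image.
payload = os.urandom(12 * 1024 * 1024) if rank == 0 else None   # simulated 12 MB binary

t0 = MPI.Wtime()
payload = comm.bcast(payload, root=0)    # one collective call reaches all ranks
elapsed = MPI.Wtime() - t0

if rank == 0:
    print(f"broadcast {len(payload)} bytes to {comm.Get_size()} ranks in {elapsed:.3f} s")
```

Run under an MPI launcher (for example `mpirun -n 32 python broadcast_demo.py`); the number of communication steps in a tree broadcast grows only logarithmically with the rank count.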
Citations: 5
I-Cluster: the execution sandbox
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137733
Bruno Richard
I-Cluster is an HP Laboratories Grenoble initiative in collaboration with the ID-IMAG laboratory of INRIA Rhone-Alpes. The aim of this research programme is to develop a framework of tools that transparently take advantage of unused network resources and federate them to crystallize into specific virtual functions such as supercomputing. To be more precise, I-Cluster enables automatic real-time analysis of the availability, configuration and workload of machines on an intranet. When instantiation of a supercomputing function is carried out by a user, I-Cluster determines the most appropriate set of machines for carrying out this function, allocates the machines into a virtual cluster and then starts execution of the function. To obtain this result, I-Cluster possesses a "sandbox" on each machine on the intranet, which is transparent to the user, and that enables use of the computing resources of these machines during their idle periods while securely protecting user data and jobs. This document presents the sandbox developed for I-Cluster and its features.
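A cycle-stealing sandbox of this kind has to decide when a workstation is actually idle before borrowing it. The minimal sketch below shows one such test based on the Unix load average; the thresholds, the polling interval, and the use of load average at all are assumptions made for illustration, not I-Cluster's actual policy.

```python
# Illustrative idleness check for a cycle-stealing host (not I-Cluster's code).
# A sandbox would start guest work only while the owner's machine looks idle.
import os
import time

LOAD_THRESHOLD = 0.5          # assumed: 1-minute load average below this counts as idle
IDLE_SECONDS_REQUIRED = 300   # assumed: stay idle this long before borrowing cycles

def host_is_idle() -> bool:
    one_minute_load, _, _ = os.getloadavg()
    return one_minute_load < LOAD_THRESHOLD

def wait_until_borrowable(poll_interval: float = 10.0) -> None:
    idle_since = None
    while True:
        if host_is_idle():
            idle_since = idle_since or time.time()
            if time.time() - idle_since >= IDLE_SECONDS_REQUIRED:
                return                # long enough idle: safe to start guest computation
        else:
            idle_since = None         # owner is active again; reset the idle clock
        time.sleep(poll_interval)
```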
Citations: 1
Blue Gene/L, a system-on-a-chip
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137766
G. Almási, D. Beece, Ralph Bellofatto, G. Bhanot, R. Bickford, M. Blumrich, A. Bright, J. Brunheroto, Calin Cascaval, J. Castaños, L. Ceze, P. Coteus, S. Chatterjee, Dong Chen, G. Chiu, T. Cipolla, P. Crumley, A. Deutsch, M. B. Dombrowa, W. Donath, M. Eleftheriou, B. Fitch, J. Gagliano, A. Gara, R. Germain, M. Giampapa, Manish Gupta, F. Gustavson, S. Hall, R. Haring, D. Heidel, P. Heidelberger, L. Herger, D. Hoenicke, T. Jamal-Eddine, G. Kopcsay, A. P. Lanzetta, D. Lieber, M. Lu, M. Mendell, L. Mok, J. Moreira, B. J. Nathanson, M. Newton, M. Ohmacht, R. Rand, R. Regan, R. Sahoo, A. Sanomiya, E. Schenfeld, Sarabjeet Singh, Peilin Song, B. Steinmacher-Burow, K. Strauss, R. Swetz, T. Takken, R. Tremaine, M. Tsao, P. Vranas, T. Ward, M. Wazlowski, J. Brown, T. Liebsch, A. Schram, G. Ulsh
Summary form only given. Large powerful networks coupled to state-of-the-art processors have traditionally dominated supercomputing. As technology advances, this approach is likely to be challenged by a more cost-effective System-On-A-Chip approach, with higher levels of system integration. The scalability of applications to architectures with tens to hundreds of thousands of processors is critical to the success of this approach. Significant progress has been made in mapping numerous compute-intensive applications, many of them grand challenges, to parallel architectures. Applications hoping to efficiently execute on future supercomputers of any architecture must be coded in a manner consistent with an enormous degree of parallelism. The BG/L program is developing a peak nominal 180 TFLOPS (360 TFLOPS for some applications) supercomputer to serve a broad range of science applications. BG/L generalizes QCDOC, the first System-On-A-Chip supercomputer that is expected in 2003. BG/L consists of 65,536 nodes, and contains five integrated networks: a 3D torus, a combining tree, a Gb Ethernet network, barrier/global interrupt network and JTAG.
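Of the five networks listed, the 3D torus gives every node six neighbours with wrap-around links at the faces. The helper below just computes those neighbour coordinates; the 64 x 32 x 32 factorization of the 65,536 nodes is assumed purely for illustration.

```python
# Neighbour coordinates on a 3D torus: each node has six links, with wrap-around
# at the faces. The 64 x 32 x 32 factorization of 65,536 nodes is an assumption.
from typing import List, Tuple

DIMS = (64, 32, 32)

def torus_neighbors(node: Tuple[int, int, int],
                    dims: Tuple[int, int, int] = DIMS) -> List[Tuple[int, int, int]]:
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]   # wrap-around link
            neighbors.append(tuple(coord))
    return neighbors

# Example: the corner node (0, 0, 0) wraps around to (63, 0, 0), (0, 31, 0), etc.
print(torus_neighbors((0, 0, 0)))
```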
Citations: 7
Production clustering at Pratt & Whitney: desktop supercomputing for engineering design
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137742
Peter C. Bradley
Production Clustering refers to large-scale, high reliability clustering used as a tool for use by engineers or other users who are not necessarily experts in computer science. The combination of Production Clustering with cycle stealing on desktops presents unique technical challenges in ease of use, automation of tasks, and fault tolerance. Pratt and Whitney’s PROWESS system utilizes the idle capacity of thousands of Unix and Windows desktops for aerodynamic design analysis of aircraft propulsion systems. Pratt & Whitney’s experiences on Sun Solaris, Linux, and Windows will be used to highlight some of the challenges, successes, and potential of production clustering.
Citations: 0
Low-cost non-intrusive debugging strategies for distributed parallel programs
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137778
M. Beynon, H. Andrade, J. Saltz
We show how five low-cost and nonintrusive techniques that work using free commodity tools such as GDB can be used to improve the debugging process of multi-threaded and/or distributed parallel programs. These techniques have been used in the development of major software middleware - DataCutter and MQO - and have proven their value by lowering the time necessary to detect and correct bugs.
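One of the cheapest non-intrusive tricks in this family is to attach GDB to an already-running process, dump every thread's backtrace, and detach without instrumenting the program. The sketch below drives gdb in batch mode from Python; the PID and the per-node usage are hypothetical, only standard gdb options (`-p`, `-batch`, `-ex`) are used, and the paper's own five techniques may of course differ.

```python
# Illustrative non-intrusive snapshot of a running process: attach gdb in batch
# mode, dump all thread backtraces, and detach. Requires gdb and ptrace rights
# on the target process (typically: same user, same node).
import subprocess

def backtrace_all_threads(pid: int) -> str:
    result = subprocess.run(
        ["gdb", "-p", str(pid), "-batch",
         "-ex", "thread apply all bt",   # print a backtrace for every thread
         "-ex", "detach"],               # leave the process running afterwards
        capture_output=True, text=True, check=False)
    return result.stdout

# Hypothetical usage on one node of a distributed run:
# print(backtrace_all_threads(12345))
```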
Citations: 3
Processor allocation on Cplant: achieving general processor locality using one-dimensional allocation strategies
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137758
V. Leung, E. Arkin, M. A. Bender, David P. Bunde, J. Johnston, Alok Lal, Joseph B. M. Mitchell, C. Phillips, S. Seiden
The Computational Plant or Cplant is a commodity-based supercomputer under development at Sandia National Laboratories. This paper describes resource-allocation strategies to achieve processor locality for parallel jobs in Cplant and other supercomputers. Users of Cplant and other Sandia supercomputers submit parallel jobs to a job queue. When a job is scheduled to run, it is assigned to a set of processors. To obtain maximum throughput, jobs should be allocated to localized clusters of processors to minimize communication costs and to avoid bandwidth contention caused by overlapping jobs. This paper introduces new allocation strategies and performance metrics based on space-filling curves and one-dimensional allocation strategies. These algorithms are general and simple. Preliminary simulations and Cplant experiments indicate that both space-filling curves and one-dimensional packing improve processor locality compared to the sorted free list strategy previously used on Cplant. These new allocation strategies are implemented in the new release of the Cplant System Software, Version 2.0, phased into the Cplant systems at Sandia by May 2002.
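The space-filling-curve idea is to linearize the machine's mesh coordinates so that nodes that are close on the curve tend to be close in the mesh, and then run an ordinary one-dimensional allocator over that ordering. The sketch below uses a Z-order (Morton) curve and a simple most-compact-window scan; it illustrates the approach but is not the Cplant 2.0 allocator, and the mesh size and free list are made up.

```python
# Z-order linearization of 3D mesh coordinates plus a 1D contiguity-seeking
# allocator. Illustrates the space-filling-curve idea, not Cplant's code.
from typing import List, Tuple

def morton_key(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the bits of (x, y, z) to get a Z-order curve index."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key

def allocate(free_nodes: List[Tuple[int, int, int]], size: int) -> List[Tuple[int, int, int]]:
    """Pick `size` free nodes that form the most compact run in Z-order."""
    ordered = sorted(free_nodes, key=lambda n: morton_key(*n))
    if size > len(ordered):
        raise ValueError("not enough free nodes")
    # Choose the window whose Morton-key span is smallest, i.e. the tightest run.
    best = min(range(len(ordered) - size + 1),
               key=lambda i: morton_key(*ordered[i + size - 1]) - morton_key(*ordered[i]))
    return ordered[best:best + size]

# Example: allocate 4 nodes out of a handful of free ones on a small mesh.
free = [(x, y, z) for x in range(4) for y in range(2) for z in range(2)]
print(allocate(free, 4))
```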
Citations: 70
Job scheduling for prime time vs. non-prime time
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137789
V. Lo, Jens Mache
Current job scheduling systems for massively parallel machines and Beowulf-class compute clusters support batch scheduling involving two classes of queues: prime time vs. non-prime time. Jobs running in these queue classes must satisfy different criteria with respect to job-size, runtime, or other resource needs. These constraints are designed to delay big jobs to non-prime time in order to provide better quality service during the prime time workday hours. This paper surveys existing prime time/non-prime time scheduling policies and investigates the sensitivity of scheduling performance to changes in the job-size and runtime limits allowed during prime time vs. non-prime time. Our simulation study, using real workload traces from the NASA NAS IBM SP/2 cluster, gives strong evidence for the use of specific prime time limits and sheds light on the performance trade-offs regarding response times, utilization, short-term scheduling algorithm (FCFS vs. EASY backfilling), and success and overflow rates.
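The policies studied here come down to per-class limits on job size and requested runtime, with requests that exceed the prime-time limits deferred to non-prime hours. The sketch below shows such an admission check; the 8:00-18:00 weekday window and the 64-node/60-minute limits are assumptions for illustration, not the NAS settings.

```python
# Illustrative prime-time admission check: jobs over the prime-time size or
# runtime limits are routed to the non-prime queue. All limits are assumptions.
from dataclasses import dataclass
from datetime import datetime, time

PRIME_START, PRIME_END = time(8, 0), time(18, 0)   # assumed prime-time window
PRIME_MAX_NODES = 64                               # assumed per-class limits
PRIME_MAX_MINUTES = 60

@dataclass
class Job:
    nodes: int
    runtime_minutes: int

def in_prime_time(now: datetime) -> bool:
    return PRIME_START <= now.time() < PRIME_END and now.weekday() < 5

def queue_class(job: Job, now: datetime) -> str:
    fits_prime = job.nodes <= PRIME_MAX_NODES and job.runtime_minutes <= PRIME_MAX_MINUTES
    if in_prime_time(now) and fits_prime:
        return "prime"
    return "non-prime"

# A 128-node, 4-hour request submitted on a weekday morning goes to non-prime.
print(queue_class(Job(nodes=128, runtime_minutes=240), datetime(2002, 9, 23, 10, 0)))
```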
Citations: 24
MAGE-a metacomputing environment for parallel program development on cluster computers
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137784
M. McMahon, B. DeLong, M. Fotta, G. J. Weinstock, A. E. Williams
This paper describes the design of MAGE-Metacomputing and Grid Environment-an environment for developing and executing parallel programs on COTS cluster computers and grids. The intention of MAGE is to provide a layer of abstraction at the level of parallel program compilation, execution, and monitoring. The user is isolated from the details of these operations, while preserving a robust, flexible set of capabilities for advanced parallel program development. While most metacomputing abstractions focus on access to extant parallel resources, or on integrating dispersed resources into a grid, MAGE integrates cluster middleware with workstation applications, and extends this paradigm by focusing on providing a development environment for the creation of new parallel programs. The flexible, modular design of the MAGE components ensures portability to different clustering platforms and promotes eventual integration of a MAGE-based system with a grid system.
Citations: 0
Trends in high performance computing and using numerical libraries on clusters
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137743
J. Dongarra
In this talk we will look at how high performance computing has changed over the last 10 years and look toward the future in terms of trends, with a focus on cluster computing. We will also look at an approach for deploying numerical libraries on clusters, called LAPACK for Clusters (LFC). The LFC software intends to allow users to dynamically link against an archived library of executable routines. The user is assumed to call one of the LFC routines from a single processor on the cluster. The intent is to possibly leverage the parallel computing power of the cluster to solve the problem on the user's behalf. The software accounts for the details required for parallelizing the user's problem such as resource discovery and selection, and mapping the data onto and off of the process grid, in addition to executing the parallel library routine itself.
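The point of LFC is that the caller makes a plain sequential-looking library call and the cluster-side machinery decides whether the problem is worth distributing over the process grid. The hypothetical wrapper below captures only that control flow: the function name, the size threshold, and the unimplemented distributed branch are assumptions; the single-node path through NumPy's LAPACK bindings is the only real part.

```python
# Hypothetical wrapper in the spirit of LAPACK for Clusters: the caller sees a
# plain solve(), and the wrapper decides whether cluster resources pay off.
import numpy as np

LOCAL_SIZE_LIMIT = 2000   # assumed threshold below which a single node suffices

def solve(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Solve a x = b, farming the work out to the cluster only if it pays off."""
    n = a.shape[0]
    if n <= LOCAL_SIZE_LIMIT:
        return np.linalg.solve(a, b)   # real local path (LAPACK via NumPy)
    # Placeholder for the cluster path: resource discovery and selection, mapping
    # the data onto the process grid, the parallel routine, and gathering results.
    raise NotImplementedError("distributed backend not sketched here")

rng = np.random.default_rng(0)
a = rng.standard_normal((500, 500))
b = rng.standard_normal(500)
x = solve(a, b)
print(np.allclose(a @ x, b))
```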
Citations: 0
On the evaluation of JavaSymphony for cluster applications
Pub Date : 2002-09-23 DOI: 10.1109/CLUSTR.2002.1137772
T. Fahringer, A. Jugravu, B. D. Martino, S. Venticinque, H. Moritsch
In the past few years, increasing interest has been shown in using Java as a language for performance-oriented distributed and parallel computing. Most Java-based systems that support portable parallel and distributed computing either require the programmer to deal with intricate low level details of Java, which can be a tedious, time-consuming and error-prone task, or prevent the programmer from controlling locality of data. In contrast to most existing systems, JavaSymphony - a class library written entirely in Java - allows parallelism, load balancing and locality to be controlled at a high level. Objects can be explicitly distributed and migrated based on virtual architectures, which impose a virtual hierarchy on a distributed/parallel system of physical computing nodes. The concept of blocking/nonblocking remote method invocation is used to exchange data among distributed objects and to process work by remote objects. We evaluate the JavaSymphony programming API for a variety of distributed/parallel algorithms which comprises backtracking, N-body, encryption/decryption algorithms and asynchronous nested optimization algorithms. Performance results are presented for both homogeneous and heterogeneous cluster architectures. Moreover, we compare JavaSymphony with an alternative well-known semi-automatic system.
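JavaSymphony itself is a Java class library, but the blocking versus non-blocking invocation pattern described here can be shown language-neutrally. In the Python sketch below a thread-pool future stands in for the handle returned by a non-blocking remote method invocation; this is an analogue of the concept, not the JavaSymphony API.

```python
# Conceptual analogue of blocking vs. non-blocking method invocation (not the
# JavaSymphony API): a future stands in for the handle of a non-blocking call.
from concurrent.futures import ThreadPoolExecutor
import time

def remote_method(x: int) -> int:
    time.sleep(0.1)   # stands in for network latency plus remote work
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    # Non-blocking: fire the invocation, overlap local work, collect the result later.
    handle = pool.submit(remote_method, 7)
    locally_overlapped_work = sum(range(1000))
    print("non-blocking result:", handle.result())

    # Blocking: the caller waits for the result before continuing.
    print("blocking result:", remote_method(7))
```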
Citations: 8