Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137759
E. Frachtenberg, F. Petrini, Juan Fernández Peinador, S. Coll
Clusters of workstations have emerged as an important platform for building cost-effective, scalable, and highly-available computers. Although many hardware solutions are available today, the largest challenge in making large-scale clusters usable lies in the system software. In this paper we present STORM, a resource management tool designed to provide scalability, low overhead, and the flexibility necessary to efficiently support and analyze a wide range of job-scheduling algorithms. STORM achieves these feats by using a small set of primitive mechanisms that are common in modern high-performance interconnects. The architecture of STORM is based on three main technical innovations. First, part of the scheduler runs in the thread processor located on the network interface. Second, we use hardware collectives that are highly scalable both for implementing control heartbeats and for distributing the binary of a parallel job in near-constant time. Third, we use an I/O bypass protocol that allows fast data movement from the file system to the communication buffers in the network interface and vice versa. The experimental results show that STORM can launch a job with a binary of 12 MB on a 64-processor, 32-node cluster in less than 250 ms. This paper provides experimental and analytical evidence that these results scale to a much larger number of nodes. To the best of our knowledge, STORM significantly outperforms existing production schedulers in launching jobs, performing resource management tasks, and gang-scheduling tasks.
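The scalability claim rests on the multicast-style binary distribution. A toy model (all constants are invented for illustration; none are measurements from the paper) contrasts per-node point-to-point distribution, which grows linearly with cluster size, with a near-constant-time hardware collective:

```python
# Hypothetical launch-time model: point-to-point vs. hardware multicast
# distribution of a job binary. Bandwidth and overhead values are assumptions.

BINARY_MB = 12.0          # binary size from the paper's experiment
LINK_MB_PER_MS = 40.0     # assumed link bandwidth
PER_NODE_MS = 0.05        # assumed per-node setup overhead

def launch_serial(nodes: int) -> float:
    """Root sends the binary to each node in turn: O(nodes)."""
    return nodes * (BINARY_MB / LINK_MB_PER_MS + PER_NODE_MS)

def launch_multicast(nodes: int) -> float:
    """A hardware collective delivers to all nodes at once: near-constant."""
    return BINARY_MB / LINK_MB_PER_MS + PER_NODE_MS

for n in (32, 256, 2048):
    print(n, round(launch_serial(n), 1), round(launch_multicast(n), 2))
```

The multicast curve is flat in the node count, which is why the measured 250 ms launch can plausibly extrapolate to much larger machines.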
Title: Scalable resource management in high performance computers
Published in: Proceedings. IEEE International Conference on Cluster Computing
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137733
Bruno Richard
I-Cluster is an HP Laboratories Grenoble initiative in collaboration with the ID-IMAG laboratory of INRIA Rhone-Alpes. The aim of this research programme is to develop a framework of tools that transparently take advantage of unused network resources and federate them to crystallize into specific virtual functions such as supercomputing. More precisely, I-Cluster enables automatic real-time analysis of the availability, configuration, and workload of machines on an intranet. When a user instantiates a supercomputing function, I-Cluster determines the most appropriate set of machines for carrying it out, allocates the machines into a virtual cluster, and then starts execution of the function. To obtain this result, I-Cluster deploys a "sandbox" on each machine on the intranet that is transparent to the user and enables use of the computing resources of these machines during their idle periods while securely protecting user data and jobs. This document presents the sandbox developed for I-Cluster and its features.
Title: I-Cluster: the execution sandbox
Published in: Proceedings. IEEE International Conference on Cluster Computing
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137766
G. Almási, D. Beece, Ralph Bellofatto, G. Bhanot, R. Bickford, M. Blumrich, A. Bright, J. Brunheroto, Calin Cascaval, J. Castaños, L. Ceze, P. Coteus, S. Chatterjee, Dong Chen, G. Chiu, T. Cipolla, P. Crumley, A. Deutsch, M. B. Dombrowa, W. Donath, M. Eleftheriou, B. Fitch, J. Gagliano, A. Gara, R. Germain, M. Giampapa, Manish Gupta, F. Gustavson, S. Hall, R. Haring, D. Heidel, P. Heidelberger, L. Herger, D. Hoenicke, T. Jamal-Eddine, G. Kopcsay, A. P. Lanzetta, D. Lieber, M. Lu, M. Mendell, L. Mok, J. Moreira, B. J. Nathanson, M. Newton, M. Ohmacht, R. Rand, R. Regan, R. Sahoo, A. Sanomiya, E. Schenfeld, Sarabjeet Singh, Peilin Song, B. Steinmacher-Burow, K. Strauss, R. Swetz, T. Takken, R. Tremaine, M. Tsao, P. Vranas, T. Ward, M. Wazlowski, J. Brown, T. Liebsch, A. Schram, G. Ulsh
Summary form only given. Large powerful networks coupled to state-of-the-art processors have traditionally dominated supercomputing. As technology advances, this approach is likely to be challenged by a more cost-effective System-On-A-Chip approach, with higher levels of system integration. The scalability of applications to architectures with tens to hundreds of thousands of processors is critical to the success of this approach. Significant progress has been made in mapping numerous compute-intensive applications, many of them grand challenges, to parallel architectures. Applications hoping to efficiently execute on future supercomputers of any architecture must be coded in a manner consistent with an enormous degree of parallelism. The BG/L program is developing a peak nominal 180 TFLOPS (360 TFLOPS for some applications) supercomputer to serve a broad range of science applications. BG/L generalizes QCDOC, the first System-On-A-Chip supercomputer that is expected in 2003. BG/L consists of 65,536 nodes, and contains five integrated networks: a 3D torus, a combining tree, a Gb Ethernet network, barrier/global interrupt network and JTAG.
Title: Blue Gene/L, a system-on-a-chip
Published in: Proceedings. IEEE International Conference on Cluster Computing
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137742
Peter C. Bradley
Production Clustering refers to large-scale, high-reliability clustering used as a tool by engineers or other users who are not necessarily experts in computer science. The combination of Production Clustering with cycle stealing on desktops presents unique technical challenges in ease of use, automation of tasks, and fault tolerance. Pratt & Whitney's PROWESS system utilizes the idle capacity of thousands of Unix and Windows desktops for aerodynamic design analysis of aircraft propulsion systems. Pratt & Whitney's experiences on Sun Solaris, Linux, and Windows will be used to highlight some of the challenges, successes, and potential of production clustering.
Title: Production clustering at Pratt & Whitney: desktop supercomputing for engineering design
Published in: Proceedings. IEEE International Conference on Cluster Computing
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137778
M. Beynon, H. Andrade, J. Saltz
We show how five low-cost, non-intrusive techniques built on free commodity tools such as GDB can improve the debugging of multi-threaded and/or distributed parallel programs. These techniques have been used in the development of major middleware systems (DataCutter and MQO) and have proven their value by lowering the time needed to detect and correct bugs.
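In the same low-cost spirit, one common trick with a stock GDB is to attach it in batch mode to each process of a running parallel job and harvest all thread backtraces without stopping the run for long. A sketch only, assuming a Unix host with gdb and pgrep installed; the binary name is hypothetical:

```python
# Collect thread backtraces from every process of a running job by
# attaching gdb non-interactively. This is a generic GDB technique in the
# spirit of the paper, not code from the paper itself.
import subprocess

def gdb_cmd(pid: int) -> list[str]:
    """Build the gdb batch invocation that dumps all thread backtraces."""
    return ["gdb", "--batch", "-p", str(pid), "-ex", "thread apply all bt"]

def backtrace(pid: int) -> str:
    """Attach to one process, dump backtraces, and detach."""
    result = subprocess.run(gdb_cmd(pid), capture_output=True,
                            text=True, timeout=30)
    return result.stdout

def pids_by_name(name: str) -> list[int]:
    """Find the job's process IDs by command-line match (pgrep)."""
    out = subprocess.run(["pgrep", "-f", name],
                         capture_output=True, text=True)
    return [int(p) for p in out.stdout.split()]

if __name__ == "__main__":
    for pid in pids_by_name("my_parallel_app"):  # hypothetical binary name
        print(f"=== PID {pid} ===")
        print(backtrace(pid))
```

Because gdb detaches after the batch commands, the target processes resume almost immediately, which keeps the technique non-intrusive.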
Title: Low-cost non-intrusive debugging strategies for distributed parallel programs
Published in: Proceedings. IEEE International Conference on Cluster Computing
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137758
V. Leung, E. Arkin, M. A. Bender, David P. Bunde, J. Johnston, Alok Lal, Joseph B. M. Mitchell, C. Phillips, S. Seiden
The Computational Plant, or Cplant, is a commodity-based supercomputer under development at Sandia National Laboratories. This paper describes resource-allocation strategies to achieve processor locality for parallel jobs in Cplant and other supercomputers. Users of Cplant and other Sandia supercomputers submit parallel jobs to a job queue. When a job is scheduled to run, it is assigned to a set of processors. To obtain maximum throughput, jobs should be allocated to localized clusters of processors to minimize communication costs and to avoid bandwidth contention caused by overlapping jobs. This paper introduces new allocation strategies and performance metrics based on space-filling curves and one-dimensional allocation strategies. These algorithms are general and simple. Preliminary simulations and Cplant experiments indicate that both space-filling curves and one-dimensional packing improve processor locality compared to the sorted free list strategy previously used on Cplant. These new allocation strategies are implemented in the new release of the Cplant System Software, Version 2.0, phased into the Cplant systems at Sandia by May 2002.
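The space-filling-curve idea can be illustrated with a Z-order (Morton) curve; the paper's actual curve and free-list mechanics may differ. Linearizing the mesh along the curve lets a simple one-dimensional contiguous scan produce allocations that stay spatially compact:

```python
# Sketch of space-filling-curve allocation: linearize a 2D processor mesh
# along a Z-order (Morton) curve, then give each job a contiguous run of
# free processors in that 1D order. Illustrative only; the paper's curve,
# metrics, and allocator details are not reproduced here.

def morton(x: int, y: int, bits: int = 8) -> int:
    """Interleave the bits of (x, y) to get the Z-order index."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return z

def curve_order(width: int, height: int) -> list[tuple[int, int]]:
    """All mesh coordinates, sorted along the Z-order curve."""
    cells = [(x, y) for x in range(width) for y in range(height)]
    return sorted(cells, key=lambda c: morton(*c))

def allocate(order, free, size):
    """First contiguous window of `size` free processors along the curve."""
    run = []
    for node in order:
        if node in free:
            run.append(node)
            if len(run) == size:
                return run
        else:
            run = []
    return None

order = curve_order(4, 4)
job = allocate(order, set(order), 4)
```

On an empty 4x4 mesh the first four processors along the curve form a 2x2 block, which is exactly the spatial compactness the one-dimensional strategy is after.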
Title: Processor allocation on Cplant: achieving general processor locality using one-dimensional allocation strategies
Published in: Proceedings. IEEE International Conference on Cluster Computing
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137789
V. Lo, Jens Mache
Current job scheduling systems for massively parallel machines and Beowulf-class compute clusters support batch scheduling involving two classes of queues: prime time vs. non-prime time. Jobs running in these queue classes must satisfy different criteria with respect to job-size, runtime, or other resource needs. These constraints are designed to delay big jobs to non-prime time in order to provide better quality service during the prime time workday hours. This paper surveys existing prime time/non-prime time scheduling policies and investigates the sensitivity of scheduling performance to changes in the job-size and runtime limits allowed during prime time vs. non-prime time. Our simulation study, using real workload traces from the NASA NAS IBM SP/2 cluster, gives strong evidence for the use of specific prime time limits and sheds light on the performance trade-offs regarding response times, utilization, short-term scheduling algorithm (FCFS vs. EASY backfilling), and success and overflow rates.
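The queue-limit mechanism being studied can be sketched as a simple admission rule; the hours and caps below are invented for the example and are not the limits evaluated in the paper:

```python
# Toy prime-time admission rule: during workday hours, jobs above a size
# or runtime cap are deferred to the non-prime queue. All limits here are
# hypothetical placeholders.
from dataclasses import dataclass

PRIME_START, PRIME_END = 8, 18   # assumed prime-time window (hours)
PRIME_MAX_NODES = 64             # assumed prime-time job-size cap
PRIME_MAX_MINUTES = 120          # assumed prime-time runtime cap

@dataclass
class Job:
    nodes: int
    minutes: int

def queue_for(job: Job, hour: int) -> str:
    """Pick the queue class for a job submitted at the given hour."""
    prime = PRIME_START <= hour < PRIME_END
    if prime and (job.nodes > PRIME_MAX_NODES
                  or job.minutes > PRIME_MAX_MINUTES):
        return "non-prime"       # big jobs wait for off-hours
    return "prime" if prime else "non-prime"

print(queue_for(Job(nodes=128, minutes=60), hour=10))  # → non-prime
```

The paper's sensitivity study effectively asks how performance changes as these two caps are tightened or relaxed.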
Title: Job scheduling for prime time vs. non-prime time
Published in: Proceedings. IEEE International Conference on Cluster Computing
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137784
M. McMahon, B. DeLong, M. Fotta, G. J. Weinstock, A. E. Williams
This paper describes the design of MAGE (Metacomputing and Grid Environment), an environment for developing and executing parallel programs on COTS cluster computers and grids. The intention of MAGE is to provide a layer of abstraction at the level of parallel program compilation, execution, and monitoring. The user is isolated from the details of these operations, while preserving a robust, flexible set of capabilities for advanced parallel program development. While most metacomputing abstractions focus on access to extant parallel resources, or on integrating dispersed resources into a grid, MAGE integrates cluster middleware with workstation applications, and extends this paradigm by focusing on providing a development environment for the creation of new parallel programs. The flexible, modular design of the MAGE components ensures portability to different clustering platforms and promotes eventual integration of a MAGE-based system with a grid system.
Title: MAGE-a metacomputing environment for parallel program development on cluster computers
Published in: Proceedings. IEEE International Conference on Cluster Computing
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137743
J. Dongarra
In this talk we will look at how high performance computing has changed over the last 10 years and look toward the future in terms of trends, with a focus on cluster computing. We will also look at an approach for deploying numerical libraries on clusters, called LAPACK for Clusters (LFC). The LFC software is intended to allow users to dynamically link against an archived library of executable routines. The user is assumed to call one of the LFC routines from a single processor on the cluster. The intent is, where possible, to leverage the parallel computing power of the cluster to solve the problem on the user's behalf. The software accounts for the details required to parallelize the user's problem, such as resource discovery and selection, and mapping the data onto and off of the process grid, in addition to executing the parallel library routine itself.
Title: Trends in high performance computing and using numerical libraries on clusters
Published in: Proceedings. IEEE International Conference on Cluster Computing
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137772
T. Fahringer, A. Jugravu, B. D. Martino, S. Venticinque, H. Moritsch
In the past few years, increasing interest has been shown in using Java as a language for performance-oriented distributed and parallel computing. Most Java-based systems that support portable parallel and distributed computing either require the programmer to deal with intricate low-level details of Java, which can be a tedious, time-consuming, and error-prone task, or prevent the programmer from controlling data locality. In contrast to most existing systems, JavaSymphony, a class library written entirely in Java, allows the programmer to control parallelism, load balancing, and locality at a high level. Objects can be explicitly distributed and migrated based on virtual architectures, which impose a virtual hierarchy on a distributed/parallel system of physical computing nodes. Blocking and non-blocking remote method invocation is used to exchange data among distributed objects and to have work processed by remote objects. We evaluate the JavaSymphony programming API on a variety of distributed/parallel algorithms, comprising backtracking, N-body, encryption/decryption, and asynchronous nested optimization algorithms. Performance results are presented for both homogeneous and heterogeneous cluster architectures. Moreover, we compare JavaSymphony with a well-known semi-automatic alternative system.
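The blocking vs. non-blocking invocation distinction can be mimicked in a few lines, with a thread pool standing in for a remote object. This illustrates the concept only; JavaSymphony's actual API is Java and differs from this sketch:

```python
# Blocking vs. non-blocking invocation, with a thread pool playing the role
# of a remote object. A conceptual stand-in, not JavaSymphony code.
from concurrent.futures import ThreadPoolExecutor, Future

class RemoteObject:
    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=2)

    def invoke(self, method, *args):
        """Blocking: the caller waits for the result."""
        return method(*args)

    def ainvoke(self, method, *args) -> Future:
        """Non-blocking: return a handle immediately; collect later."""
        return self._pool.submit(method, *args)

obj = RemoteObject()
handle = obj.ainvoke(pow, 2, 10)   # caller keeps working...
assert handle.result() == 1024     # ...and collects the result when needed
```

Non-blocking invocation is what lets a caller overlap local computation with work done by remote objects, which matters for the asynchronous nested optimization algorithms the paper evaluates.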
Title: On the evaluation of JavaSymphony for cluster applications
Published in: Proceedings. IEEE International Conference on Cluster Computing