BioOpera: cluster-aware computing
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137734
Win Bausch, C. Pautasso, R. Schaeppi, G. Alonso
In this paper we present BioOpera, an extensible process support system for cluster-aware computing. It features an intuitive way to specify computations, as well as improved support for running them over a cluster, providing monitoring, persistence, fault tolerance, and interaction capabilities without sacrificing efficiency or scalability.
{"title":"BioOpera: cluster-aware computing","authors":"Win Bausch, C. Pautasso, R. Schaeppi, G. Alonso","doi":"10.1109/CLUSTR.2002.1137734","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137734","url":null,"abstract":"In this paper we present BioOpera, an extensible process support system for cluster-aware computing. It features an intuitive way to specify computations, as well as improved support for running them over a cluster providing monitoring, persistence, fault tolerance and interaction capabilities without sacrificing efficiency and scalability.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77635797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Indexing the web - a challenge for supercomputers
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137763
M. Henzinger
Since January 2002, the Google search engine has been powering an average of 150 million web searches a day, with a peak of over 2000 searches per second. These searches are performed over an index of over 2 billion documents, over 300 million images, and over 700 million Usenet messages. To guarantee fast user response time, Google performs these searches on a cluster of over 10,000 PCs. The main challenges with this architecture are fault tolerance and the quality of search results. Replication solves the former, and the PageRank score is used to advance the latter. The PageRank score is based on an eigenvalue computation over a large matrix derived from the web graph and is one of the main contributors to very high quality search results. As Internet use continues to grow, so does the use of the Google search engine. The Google architecture is designed to scale to accommodate the growth in usage as well as the growth of the web.
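The PageRank computation the abstract refers to is, at its core, a power iteration that converges to the principal eigenvector of the damped link matrix. A minimal Python sketch on a toy graph, not Google's implementation:

```python
# Minimal PageRank power-iteration sketch (toy graph, not Google's code).
# The rank vector converges to the principal eigenvector of the damped
# link matrix derived from the web graph.
def pagerank(links, damping=0.85, tol=1e-9):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    while True:
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:  # dangling page: spread its rank uniformly
                for q in pages:
                    new[q] += damping * rank[p] / n
        if max(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new

print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```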
{"title":"Indexing the web - a challenge for supercomputers","authors":"M. Henzinger","doi":"10.1109/CLUSTR.2002.1137763","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137763","url":null,"abstract":"Since January 2002, the Google search engine has been powering an average of 150 million web searches a day, with a peark of over 2000 searches per second. These searches are performed over an index of over 2 billion documents, over 300 million images, and over 700 million Usenet messages. To guarantee fast user response time, Google performs these searches on a cluster of over 10,000 PCs. The main challenages with this architecture are fault-tolerance and the quality of search results. Replication solves the former and the PageRank score is used to advance the latter. The PageRank score is based on an eigenvalue computation of a large matrix that is derived from the web graph and is one of the main contributor to very high quality search results. As Internet use continues to grow, so does the use of the Google search engine. The Google architecture is designed to scale to accommodate the growth in useage as well as the growth of the web.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74746748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Clusters as large-scale development facilities
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137729
R. Evard, N. Desai, J. Navarro, Daniel Nurmi
In this paper, we describe the use of a cluster as a generalized facility for development. A development facility is a system used primarily for testing and development activities while being operated reliably for many users. We are in the midst of a project to build and operate a large-scale development facility. We discuss our motivation for using clusters in this way and compare the model with a classic computing facility. We describe our experiences and findings from the first phase of this project. Many of these observations are relevant to the design of standard clusters and to future development facilities.
{"title":"Clusters as large-scale development facilities","authors":"R. Evard, N. Desai, J. Navarro, Daniel Nurmi","doi":"10.1109/CLUSTR.2002.1137729","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137729","url":null,"abstract":"In this paper, we describe the use of a cluster as a generalized facility for development. A development facility is a system used primarily for testing and development activities while being operated reliably for many users. We are in the midst of a project to build and operate a large-scale development facility. We discuss our motivation for using clusters in this way and compare the model with a classic computing facility. We describe our experiences and findings from the first phase of this project. Many of these observations are relevant to the design of standard clusters and to future development facilities.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91214538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design and implementation of CC-NUMA card II for SCI-based PC clustering
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137739
Soo-Cheol Oh, Sang-Hwa Chung, Hankook Jang
It is extremely important to minimize network access time when constructing a high-performance PC cluster system. For an SCI-based PC cluster it is possible to reduce the network access time by maintaining a network cache in each cluster node. This paper presents the second version of the CC-NUMA card (CC-NUMA card II), which utilizes a network cache for SCI-based PC clustering. The CC-NUMA card II is plugged directly into the PCI slot of each node, and contains shared memory, a network cache, a shared memory control module, and a network control module. The network cache is maintained for shared memory on the PCI bus of cluster nodes. The coherency mechanism between the network cache and shared memory is based on the IEEE SCI standard. In previous research, the first version of the card (CC-NUMA card I) was developed. The CC-NUMA card I, which adopted Dolphin's PCI-SCI card as its network control module, incurred overhead when exchanging data between remote nodes. In this paper, that overhead is removed by developing the CC-NUMA card II, which combines the shared memory control module and the network control module on a single card. In experiments with the SPLASH-2 benchmark suite, the CC-NUMA card II based PC cluster shows better performance than a NUMA system based on Dolphin's PCI-SCI card.
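For orientation, the IEEE SCI coherence scheme the card follows keeps a distributed sharing list of the caches holding each line (a doubly linked list in hardware) and purges it on a write. A heavily simplified toy model in Python, illustrative only, not the card's hardware protocol:

```python
# Toy model of SCI-style coherence: each line tracks a sharing list of
# nodes holding a cached copy; a write invalidates every other copy.
class SciLine:
    def __init__(self, home, value=0):
        self.home = home      # node owning the shared-memory copy
        self.value = value
        self.sharers = []     # head-first sharing list of caching nodes

    def read(self, node):
        if node != self.home and node not in self.sharers:
            self.sharers.insert(0, node)   # new reader becomes list head
        return self.value

    def write(self, node, value):
        # Writer must invalidate all other cached copies before updating.
        self.sharers = [s for s in self.sharers if s == node]
        self.value = value

line = SciLine(home=0)
line.read(1); line.read(2)
line.write(1, 42)
assert line.sharers == [1] and line.value == 42
```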
{"title":"Design and implementation of CC-NUMA card II for SCI-based PC clustering","authors":"Soo-Cheol Oh, Sang-Hwa Chung, Hankook Jang","doi":"10.1109/CLUSTR.2002.1137739","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137739","url":null,"abstract":"It is extremely important to minimize network access time when constructing a high-performance PC cluster system. For an SCI-based PC cluster it is possible to reduce the network access time by maintaining network cache in each cluster node. This paper presents the second version CC-NUMA card (CC-NUMA card II) that utilizes network cache for SCI-based PC clustering. The CC-NUMA card II is directly plugged into the PCI slot of each node, and contains shared memory, network cache, a shared memory control module and network control module. The network cache is maintained for shared memory on the PCI bus of cluster nodes. The coherency mechanism between network cache and shared memory is based on the IEEE SCI standard. In previous research, the first version CC-NUMA card (CC-NUMA card I) was developed. The CC-NUMA card I adopting Dolphin's PCI-SCI card as the network control module caused overhead in exchanging data between the remote nodes. In this paper, the overhead is removed by developing the CC-NUMA card II that combines the shared memory control module and network control module in a single card. Throughout the experiment with the SPLASH-2 benchmark suite, the CC-NUMA card II based PC cluster shows better performance than a NUMA system based on Dolphin's PCI-SCI card.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84551825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design of a middleware-based cluster management platform with task management and migration
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137788
F. Turck, S. Vanhastel, P. Thysebaert, B. Volckaert, P. Demeester, B. Dhoedt
In this paper, we address the design and implementation of a generic and scalable platform for efficient management of computational resources. The developed platform is called the Intelligent Agent Platform. Its architecture is based on middleware technology in order to ensure easy distribution of the software components between the participating workstations and to exploit advanced software techniques. The computational tasks are referred to as agents, defined as software components that are capable of executing particular algorithms on input data. The platform offers advanced features such as transparent task management, load balancing, run time compilation of agent code and task migration and is therefore denoted by the adjective "Intelligent". The architecture of the platform will be outlined from a computational point of view and each component will be described in detail. Furthermore, some important design issues of the platform are covered and a performance evaluation is presented.
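As a rough illustration of the agent abstraction described here (a component that executes an algorithm on input data and can be serialized for migration), a hypothetical Python sketch; the names are invented, not the platform's actual API:

```python
# Hypothetical sketch of an "agent": a component that runs an algorithm
# on input data and whose state can be shipped to another node.
import pickle

def count_words(text):
    return len(text.split())

class Agent:
    def __init__(self, algorithm, data):
        self.algorithm = algorithm   # callable executed on the input data
        self.data = data
        self.result = None

    def run(self):
        self.result = self.algorithm(self.data)
        return self.result

def migrate(agent, send):
    """Serialize the agent and hand it to a transport callback, as a
    task-migration step might."""
    send(pickle.dumps(agent))

agent = Agent(count_words, "cluster management platform")
print(agent.run())                                        # -> 3
migrate(agent, lambda blob: print(len(blob), "bytes shipped"))
```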
{"title":"Design of a middleware-based cluster management platform with task management and migration","authors":"F. Turck, S. Vanhastel, P. Thysebaert, B. Volckaert, P. Demeester, B. Dhoedt","doi":"10.1109/CLUSTR.2002.1137788","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137788","url":null,"abstract":"In this paper, we address the design and implementation of a generic and scalable platform for efficient management of computational resources. The developed platform is called the Intelligent Agent Platform. Its architecture is based on middleware technology in order to ensure easy distribution of the software components between the participating workstations and to exploit advanced software techniques. The computational tasks are referred to as agents, defined as software components that are capable of executing particular algorithms on input data. The platform offers advanced features such as transparent task management, load balancing, run time compilation of agent code and task migration and is therefore denoted by the adjective \"Intelligent\". The architecture of the platform will be outlined from a computational point of view and each component will be described in detail. Furthermore, some important design issues of the platform are covered and a performance evaluation is presented.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83348494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Impact of on-demand connection management in MPI over VIA
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137740
Jiesheng Wu, Jiuxing Liu, P. Wyckoff, D. Panda
Designing scalable and efficient Message Passing Interface (MPI) implementations for emerging cluster interconnects such as VIA-based networks and InfiniBand is important for building next-generation clusters. In this paper, we address the scalability of MPI implementations over VIA with an on-demand connection management mechanism, which limits resource usage to the connections an application actually requires. We address the design issues of incorporating the on-demand connection mechanism into an implementation of MPI over VIA. A complete implementation was done for MVICH over both cLAN VIA and Berkeley VIA. Performance evaluation on a set of microbenchmarks and the NAS parallel benchmarks demonstrates that the on-demand mechanism can increase the scalability of MPI implementations by limiting resource use to what applications need. It also shows that the on-demand mechanism delivers performance comparable to or better than the static mechanism, in which MPI implementations usually maintain a fully connected process model. These results demonstrate that the on-demand connection mechanism is a feasible way to increase the scalability of MPI implementations over VIA- and InfiniBand-based networks.
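The core idea of on-demand connection management can be summarized in a few lines: open a connection only on the first send to a peer, then cache it. A Python sketch with a stand-in transport callback rather than the MVICH/VIA API:

```python
# Sketch of the on-demand idea: instead of building a fully connected
# process mesh at startup, create a connection on first send to a peer.
class OnDemandConnections:
    def __init__(self, connect):
        self.connect = connect   # callback that opens a real connection
        self.table = {}          # rank -> open connection

    def send(self, rank, message):
        conn = self.table.get(rank)
        if conn is None:               # first message to this peer:
            conn = self.connect(rank)  # pay the setup cost now...
            self.table[rank] = conn    # ...and reuse it afterwards
        conn.append(message)

conns = OnDemandConnections(lambda rank: [])
conns.send(3, "halo exchange")
conns.send(3, "halo exchange")   # second send reuses the connection
print(len(conns.table))          # -> 1: only the peer actually used
```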
{"title":"Impact of on-demand connection management in MPI over VIA","authors":"Jiesheng Wu, Jiuxing Liu, P. Wyckoff, D. Panda","doi":"10.1109/CLUSTR.2002.1137740","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137740","url":null,"abstract":"Designing scalable and efficient Message Passing Interface (MPI) implementations for emerging cluster interconnects such as VIA-based networks and InfiniBand is important for building next generation clusters. In this paper, we address the scalability issue in implementation of MPI over VIA by an on-demand connection management mechanism. On-demand connection management is designed to limit the use of resources by applications that absolutely require them. We address design issues of incorporating the on-demand connection mechanism into an implementation of MPI over VIA. A complete implementation was done for MVICH over both cLAN VIA and Berkeley VIA. Performance evaluation on a set of microbenchmarks and NAS parallel benchmarks demonstrates that the on-demand mechanism can increase the scalability of MPI implementations by limiting the use of resources as needed by applications. It also shows that the on-demand mechanism delivers comparable or better performance as the static mechanism in which a fully-connected process model usually exists in the MPI implementations. These results demonstrate that the on-demand connection mechanism is a feasible solution to increase the scalability of MPI implementations over VIA- and InfiniBand-based networks.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83743200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Compiling tiled iteration spaces for clusters
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137768
G. Goumas, Nikolaos Drosinos, Maria Athanasaki, N. Koziris
We present a complete end-to-end framework to generate automatic message-passing code for tiled iteration spaces. We consider general parallelepiped tiling transformations and general convex iteration spaces. We aim to address all problems concerning data parallel code generation efficiently by transforming the initial non-rectangular tile to a rectangular one. In this way, data distribution and communication become simple and straightforward. We have implemented our parallelizing techniques in a tool which automatically generates MPI code and run several experiments on a cluster of PCs. Our experimental results show the merit of general parallelepiped tiling transformations, and confirm previous theoretical work on scheduling-optimal tile shapes.
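A minimal sketch of the rectangular case the paper reduces to: split a doubly nested iteration space into tiles so each node computes whole tiles and communication is confined to tile boundaries. The mapping from general parallelepiped tiles to rectangular ones is omitted here:

```python
# Rectangular tiling of a 2D iteration space: each tile is a unit of
# work that one cluster node computes in full.
def tiles(n_i, n_j, t_i, t_j):
    """Yield bounds of each t_i x t_j tile covering an n_i x n_j space."""
    for ii in range(0, n_i, t_i):
        for jj in range(0, n_j, t_j):
            yield ii, min(ii + t_i, n_i), jj, min(jj + t_j, n_j)

def compute_tile(grid, lo_i, hi_i, lo_j, hi_j):
    for i in range(lo_i, hi_i):
        for j in range(lo_j, hi_j):
            grid[i][j] += 1          # stand-in for the real loop body

grid = [[0] * 8 for _ in range(8)]
for lo_i, hi_i, lo_j, hi_j in tiles(8, 8, 3, 3):
    compute_tile(grid, lo_i, hi_i, lo_j, hi_j)
assert all(v == 1 for row in grid for v in row)  # covered exactly once
```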
{"title":"Compiling tiled iteration spaces for clusters","authors":"G. Goumas, Nikolaos Drosinos, Maria Athanasaki, N. Koziris","doi":"10.1109/CLUSTR.2002.1137768","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137768","url":null,"abstract":"We present a complete end-to-end framework to generate automatic message-passing code for tiled iteration spaces. We consider general parallelepiped tiling transformations and general convex iteration spaces. We aim to address all problems concerning data parallel code generation efficiently by transforming the initial non-rectangular tile to a rectangular one. In this way, data distribution and communication become simple and straightforward. We have implemented our parallelizing techniques in a tool which automatically generates MPI code and run several experiments on a cluster of PCs. Our experimental results show the merit of general parallelepiped tiling transformations, and confirm previous theoretical work on scheduling-optimal tile shapes.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72978622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging standard core technologies to programmatically build Linux cluster appliances
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137728
M. Katz, P. Papadopoulos, Greg Bruno
Clusters have made the jump from lab prototypes to full-fledged production computing platforms. The number, variety, and specialized configurations of these machines are increasing dramatically, with 32-128 node clusters being commonplace in science labs. The evolving nature of the platform is to target generic PC hardware at specialized functions such as login, compute, Web server, file server, and visualization engine. This is the logical extension of the standard login/compute dichotomy of traditional Beowulf clusters. Clearly, these specialized nodes (henceforth "cluster appliances") share an immense amount of common configuration and software. What is lacking in many clustering toolkits is the ability to share configuration across appliances and specific hardware (where it should be shared) and differentiate only where needed. In the NPACI Rocks cluster distribution, we have developed a configuration infrastructure with well-defined inheritance properties that leverages and builds on de facto standards including XML (with standard parsers), RedHat Kickstart, HTTP transport, CGI, SQL databases, and graph constructs to easily define cluster appliances. Our approach neither resorts to replication of configuration files nor requires building a "golden" reference image. By relying on this descriptive and programmatic infrastructure and carefully demarcating configuration information from the software packages (which are a bit-delivery mechanism), we can easily handle the heterogeneity of appliances, deal with small hardware differences among particular instances of appliances (such as IDE vs. SCSI), and support large hardware differences (such as x86 vs. IA64) with the same infrastructure. Our mechanism is easily extended to other descriptive infrastructures (such as Solaris Jumpstart as a backend target) and has been proven on over 100 clusters with significant hardware and configuration differences among them.
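The graph construct described here can be pictured as follows: appliance types are nodes, edges mean "includes the configuration of", and an appliance's package set is the union over everything reachable from it. A hypothetical Python sketch with invented node and package names, not the actual Rocks graph:

```python
# Toy inheritance graph for cluster appliances: resolving an appliance
# walks the graph and unions the package sets it reaches.
GRAPH = {
    "base":     {"includes": [],          "packages": ["kernel", "ssh"]},
    "compute":  {"includes": ["base"],    "packages": ["mpi"]},
    "frontend": {"includes": ["base"],    "packages": ["httpd", "sql"]},
    "viz":      {"includes": ["compute"], "packages": ["opengl"]},
}

def resolve(appliance, graph=GRAPH, seen=None):
    """Collect the full package set for an appliance type."""
    seen = seen if seen is not None else set()
    if appliance in seen:        # tolerate diamonds in the graph
        return set()
    seen.add(appliance)
    packages = set(graph[appliance]["packages"])
    for parent in graph[appliance]["includes"]:
        packages |= resolve(parent, graph, seen)
    return packages

print(sorted(resolve("viz")))   # -> ['kernel', 'mpi', 'opengl', 'ssh']
```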
{"title":"Leveraging standard core technologies to programmatically build Linux cluster appliances","authors":"M. Katz, P. Papadopoulos, Greg Bruno","doi":"10.1109/CLUSTR.2002.1137728","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137728","url":null,"abstract":"Clusters have made the jump from lab prototypes to full-fledged production computing platforms. The number variety, and specialized configurations of these machines are increasing dramatically with 32-128 node clusters being commonplace in science labs. The evolving nature of the platform is to target generic PC hardware to specialized functions such as login, compute, Web server file server and a visualization engine. This is the logical extension to the standard login/compute dichotomy of traditional Beowulf clusters. Clearly, these specialized nodes (henceforth \"cluster appliances\") share an immense amount of common configuration and software. What is lacking in many clustering toolkits is the ability to share configuration across appliances and specific hardware (where it should be shared) and differentiate only where needed In the NPACI Rocks cluster distribution, we have developed a configuration infrastructure with well-defined inheritance properties that leverages and builds on de facto standards including: XML (with standard parsers), RedHat Kickstart, HTTP transport, CGI, SQL databases, and graph constructs to easily define cluster appliances. Our approach neither resorts to replication of configuration files nor does it require building a \"golden\" image reference. By relying on this descriptive and programmatic infrastructure and carefully demarking configuration information from the software packages (which is a bit delivery mechanism), we can easily handle the heterogeneity of appliances, easily deal with small hardware differences among particular instances of appliances (such as IDE vs. SCSI), and support large hardware differences (like /spl times/86 vs. IA64) with the same infrastructure. Our mechanism is easily extended to other descriptive infrastructures (such as Solaris Jumpstart as a backend target) and has been proven on over a 100 clusters (with significant hardware and configuration differences among these clusters).","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84400697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Experience in offloading protocol processing to a programmable NIC
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137730
A. Maccabe, Wenbin Zhu, J. Otto, R. Riesen
Offloading protocol processing will become an important tool in supporting our efforts to deliver increasing bandwidth to applications. In this paper we describe our experience in offloading protocol processing to a programmable gigabit Ethernet network interface card. For our experiments, we selected a simple RTS/CTS (request to send/clear to send) protocol called RMPP (reliable message passing protocol). This protocol provides end-to-end flow control and full message retransmit in the case of a lost or corrupt packet. By carefully selecting parts of the protocol for offloading, we were able to improve the bandwidth delivered to MPI applications from approximately 280 Mb/s to approximately 700 Mb/s using standard, 1500 byte, Ethernet frames. Using "jumbo", 9000 byte frames the bandwidth improves from approximately 425 Mb/s to 840 Mb/s. Moreover, we were able to show a significant increase in the availability of the host processor.
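For intuition, a toy model of the RTS/CTS flow an RMPP-like protocol uses: the sender announces a message, the receiver grants it when buffers are ready, and lost packets are retransmitted. This sketches the control flow only, not the offloaded firmware:

```python
# Toy RTS/CTS transfer: announce, grant, send, retransmit what was lost.
def transfer(packets, receiver_capacity, drop=frozenset()):
    # RTS: announce the message size; CTS: receiver grants buffer space.
    assert len(packets) <= receiver_capacity, "CTS withheld: no buffers"
    delivered = {}
    pending = set(range(len(packets)))
    attempt = 0
    while pending:
        attempt += 1
        for seq in sorted(pending):
            if (seq, attempt) not in drop:     # packet survived the wire
                delivered[seq] = packets[seq]  # receiver ACKs it
        pending -= set(delivered)              # retransmit the rest
    return [delivered[i] for i in range(len(packets))]

# Packet 1 is lost on the first attempt and retransmitted on the second.
print(transfer(["a", "b", "c"], receiver_capacity=4, drop={(1, 1)}))
```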
{"title":"Experience in offloading protocol processing to a programmable NIC","authors":"A. Maccabe, Wenbin Zhu, J. Otto, R. Riesen","doi":"10.1109/CLUSTR.2002.1137730","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137730","url":null,"abstract":"Offloading protocol processing will become an important tool in supporting our efforts to deliver increasing bandwidth to applications. In this paper we describe our experience in offloading protocol processing to a programmable gigabit Ethernet network interface card. For our experiments, we selected a simple RTS/CTS (request to send/clear to send) protocol called RMPP (reliable message passing protocol). This protocol provides end-to-end flow control and full message retransmit in the case of a lost or corrupt packet. By carefully selecting parts of the protocol for offloading, we were able to improve the bandwidth delivered to MPI applications from approximately 280 Mb/s to approximately 700 Mb/s using standard, 1500 byte, Ethernet frames. Using \"jumbo\", 9000 byte frames the bandwidth improves from approximately 425 Mb/s to 840 Mb/s. Moreover, we were able to show a significant increase in the availability of the host processor.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81502172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scalable loop self-scheduling schemes for heterogeneous clusters
Pub Date: 2002-09-23 | DOI: 10.1109/CLUSTR.2002.1137767
Anthony T. Chronopoulos, Satish Penmatsa, Ning Yu
Distributed systems (e.g. a LAN of computers) can be used for concurrent processing in some applications. However, a serious difficulty in concurrent programming of a distributed system is dealing with scheduling and load balancing when the system may consist of heterogeneous computers. Distributed scheduling schemes suitable for parallel loops with independent iterations on heterogeneous computer clusters have been proposed and analyzed in the past. Here, we implement these schemes in CORBA (Orbix). We also present an extension of the schemes implemented in a hierarchical master-slave architecture. We present experimental results and comparisons.
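One common family of such schemes is weighted chunk self-scheduling, where the master sizes each chunk by the requesting node's relative speed. A Python sketch of that idea; the paper's specific schemes and CORBA transport are not reproduced here:

```python
# Weighted chunk self-scheduling sketch: faster workers are handed
# proportionally larger chunks of the loop's iteration range.
def self_schedule(total_iters, speeds, base_chunk=16):
    """speeds: worker -> relative speed; returns worker -> chunk list."""
    assignments = {w: [] for w in speeds}
    mean = sum(speeds.values()) / len(speeds)
    next_iter = 0
    while next_iter < total_iters:
        for worker, speed in speeds.items():   # workers "request" work
            if next_iter >= total_iters:
                break
            size = max(1, round(base_chunk * speed / mean))
            end = min(next_iter + size, total_iters)
            assignments[worker].append((next_iter, end))
            next_iter = end
    return assignments

# A node twice as fast receives chunks twice as large.
print(self_schedule(100, {"fast": 2.0, "slow": 1.0}))
```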
{"title":"Scalable loop self-scheduling schemes for heterogeneous clusters","authors":"Anthony T. Chronopoulos, Satish Penmatsa, Ning Yu","doi":"10.1109/CLUSTR.2002.1137767","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137767","url":null,"abstract":"Distributed systems (e.g. a LAN of computers) can be used for concurrent processing for some applications. However a serious difficulty in concurrent programming of a distributed system is how to deal with scheduling and load balancing of such a system which may consist of heterogeneous computers. Distributed scheduling schemes suitable for parallel loops with independent iterations on heterogeneous computer clusters have been proposed and analyzed in the past. Here, we implement the previous schemes in CORBA (Orbix). We also present an extension of these schemes implemented in a hierarchical master-slave architecture. We present experimental results and comparisons.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83783070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}