Incorporating job migration and network RAM to share cluster memory resources

Proceedings the Ninth International Symposium on High-Performance Distributed Computing. Pub Date: 2000-08-01. DOI: 10.1109/HPDC.2000.868636

Li Xiao, Xiaodong Zhang, Stefan A. Kubricht


Citations: 27

Abstract

Job migrations and network RAM are two approaches for effectively using global memory resources in a workstation cluster, aimed at reducing page faults in each local workstation and improving the overall performance of cluster computing. Using either remote executions or pre-emptive migrations, a load-sharing system is able to migrate a job from a workstation without sufficient memory space to a lightly loaded workstation with a large idle memory space for the migrated job. In a network RAM system, if a job cannot find sufficient memory space for its working sets, it utilizes idle memory space from other workstations in the cluster through remote paging. Conducting trace-driven simulations, we have compared the performance and tradeoffs of the two approaches and their impacts on job execution time and cluster scalability. Job migration-based load-sharing schemes are able to balance executions of jobs in a cluster well, while network RAM is able to satisfy data-intensive jobs which may not be migratable by sharing all the idle memory resources in a cluster. A network RAM cluster of workstations is scalable only if the network is sufficiently fast. We propose an improved load-sharing scheme by combining job migrations with network RAM for cluster computing. This scheme uses remote execution to initially allocate a job to the most lightly loaded workstation and, if necessary, network RAM to provide a larger memory space for the job than would be available otherwise. The improved scheme has the merits of both job migrations and network RAM. Our experiments show its effectiveness and scalability for cluster computing.
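The combined scheme described in the abstract (remote execution first, then network RAM if memory is still short) can be illustrated with a minimal sketch. This is not the authors' algorithm or simulator. The `Workstation` class, the `place_job` function, the use of resident job count as the load metric, and the largest-idle-first lender order are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Workstation:
    name: str
    total_mem_mb: int
    used_mem_mb: int = 0
    jobs: list = field(default_factory=list)

    @property
    def idle_mem_mb(self) -> int:
        return self.total_mem_mb - self.used_mem_mb


def place_job(cluster, job_name, demand_mb):
    """Return (host name, MB served locally, {lender name: MB borrowed})."""
    # Step 1 (remote execution): choose the most lightly loaded workstation.
    # Assumption: load = number of resident jobs, ties broken by idle memory.
    host = min(cluster, key=lambda w: (len(w.jobs), -w.idle_mem_mb))
    local = min(demand_mb, host.idle_mem_mb)
    host.used_mem_mb += local
    host.jobs.append(job_name)

    # Step 2 (network RAM): cover any shortfall with idle memory on other
    # workstations, standing in for remote paging. Largest lender first.
    shortfall = demand_mb - local
    borrowed = {}
    for lender in sorted(cluster, key=lambda w: -w.idle_mem_mb):
        if shortfall == 0:
            break
        if lender is host or lender.idle_mem_mb == 0:
            continue
        grant = min(shortfall, lender.idle_mem_mb)
        lender.used_mem_mb += grant
        borrowed[lender.name] = grant
        shortfall -= grant

    # Any remaining shortfall would mean paging to local disk (not modeled).
    return host.name, local, borrowed


cluster = [
    Workstation("A", 128, 96, ["j1", "j2"]),
    Workstation("B", 128, 64, ["j3"]),
    Workstation("C", 256, 56, ["j4", "j5"]),
]
result = place_job(cluster, "j6", 150)
```

In this toy cluster the job is placed on B, the workstation with the fewest jobs, and the part of its 150 MB demand that B cannot hold is borrowed from C. The sketch ignores network latency, which the abstract identifies as the factor that determines whether network RAM scales.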


