后绑定覆盖中的分布式调度和数据共享

2014 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2014-07-21 DOI:10.1109/HPCSim.2014.6903678

A. D. Peris, J. Hernández, E. Huedo

{"title":"后绑定覆盖中的分布式调度和数据共享","authors":"A. D. Peris, J. Hernández, E. Huedo","doi":"10.1109/HPCSim.2014.6903678","DOIUrl":null,"url":null,"abstract":"Pull-based late-binding overlays are used in some of today's largest computational grids. Job agents are submitted to resources with the duty of retrieving real workload from a central queue at runtime. This helps overcome the problems of these complex environments: heterogeneity, imprecise status information and relatively high failure rates. In addition, the late job assignment allows dynamic adaptation to changes in grid conditions or user priorities. However, as the scale grows, the central assignment queue may become a bottleneck for the whole system. This article presents a distributed scheduling architecture for late-binding overlays, which addresses this issue by letting execution nodes build a distributed hash table and delegating job matching and assignment to them. This reduces the load on the central server and makes the system much more scalable and robust. Scalability makes fine-grained scheduling possible and enables new functionalities, like the implementation of a distributed data cache on the execution nodes, which helps alleviate the commonly congested grid storage services.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"5 1","pages":"129-136"},"PeriodicalIF":0.0000,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Distributed scheduling and data sharing in late-binding overlays\",\"authors\":\"A. D. Peris, J. Hernández, E. Huedo\",\"doi\":\"10.1109/HPCSim.2014.6903678\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Pull-based late-binding overlays are used in some of today's largest computational grids. Job agents are submitted to resources with the duty of retrieving real workload from a central queue at runtime. This helps overcome the problems of these complex environments: heterogeneity, imprecise status information and relatively high failure rates. In addition, the late job assignment allows dynamic adaptation to changes in grid conditions or user priorities. However, as the scale grows, the central assignment queue may become a bottleneck for the whole system. This article presents a distributed scheduling architecture for late-binding overlays, which addresses this issue by letting execution nodes build a distributed hash table and delegating job matching and assignment to them. This reduces the load on the central server and makes the system much more scalable and robust. Scalability makes fine-grained scheduling possible and enables new functionalities, like the implementation of a distributed data cache on the execution nodes, which helps alleviate the commonly congested grid storage services.\",\"PeriodicalId\":6469,\"journal\":{\"name\":\"2014 International Conference on High Performance Computing & Simulation (HPCS)\",\"volume\":\"5 1\",\"pages\":\"129-136\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-07-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 International Conference on High Performance Computing & Simulation (HPCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCSim.2014.6903678\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 International Conference on High Performance Computing & Simulation (HPCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCSim.2014.6903678","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

摘要

基于拉的后期绑定覆盖在当今一些最大的计算网格中使用。作业代理被提交给资源，其职责是在运行时从中央队列检索实际工作负载。这有助于克服这些复杂环境的问题:异质性、不精确的状态信息和相对较高的故障率。此外，后期作业分配允许动态适应网格条件或用户优先级的变化。然而，随着规模的增长，中央分配队列可能成为整个系统的瓶颈。本文介绍了一种用于延迟绑定覆盖的分布式调度体系结构，它允许执行节点构建分布式散列表，并将任务匹配和分配委托给它们，从而解决了这个问题。这减少了中央服务器上的负载，使系统更具可伸缩性和健壮性。可伸缩性使细粒度调度成为可能，并支持新功能，如在执行节点上实现分布式数据缓存，这有助于缓解通常拥塞的网格存储服务。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Distributed scheduling and data sharing in late-binding overlays

Pull-based late-binding overlays are used in some of today's largest computational grids. Job agents are submitted to resources with the duty of retrieving real workload from a central queue at runtime. This helps overcome the problems of these complex environments: heterogeneity, imprecise status information and relatively high failure rates. In addition, the late job assignment allows dynamic adaptation to changes in grid conditions or user priorities. However, as the scale grows, the central assignment queue may become a bottleneck for the whole system. This article presents a distributed scheduling architecture for late-binding overlays, which addresses this issue by letting execution nodes build a distributed hash table and delegating job matching and assignment to them. This reduces the load on the central server and makes the system much more scalable and robust. Scalability makes fine-grained scheduling possible and enables new functionalities, like the implementation of a distributed data cache on the execution nodes, which helps alleviate the commonly congested grid storage services.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2014 International Conference on High Performance Computing & Simulation (HPCS)

自引率

0.00%

发文量