
Latest publications from the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing

Polyphony: A Workflow Orchestration Framework for Cloud Computing
K. Shams, M. Powell, T. Crockett, J. Norris, Ryan A. Rossi, T. Söderström
Cloud Computing has delivered unprecedented compute capacity to NASA missions at affordable rates. Missions like the Mars Exploration Rovers (MER) and Mars Science Lab (MSL) are enjoying the elasticity that enables them to leverage hundreds, if not thousands, of machines for short durations without making any hardware procurements. In this paper, we describe Polyphony, a resilient, scalable, and modular framework that efficiently leverages a large set of computing resources to perform parallel computations. Polyphony can employ resources on the cloud, excess capacity on local machines, as well as spare resources at the supercomputing center, and it enables these resources to work in concert to accomplish a common goal. Polyphony is resilient to node failures, even if they occur in the middle of a transaction. We conclude with an evaluation of a production-ready application built on top of Polyphony to perform image-processing operations on images from around the solar system, including Mars, Saturn, and Titan.
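The resilience property the abstract highlights, that a task survives a node failure even mid-transaction, is commonly achieved with queue semantics like those of cloud message queues: a claimed task becomes visible again if the worker never acknowledges it. The following is a minimal illustrative sketch of that idea, not Polyphony's actual implementation; all names here are hypothetical.

```python
import time
from collections import deque

class ResilientQueue:
    """Toy work queue with visibility timeouts: a claimed task reappears
    if the worker does not acknowledge it in time, so a crash in the
    middle of a transaction never loses the task."""

    def __init__(self, visibility_timeout=1.0):
        self.timeout = visibility_timeout
        self.pending = deque()   # tasks waiting to be claimed
        self.in_flight = {}      # task -> deadline for acknowledgement

    def put(self, task):
        self.pending.append(task)

    def claim(self, now=None):
        now = time.monotonic() if now is None else now
        # Requeue tasks whose worker presumably failed mid-transaction.
        for task, deadline in list(self.in_flight.items()):
            if now >= deadline:
                del self.in_flight[task]
                self.pending.append(task)
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.in_flight[task] = now + self.timeout
        return task

    def ack(self, task):
        # Task completed; remove it permanently.
        self.in_flight.pop(task, None)
```

If a worker claims a task and crashes without calling `ack`, a later `claim` hands the same task to another worker once the timeout elapses.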
{"title":"Polyphony: A Workflow Orchestration Framework for Cloud Computing","authors":"K. Shams, M. Powell, T. Crockett, J. Norris, Ryan A. Rossi, T. Söderström","doi":"10.1109/CCGRID.2010.117","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.117","url":null,"abstract":"Cloud Computing has delivered unprecedented compute capacity to NASA missions at affordable rates. Missions like the Mars Exploration Rovers (MER) and Mars Science Lab (MSL) are enjoying the elasticity that enables them to leverage hundreds, if not thousands, or machines for short durations without making any hardware procurements. In this paper, we describe Polyphony, a resilient, scalable, and modular framework that efficiently leverages a large set of computing resources to perform parallel computations. Polyphony can employ resources on the cloud, excess capacity on local machines, as well as spare resources on the supercomputing center, and it enables these resources to work in concert to accomplish a common goal. Polyphony is resilient to node failures, even if they occur in the middle of a transaction. We will conclude with an evaluation of a production-ready application built on top of Polyphony to perform image-processing operations of images from around the solar system, including Mars, Saturn, and Titan.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132460209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
SLA-Driven Dynamic Resource Management for Multi-tier Web Applications in a Cloud
Waheed Iqbal, M. Dailey, David Carrera
Current service-level agreements (SLAs) offered by cloud providers do not make guarantees about response time of Web applications hosted on the cloud. Satisfying a maximum average response time guarantee for Web applications is difficult due to unpredictable traffic patterns. The complex nature of multi-tier Web applications increases the difficulty of identifying bottlenecks and resolving them automatically. It may be possible to minimize the probability that tiers (hosted on virtual machines) become bottlenecks by optimizing the placement of the virtual machines in a cloud. This research focuses on enabling clouds to offer multi-tier Web application owners maximum response time guarantees while minimizing resource utilization. We present our basic approach, preliminary experiments, and results on a EUCALYPTUS-based testbed cloud. Our preliminary results show that dynamic bottleneck detection and resolution for multi-tier Web applications hosted on the cloud will help providers offer SLAs with response time guarantees.
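The core decision the abstract describes, detecting which tier to scale when a response-time SLA is violated, can be reduced to a simple rule: if end-to-end latency exceeds the target, scale the tier contributing the most latency. This is a simplified sketch of that idea, not the authors' actual algorithm; the tier names and the single-metric heuristic are illustrative.

```python
def find_bottleneck(tier_response_ms, sla_ms):
    """Return the tier to scale up, or None if the SLA is satisfied.

    tier_response_ms: mean per-tier response-time contributions (ms).
    sla_ms: maximum average end-to-end response time promised by the SLA.
    """
    total = sum(tier_response_ms.values())
    if total <= sla_ms:
        return None  # SLA satisfied; no action needed
    # Heuristic: the tier with the largest latency contribution is the
    # most likely bottleneck, so add a VM replica there.
    return max(tier_response_ms, key=tier_response_ms.get)
```

A controller would call this periodically with fresh measurements and provision an extra VM for the returned tier.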
{"title":"SLA-Driven Dynamic Resource Management for Multi-tier Web Applications in a Cloud","authors":"Waheed Iqbal, M. Dailey, David Carrera","doi":"10.1109/CCGRID.2010.59","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.59","url":null,"abstract":"Current service-level agreements (SLAs) offered by cloud providers do not make guarantees about response time of Web applications hosted on the cloud. Satisfying a maximum average response time guarantee for Web applications is difficult due to unpredictable traffic patterns. The complex nature of multi-tier Web applications increases the difficulty of identifying bottlenecks and resolving them automatically. It may be possible to minimize the probability that tiers (hosted on virtual machines) become bottlenecks by optimizing the placement of the virtual machines in a cloud. This research focuses on enabling clouds to offer multi-tier Web application owners maximum response time guarantees while minimizing resource utilization. We present our basic approach, preliminary experiments, and results on a EUCALYPTUS-based testbed cloud. Our preliminary results shows that dynamic bottleneck detection and resolution for multi-tier Web application hosted on the cloud will help to offer SLAs that can offer response time guarantees.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133807767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 70
Development and Support of Platforms for Research into Rare Diseases
R. Sinnott, Jipu Jiang, A. Stell, J. Watt
The technologies and ideas that underlie e-Science in providing seamless access to distributed resources are compelling and have been applied in many research domains. The clinical domain in particular is one area that, in principle, has much to gain from e-Science approaches. Until now, however, the practical realization, support, and adoption of e-Science solutions in a clinical setting have been fraught with hurdles, not least trust in the technologies and their use in the field, as opposed to demonstrator projects with non-real clinical data intended to prove the merit of e-Science ideas and solutions. The National e-Science Centre (NeSC – www.nesc.ac.uk) at the University of Glasgow has had a large number of clinical projects that have moved from proof-of-concept demonstrators through to real systems used by real clinical researchers in real clinical trials and studies. In this paper we focus on the software systems that have been developed to support two major international post-genomic clinical research projects in the area of rare diseases: the European Union 7th Framework (EuroDSD – www.eurodsd.eu) project and the European Science Foundation (ENSAT – www.ensat.org) project. We outline the software platforms that have been rolled out and identify how the e-Science vision of secure access to clinical resources has been realized and subsequently used.
{"title":"Development and Support of Platforms for Research into Rare Diseases","authors":"R. Sinnott, Jipu Jiang, A. Stell, J. Watt","doi":"10.1109/CCGRID.2010.127","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.127","url":null,"abstract":"The technologies and ideas that underlie e-Science in providing seamless access to distributed resources is a compelling one and has been applied in many research domains. The clinical domain is one area in particular that, in principle has much to be gained from e-Science approaches. Until now however it has largely been the case that the practical realization, support and adoption of e-Science solutions in a clinical setting have been fraught by many hurdles. Not least is trust of technologies and their use in the field as opposed to demonstrator projects with non-real clinical data to prove the merit of e-Science ideas and solutions. The National e-Science Centre (NeSC– www.nesc.ac.uk) at the University of Glasgow have had a large number of clinical projects that have moved from the proof of concept demonstrators through to real systems used by real clinical researchers in real clinical trials and studies. In this paper we focus on the software systems that have been developed to support two major international post-genomic clinical research projects in the area of rare diseases: the European Union 7th Framework (EuroDSD – www.eurodsd.eu) project and the European Science Foundation (ENSAT – www.ensat.org) project. 
We outline the software platforms that have been rolled out and identify how the e-Science vision of secure access to clinical resources has been realized and subsequently used.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114264833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Running the NIM Next-Generation Weather Model on GPUs
M. Govett, J. Middlecoff, T. Henderson
We are using GPUs to run a new weather model being developed at NOAA’s Earth System Research Laboratory (ESRL). The parallelization approach is to run the entire model on the GPU and only rely on the CPU for model initialization, I/O, and inter-processor communications. We have written a compiler to convert Fortran into CUDA, and used it to parallelize the dynamics portion of the model. Dynamics, the most computationally intensive part of the model, is currently running 34 times faster on a single GPU than the CPU. We also describe our approach and progress to date in running NIM on multiple GPUs.
{"title":"Running the NIM Next-Generation Weather Model on GPUs","authors":"M. Govett, J. Middlecoff, T. Henderson","doi":"10.1109/CCGRID.2010.106","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.106","url":null,"abstract":"We are using GPUs to run a new weather model being developed at NOAA’s Earth System Research Laboratory (ESRL). The parallelization approach is to run the entire model on the GPU and only rely on the CPU for model initialization, I/O, and inter-processor communications. We have written a compiler to convert Fortran into CUDA, and used it to parallelize the dynamics portion of the model. Dynamics, the most computationally intensive part of the model, is currently running 34 times faster on a single GPU than the CPU. We also describe our approach and progress to date in running NIM on multiple GPUs.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130278035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 69
Granularity-Aware Work-Stealing for Computationally-Uniform Grids
Vladimir Janjic, K. Hammond
Good scheduling is important for ensuring effective use of Grid resources, while maximising parallel performance. In this paper, we show how a basic ``Random-Stealing'' load balancing algorithm for computational Grids can be improved by using information about the task granularity of parallel programs. We propose several strategies (SSL, SLL and LLL) for using granularity information to improve load balancing, presenting results both from simulations and from a real implementation (the Grid-GUM Runtime System for Parallel Haskell). We assume a common model of task creation which subsumes both master/worker and data-parallel programming paradigms under a task-stealing work distribution strategy. Overall, we achieve improvement in runtime of up to 19.4% for irregular problems in the real implementation, and up to 40% for the simulations (typical improvements of more than 15% for irregular programs, and from 5-10% for regular ones). Our results show that, for computationally-uniform Grids, advanced load balancing methods that exploit granularity information generally have the greatest impact on reducing the runtimes of irregular parallel programs. Moreover, the more irregular the program is, the better the improvements that can be achieved.
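The basic idea of granularity-aware stealing can be sketched as follows: pick a random victim as in plain Random Stealing, but use task-size estimates to decide which task to take. This is a hedged sketch of the general idea behind strategies like the paper's SSL/SLL/LLL, not their exact definitions; the numeric cost estimates and the prefer-large policy are illustrative assumptions.

```python
import random

def random_steal(queues, thief, rng=None, prefer_large=True):
    """One granularity-aware steal: choose a random victim that has work,
    then take its largest (or smallest) task by estimated cost.

    queues: worker name -> list of task-cost estimates.
    Returns the stolen task cost, or None if no victim has work.
    """
    rng = rng or random.Random()
    victims = [v for v, q in queues.items() if v != thief and q]
    if not victims:
        return None
    victim = rng.choice(victims)
    # Stealing a coarse-grained task amortises the communication cost
    # of the steal over more computation on the thief's side.
    task = max(queues[victim]) if prefer_large else min(queues[victim])
    queues[victim].remove(task)
    queues[thief].append(task)
    return task
```

On a high-latency Grid link, preferring large tasks reduces how often steals must cross the network; the inverse policy can make sense when task-size estimates are unreliable.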
{"title":"Granularity-Aware Work-Stealing for Computationally-Uniform Grids","authors":"Vladimir Janjic, K. Hammond","doi":"10.1109/CCGRID.2010.49","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.49","url":null,"abstract":"Good scheduling is important for ensuring effective use of Grid resources, while maximising parallel performance. In this paper, we show how a basic ``Random-Stealing'' load balancing algorithm for computational Grids can be improved by using information about the task granularity of parallel programs. We propose several strategies (SSL, SLL and LLL) for using granularity information to improve load balancing, presenting results both from simulations and from a real implementation (the Grid-GUM Runtime System for Parallel Haskell). We assume a common model of task creation which subsumes both master/worker and data-parallel programming paradigms under a task-stealing work distribution strategy. Overall, we achieve improvement in runtime of up to 19.4% for irregular problems in the real implementation, and up to 40% for the simulations (typical improvements of more that 15% for irregular programs, and from 5-10% for regular ones). Our results show that, for computationally-uniform Grids, advanced load balancing methods that exploit granularity information generally have the greatest impact on reducing the runtimes of irregular parallel programs. 
Moreover, the more irregular the program is, the better the improvements that can be achieved.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128955157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
FaReS: Fair Resource Scheduling for VMM-Bypass InfiniBand Devices
A. Ranadive, Ada Gavrilovska, K. Schwan
In order to address the high-performance I/O needs of HPC and enterprise applications, modern interconnection fabrics, such as InfiniBand and, more recently, 10GigE, rely on network adapters with RDMA capabilities. In virtualized environments, these types of adapters are configured in a manner that bypasses the hypervisor and allows virtual machines (VMs) direct device access, so that they deliver near-native low-latency/high-bandwidth I/O. One challenge with the bypass approach is that it causes the hypervisor to lose control over VM-device interactions, including the ability to monitor such interactions and to ensure fair resource usage by VMs. Fairness violations, however, permit low-priority VMs to affect the I/O allocations of other, higher-priority VMs and, more generally, lack of supervision can lead to inefficiencies in the usage of platform resources. This paper describes the FaReS system-level mechanisms for monitoring VMs' usage of bypass I/O devices. Monitoring information acquired with FaReS is then used to adjust VMM-level scheduling in order to improve resource utilization and/or ensure fairness properties across the sets of VMs sharing platform resources. FaReS employs a memory introspection-based tool for asynchronously monitoring VMM-bypass devices, using InfiniBand HCAs as a concrete example. FaReS and its very low overhead (
{"title":"FaReS: Fair Resource Scheduling for VMM-Bypass InfiniBand Devices","authors":"A. Ranadive, Ada Gavrilovska, K. Schwan","doi":"10.1109/CCGRID.2010.11","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.11","url":null,"abstract":"In order to address the high performance I/O needs of HPC and enterprise applications, modern interconnection fabrics, such as InfiniBand and more recently, 10GigE, rely on network adapters with RDMA capabilities. In virtualized environments, these types of adapters are configured in a manner that bypasses the hypervisor and allows virtual machines (VMs) direct device access, so that they deliver near-native low-latency/high-bandwidth I/O. One challenge with the bypass approach is that it causes the hypervisor to lose control over VM-device interactions, including the ability to monitor such interactions and to ensure fair resource usage by VMs. Fairness violations, however, permit low-priority VMs to affect the I/O allocations of other higher priority VMs and more generally, lack of supervision can lead to inefficiencies in the usage of platform resources. This paper describes the FaReS system-level mechanisms for monitoring VMs' usage of bypass I/O devices. Monitoring information acquired with FaReS is then used to adjust VMM-level scheduling in order to improve resource utilization and/or ensure fairness properties across the sets of VMs sharing platform resources. FaReS employs a memory introspection-based tool for asynchronously monitoring VMM-bypass devices, using InfiniBand HCAs as a concrete example. 
FaReS and its very low overhead (","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122935769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Feedback-Guided Analysis for Resource Requirements in Large Distributed System
M. Sarkar, Sarbani Roy, N. Mukherjee
Resource management is one of the focus areas of Grid computing, and Job Modeling is a very important part of it. Proper Job Modeling can help allocate jobs to their most suitable resource providers in the Grid. This paper presents a feedback-guided Automatic Job Modeling technique that describes the process required to identify the most suitable resource provider for a particular job.
{"title":"Feedback-Guided Analysis for Resource Requirements in Large Distributed System","authors":"M. Sarkar, Sarbani Roy, N. Mukherjee","doi":"10.1109/CCGRID.2010.90","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.90","url":null,"abstract":"Resource management is one of the focus areas of Grid which identifies Job Modeling to be a very important part of it. A proper Job Modeling can be helpful in allocating jobs to their most suitable resource providers in Grid. This paper presents a feedback-guided Automatic Job Modeling technique that describes the process required to identify the most suitable resource provider for a particular job.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132943325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
An Analysis of Traces from a Production MapReduce Cluster
Soila Kavulya, Jiaqi Tan, R. Gandhi, P. Narasimhan
MapReduce is a programming paradigm for parallel processing that is increasingly being used for data-intensive applications in cloud computing environments. An understanding of the characteristics of workloads running in MapReduce environments benefits both the service providers in the cloud and users: the service provider can use this knowledge to make better scheduling decisions, while the user can learn what aspects of their jobs impact performance. This paper analyzes 10 months of MapReduce logs from the M45 supercomputing cluster, which Yahoo! made freely available to select universities for academic research. We characterize resource utilization patterns, job patterns, and sources of failures. We use an instance-based learning technique that exploits temporal locality to predict job completion times from historical data and identify potential performance problems in our dataset.
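Instance-based prediction of completion times, as named in the abstract, typically means a nearest-neighbour estimate: average the runtimes of the k most similar past jobs, favouring recent jobs on ties to exploit temporal locality. This is a generic sketch of that technique under assumed features (map count, input size), not the authors' exact feature set or distance function.

```python
def predict_completion(history, job, k=3):
    """k-nearest-neighbour estimate of a job's runtime in seconds.

    history: list of (features_dict, runtime_s), oldest first.
    job: features_dict of the new job.
    Ties in similarity are broken toward more recent history entries
    (temporal locality).
    """
    def distance(a, b):
        keys = set(a) | set(b)
        return sum((a.get(f, 0.0) - b.get(f, 0.0)) ** 2 for f in keys) ** 0.5

    # Sort by similarity; -index prefers more recent entries on ties.
    ranked = sorted(
        enumerate(history),
        key=lambda iv: (distance(iv[1][0], job), -iv[0]),
    )
    nearest = ranked[:k]
    return sum(runtime for _, (_, runtime) in nearest) / len(nearest)
```

A large gap between the prediction and the observed runtime can then flag a job as a potential performance problem.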
{"title":"An Analysis of Traces from a Production MapReduce Cluster","authors":"Soila Kavulya, Jiaqi Tan, R. Gandhi, P. Narasimhan","doi":"10.1109/CCGRID.2010.112","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.112","url":null,"abstract":"MapReduce is a programming paradigm for parallel processing that is increasingly being used for data-intensive applications in cloud computing environments. An understanding of the characteristics of workloads running in MapReduce environments benefits both the service providers in the cloud and users: the service provider can use this knowledge to make better scheduling decisions, while the user can learn what aspects of their jobs impact performance. This paper analyzes 10-months of MapReduce logs from the M45 supercomputing cluster which Yahoo! made freely available to select universities for academic research. We characterize resource utilization patterns, job patterns, and sources of failures. We use an instance-based learning technique that exploits temporal locality to predict job completion times from historical data and identify potential performance problems in our dataset.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"224 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132393341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 354
Design and Implementation of an Efficient Two-Level Scheduler for Cloud Computing Environment
R. Jeyarani, R. Ram, N. Nagaveni
Cloud computing focuses on delivery of reliable, fault-tolerant and scalable infrastructure for hosting Internet based application services. Our work presents the implementation of an efficient Quality of Service (QoS) based meta-scheduler and a lightweight, Backfill-strategy-based Virtual Machine Scheduler for dispatching jobs. The user-centric meta-scheduler deals with selection of proper resources to execute high level jobs. The system-centric Virtual Machine (VM) scheduler optimally dispatches the jobs to processors for better resource utilization. We also present our proposals on scheduling heuristics that can be incorporated at data center level for selecting the ideal host for VM creation. The implementation can be further extended at the host level, using an inter-VM scheduler for adaptive load balancing in the cloud environment.
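The backfill strategy the abstract's VM scheduler builds on can be illustrated with a round of EASY-style backfilling: the head of the FIFO queue keeps strict priority, and later jobs may jump ahead only if they fit in currently idle cores and finish before the head job's reserved start. This is a standard-textbook sketch under assumed data shapes, not the paper's exact policy, and it ignores that freshly started jobs are not folded back into the reservation calculation.

```python
def easy_backfill(total_cores, running, queue, now=0.0):
    """One scheduling round of EASY backfilling.

    running: list of (cores, finish_time) for jobs already executing.
    queue:   FIFO list of (job_id, cores, est_runtime), head first.
    Returns the ids of jobs started this round.
    """
    free = total_cores - sum(c for c, _ in running)
    started = []
    queue = list(queue)
    # Start jobs from the head of the queue while they fit.
    while queue and queue[0][1] <= free:
        job_id, cores, _ = queue.pop(0)
        started.append(job_id)
        free -= cores
    if not queue:
        return started
    # Head job blocked: compute its reservation ("shadow") time, i.e.
    # when enough running jobs finish to free the cores it needs.
    head_cores = queue[0][1]
    avail, shadow = free, now
    for cores, finish in sorted(running, key=lambda r: r[1]):
        avail += cores
        shadow = finish
        if avail >= head_cores:
            break
    # Backfill later jobs that both fit in the idle cores now and finish
    # before the shadow time, so the head job's start is never delayed.
    for job_id, cores, est in queue[1:]:
        if cores <= free and now + est <= shadow:
            started.append(job_id)
            free -= cores
    return started
```

With 8 cores, 6 of them busy until t=10, a blocked 4-core head job "A", a short 2-core job "B", and a long 2-core job "C", only "B" is backfilled: it fits and finishes before "A" could start anyway.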
{"title":"Design and Implementation of an Efficient Two-Level Scheduler for Cloud Computing Environment","authors":"R. Jeyarani, R. Ram, N. Nagaveni","doi":"10.1109/CCGRID.2010.94","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.94","url":null,"abstract":"Cloud computing focuses on delivery of reliable, fault-tolerant and scalable infrastructure for hosting Internet based application services. Our work presents the implementation of an efficient Quality of Service (QoS) based meta-scheduler and Backfill strategy based light weight Virtual Machine Scheduler for dispatching jobs. The user centric meta-scheduler deals with selection of proper resources to execute high level jobs. The system centric Virtual Machine (VM) scheduler optimally dispatches the jobs to processors for better resource utilization. We also present our proposals on scheduling heuristics that can be incorporated at data center level for selecting ideal host for VM creation. The implementation can be further extended at the host level, using Inter VM scheduler for adaptive load balancing in cloud environment.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134338898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
SAGA BigJob: An Extensible and Interoperable Pilot-Job Abstraction for Distributed Applications and Systems
André Luckow, Lukasz Lacinski, S. Jha
The uptake of distributed infrastructures by scientific applications has been limited by the availability of extensible, pervasive and simple-to-use abstractions, which are required at multiple levels: the development, deployment and execution stages of scientific applications. The Pilot-Job abstraction has been shown to be an effective abstraction to address many requirements of scientific applications. Specifically, Pilot-Jobs support the decoupling of workload submission from resource assignment; this results in a flexible execution model, which in turn enables the distributed scale-out of applications on multiple and possibly heterogeneous resources. Most Pilot-Job implementations, however, are tied to a specific infrastructure. In this paper, we describe the design and implementation of a SAGA-based Pilot-Job, which supports a wide range of application types and is usable over a broad range of infrastructures, i.e., it is general-purpose and extensible, and, as we will argue, is also interoperable with Clouds. We discuss how the SAGA-based Pilot-Job is used for different application types and supports concurrent usage across multiple heterogeneous distributed infrastructures, including concurrent usage across Clouds and traditional Grids/Clusters. Further, we show how Pilot-Jobs can help to support dynamic execution models and thus introduce new opportunities for distributed applications. We also demonstrate, for the first time that we are aware of, the use of multiple Pilot-Job implementations to solve the same problem; specifically, we use the SAGA-based Pilot-Job on high-end resources such as the TeraGrid and the native Condor Pilot-Job (Glide-in) on Condor resources. Importantly, both are invoked via the same interface without changes at the development or deployment level, but only an execution (run-time) decision.
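The decoupling of workload submission from resource assignment that the abstract attributes to Pilot-Jobs can be modelled very simply: the pilot is a placeholder job that, once its slots come online, pulls application tasks from a queue. This minimal sketch illustrates the execution model only; it is not the SAGA BigJob API, and all names are hypothetical.

```python
from collections import deque

class PilotJob:
    """Toy pilot-job: tasks are submitted independently of resources,
    and idle slots pull queued tasks each scheduling round."""

    def __init__(self, slots):
        self.slots = slots       # resource slots the pilot has acquired
        self.tasks = deque()     # application workload, queued not placed
        self.done = []

    def submit(self, task):
        # Submission never waits for resources: the task just queues,
        # which is what decouples workload from resource assignment.
        self.tasks.append(task)

    def run_round(self):
        # Each round, every slot pulls at most one queued task.
        ran = []
        for _ in range(self.slots):
            if not self.tasks:
                break
            task = self.tasks.popleft()
            self.done.append(task)
            ran.append(task)
        return ran
```

Interoperability then amounts to the application talking only to this interface while different back-ends (a Grid batch job, a Condor glide-in, a cloud VM) provide the slots.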
Title: SAGA BigJob: An Extensible and Interoperable Pilot-Job Abstraction for Distributed Applications and Systems
Authors: André Luckow, Lukasz Lacinski, S. Jha
DOI: 10.1109/CCGRID.2010.91 (https://doi.org/10.1109/CCGRID.2010.91)
Published: 2010-05-17
Citations: 88
Venue: 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing