Low Latency and Resource-Aware Program Composition for Large-Scale Data Analysis
Masahiro Tanaka, K. Taura, Kentaro Torisawa
DOI: https://doi.org/10.1109/CCGrid.2016.88
2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May 16, 2016

Large-scale data analysis has recently grown in importance across a wide variety of areas, such as natural language processing, sensor data analysis, and scientific computing. Such analysis applications typically reuse existing programs as components and are often required to continuously process new data with low latency while processing large-scale data on distributed computation nodes. However, existing frameworks for combining programs into a parallel data analysis pipeline (e.g., a workflow) suffer from the following issues: (1) most frameworks are oriented toward high-throughput batch processing, which leads to high latency; (2) a specific composition language is often imposed, and/or a specific structure such as a simple unidirectional dataflow among the constituent tasks; and (3) a program used as a component often takes a long time to start up due to heavy initialization, which is referred to as startup overhead. Our solution to these problems is remote procedure call (RPC)-based composition, realized by our middleware Rapid Service Connector (RaSC). RaSC can easily wrap an ordinary program and make it accessible as an RPC service, called a RaSC service. Using component programs as RaSC services lets us integrate them into one program with low latency, without being restricted to a specific workflow language or dataflow structure. In addition, a RaSC service masks the startup overhead of a component program by keeping the component program's processes alive across RPC requests. We also propose an architecture that automatically manages the number of processes to maximize throughput. Experimental results show that our approach excels in overall throughput as well as latency, despite its RPC overhead, and that it can adapt to runtime changes in throughput requirements.
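The keep-alive idea at the heart of a RaSC service can be sketched in a few lines: wrap a line-oriented worker process once, then route each request to its standing stdin/stdout pipes, so the worker's startup cost is paid only on the first call. The class and worker below are hypothetical stand-ins for illustration, not RaSC's actual API.

```python
import subprocess
import sys

class PersistentService:
    """Keep a line-oriented worker process alive across calls, masking its
    startup overhead (hypothetical sketch, not RaSC's actual interface)."""
    def __init__(self, argv):
        # started once; every subsequent call reuses the same process
        self.proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                                     stdout=subprocess.PIPE, text=True)

    def call(self, line):
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()

# Toy worker that upper-cases each request; it stands in for a component
# program with expensive initialization (e.g., loading large NLP models).
WORKER = [sys.executable, "-u", "-c",
          "import sys\n"
          "while True:\n"
          "    line = sys.stdin.readline()\n"
          "    if not line: break\n"
          "    print(line.strip().upper(), flush=True)"]

svc = PersistentService(WORKER)
print(svc.call("hello"))   # HELLO
print(svc.call("world"))   # WORLD -- same process, no second startup
svc.close()
```

A real deployment would put an RPC layer (and a process pool) in front of `call`, but the latency win comes from exactly this reuse of a warm process.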
Dynamic Adaptation of Policies Using Machine Learning
Alejandro Pelaez, Andres Quiroz, M. Parashar
DOI: https://doi.org/10.1109/CCGrid.2016.64

Managing large systems so as to guarantee certain behavior is a difficult problem due to their dynamic behavior and complex interactions. Policies have been shown to provide a very expressive and easy way to define such desired behaviors, mainly because they separate the definition of desired behavior from the enforcement mechanism, allowing either one to be changed fairly easily. Unfortunately, it is often difficult to define policies in terms of attributes that can be measured and/or directly controlled, or to set adaptable (i.e., non-static) parameters that account for rapidly changing system behavior. Dynamic policies address these problems by allowing system administrators to define higher-level parameters, which are more closely related to business goals, while providing an automated mechanism to adapt them at a lower level, where attributes can be measured and/or controlled. Here, we present a way to define such policies, and a machine learning model that dynamically applies lower-level static policies by learning a hidden relationship between the high-level business attribute space and the low-level monitoring space. We show that this relationship exists and that we can learn it, producing an error of at most 8.78% at least 96% of the time.
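The core idea here, learning a mapping from the high-level business attribute space to the low-level monitoring space, can be illustrated with the simplest possible learner: a univariate least-squares fit from a business target to a monitorable knob. The attribute names and numbers below are invented for illustration; the paper's actual model is a more capable learner.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns intercept a and slope b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Observed pairs of (high-level target response time in ms,
# low-level CPU-utilization alarm threshold in %) -- toy data.
history = [(100, 40), (200, 55), (300, 70), (400, 85)]
a, b = fit_line([t for t, _ in history], [c for _, c in history])

def low_level_threshold(target_ms):
    # derive the measurable/controllable setting from the business goal
    return a + b * target_ms

print(low_level_threshold(250))  # ~62.5 on this toy data
```

When the administrator moves the high-level target, the learned map re-derives the static low-level policy parameter instead of requiring it to be hand-tuned.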
Infrastructure Cost Comparison of Running Web Applications in the Cloud Using AWS Lambda and Monolithic and Microservice Architectures
Mario Villamizar, Oscar Garces, Lina Ochoa, Harold E. Castro, Lorena Salamanca, Mauricio Verano, R. Casallas, Santiago Gil, Carlos Valencia, Angee Zambrano, Mery Lang
DOI: https://doi.org/10.1109/CCGrid.2016.37

Large Internet companies like Amazon, Netflix, and LinkedIn are using the microservice architecture pattern to deploy large applications in the cloud as a set of small services that can be developed, tested, deployed, scaled, operated, and upgraded independently. However, aside from gaining agility, independent development, and scalability, infrastructure costs are a major concern for companies adopting this pattern. This paper presents a cost comparison of a web application developed and deployed using the same scalable scenarios with three different approaches: 1) a monolithic architecture, 2) a microservice architecture operated by the cloud customer, and 3) a microservice architecture operated by the cloud provider. Test results show that microservices can help reduce infrastructure costs in comparison to standard monolithic architectures. Moreover, the use of services specifically designed to deploy and scale microservices reduces infrastructure costs by 70% or more. Lastly, we also describe the challenges we faced while implementing and deploying microservice applications.
Creating Soft Heterogeneity in Clusters Through Firmware Re-configuration
Xin Zhan, M. Shoaib, S. Reda
DOI: https://doi.org/10.1109/CCGrid.2016.92

Customizing server hardware to its workload has the potential to improve both runtime and energy efficiency. In a cluster that caters to diverse workloads, however, employing servers with customized hardware components leads to heterogeneity, which is not scalable. In this paper, we seek to create soft heterogeneity from existing servers with homogeneous hardware components by customizing the firmware configuration. We demonstrate that firmware configurations have a large impact on the runtime, power, and energy efficiency of workloads. Since the number of candidate firmware configurations grows exponentially with the number of firmware settings, we propose a methodology called FXplore that completes the exploration with quadratic time complexity. Furthermore, FXplore enables system administrators to manage the degree of heterogeneity by deriving firmware configurations for sub-clusters that cater to multiple workloads with similar characteristics. Thus, during online operation, incoming workloads can be mapped to appropriate sub-clusters with pre-configured firmware settings. FXplore also finds the best firmware settings for co-runners on the same server. We validate our methodology on a fully instrumented cluster under a large range of parallel workloads representative of both high-performance compute clusters and datacenters. Compared to enabling all firmware options, our method reduces average runtime and energy consumption by 11% and 15%, respectively.
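To see why avoiding exhaustive search matters: n binary firmware settings yield 2**n configurations, while a greedy toggle-one-setting-at-a-time search needs at most on the order of n**2 benchmark runs. The sketch below captures only that complexity argument; it is not FXplore's actual algorithm, and the cost function is an invented stand-in for a benchmark run.

```python
def explore(settings, evaluate):
    """Greedy search over binary firmware settings: repeatedly toggle any
    single setting that reduces the measured cost, until no toggle helps.
    Worst case: n passes x n toggles = O(n^2) evaluations, vs. 2**n for
    exhaustive search. (Sketch in the spirit of FXplore, not its method.)"""
    config = dict(settings)
    best = evaluate(config)
    improved = True
    while improved:
        improved = False
        for name in list(config):
            trial = dict(config)
            trial[name] = not trial[name]
            cost = evaluate(trial)
            if cost < best:
                best, config, improved = cost, trial, True
    return config, best

# Toy cost model: runtime is lowest with hyperthreading off and prefetch on.
cost = lambda c: 10 + (2 if c["hyperthreading"] else 0) + (0 if c["prefetch"] else 3)
cfg, best = explore({"hyperthreading": True, "prefetch": False}, cost)
print(cfg, best)  # {'hyperthreading': False, 'prefetch': True} 10
```

Greedy search can miss optima when settings interact, which is one reason a purpose-built exploration methodology is needed rather than this naive loop.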
Distem: Evaluation of Fault Tolerance and Load Balancing Strategies in Real HPC Runtimes through Emulation
Cristian Ruiz, Joseph Emeras, E. Jeanvoine, L. Nussbaum
DOI: https://doi.org/10.1109/CCGrid.2016.35

The era of Exascale computing raises new challenges for HPC. Intrinsic characteristics of those extreme-scale platforms bring energy and reliability issues. To cope with those constraints, applications will have to be more flexible in order to deal with platform geometry evolutions and unavoidable failures. Thus, to prepare for this upcoming era, a strong effort must be made on improving the HPC software stack. This work focuses on improving the study of a central part of the software stack: the HPC runtimes. To this end, we propose a set of extensions to the Distem emulator that enable the evaluation of fault tolerance and load balancing mechanisms in such runtimes. Extensive experimentation showing the benefits of our approach has been performed with three HPC runtimes: Charm++, MPICH, and OpenMPI.
Scheduling In-Situ Analytics in Next-Generation Applications
Oscar H. Mondragon, P. Bridges, Scott Levy, Kurt B. Ferreira, Patrick M. Widener
DOI: https://doi.org/10.1109/CCGrid.2016.42

Next-generation applications increasingly rely on in situ analytics to guide computation, reduce the amount of I/O performed, and perform other important tasks. Scheduling where and when to run analytics is challenging, however. This paper quantifies the costs and benefits of different approaches to scheduling applications and analytics on the nodes of large-scale systems, including space sharing, uncoordinated time sharing, and gang-scheduled time sharing.
Facilitating the Execution of HPC Workloads in Colombia through the Integration of a Private IaaS and a Scientific PaaS/SaaS Marketplace
Harold E. Castro, Mario Villamizar, Oscar Garces, J. Perez, R. Caliz, Pedro F. Perez Arteaga
DOI: https://doi.org/10.1109/CCGrid.2016.52

Many small and medium-sized research groups are limited in executing their HPC workloads by the need to buy, configure, and maintain their own cluster or grid solutions. At the same time, some research groups have large infrastructures with low utilization levels, due in part to the tools they offer to end users, which require each end user to configure complex, distributed environments. In this paper, we present a joint effort between a private and a public institution to offer scientific applications as a service, taking advantage of an existing infrastructure to create a private IaaS using OpenStack and offering scientific applications through a friendly user interface. This strategy lets researchers run their HPC workloads on a private cloud transparently, hiding the complexities of distributed and scalable cloud environments. We show how this strategy may help increase infrastructure utilization, how it allows end users to easily execute and share their applications through a SaaS marketplace, and how new applications can be configured and deployed using a PaaS platform.
Tyrex: Size-Based Resource Allocation in MapReduce Frameworks
Bogdan Ghit, D. Epema
DOI: https://doi.org/10.1109/CCGrid.2016.82

Many large-scale data analytics infrastructures are employed for a wide variety of jobs, ranging from short interactive queries to large data analysis jobs that may take hours or even days to complete. As a consequence, data-processing frameworks like MapReduce may have workloads consisting of jobs with heavy-tailed processing requirements. With such workloads, short jobs may experience slowdowns an order of magnitude larger than large jobs do, while users may expect slowdowns more in proportion to job sizes. To address this problem of large job-slowdown variability in MapReduce frameworks, we design a scheduling system called TYREX, inspired by the well-known TAGS task assignment policy in distributed-server systems. In particular, TYREX partitions the resources of a MapReduce framework, allows any job running in any partition to read data stored on any machine, imposes runtime limits in the partitions, and successively executes parts of jobs in a work-conserving way in these partitions until they run to completion. We develop a statistical model for dynamically setting the runtime limits that achieves near-optimal job slowdown performance, and we empirically evaluate TYREX on a cluster system with workloads consisting of both synthetic and real-world benchmarks. We find that TYREX cuts job slowdown variability in half while preserving the median job slowdown compared to state-of-the-art MapReduce schedulers such as FIFO and FAIR. Furthermore, TYREX reduces job slowdown at the 95th percentile by more than 50% compared to FIFO and by 20-40% compared to FAIR.
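Under TAGS-style policies like the one TYREX adapts, a job's size need not be known up front: every job starts in the first partition, and only jobs that exhaust that partition's runtime limit move on to the next, so short jobs never queue behind long ones. A minimal sketch of that policy follows; the job sizes and limits are toy numbers, and TYREX itself derives the limits from a statistical model.

```python
def tags_schedule(jobs, limits):
    """TAGS-style successive execution: each job runs in partition i for at
    most limits[i] seconds of work; unfinished jobs continue in the next
    partition (a limit of None means unbounded, i.e., run to completion).
    Returns, per job, the list of partitions it visited."""
    placement = {}
    for job, size in jobs.items():
        visited, done = [], 0.0
        for i, limit in enumerate(limits):
            visited.append(i)
            done += limit if limit is not None else size - done
            if done >= size:
                break
        placement[job] = visited
    return placement

# Two partitions: a 60 s limit for the first, unbounded for the second.
print(tags_schedule({"query": 10, "batch": 3600}, [60, None]))
# {'query': [0], 'batch': [0, 1]}
```

The work-conserving detail in TYREX (partial results carry over rather than being redone) is what distinguishes it from classic TAGS, which restarts migrated jobs from scratch.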
Landrush: Rethinking In-Situ Analysis for GPGPU Workflows
Anshuman Goswami, Yuan Tian, K. Schwan, F. Zheng, Jeffrey S. Young, M. Wolf, G. Eisenhauer, S. Klasky
DOI: https://doi.org/10.1109/CCGrid.2016.58

In-situ analysis of the output data of scientific simulations has been made necessary by ever-growing output data volumes and the increasing cost of data movement as supercomputing moves toward exascale. With hardware accelerators like GPUs becoming increasingly common in high-end machines, new opportunities arise to co-locate scientific simulations and online analysis of the data they generate. However, the asynchronous nature of GPGPU programming models and the limited context-switching capabilities of the GPU pose challenges to co-locating the simulation and analysis on the same GPU. This paper dives deeper into these challenges to understand how best to co-locate analysis with scientific simulations on the GPUs in HPC clusters. Specifically, our 'Landrush' approach to GPU sharing utilizes idle cycles on the GPU to improve time-to-answer, that is, the total time to run the scientific simulation and the analysis of the generated data. Landrush is demonstrated with experimental results from leadership high-end applications on ORNL's Titan supercomputer, which show that (i) GPU-based scientific simulations have varying degrees of idle cycles that afford useful analysis-task co-location, and (ii) the inability to context switch on the GPU at instruction granularity can be overcome by careful control of the analysis kernel launches and software-controlled early completion of analysis kernel executions. Results show that Landrush delivers better time-to-answer than either serially running simulations followed by analysis or relying on the GPU driver and hardwired thread dispatcher to run analysis concurrently on a single GPU.
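The idle-cycle opportunity Landrush exploits can be modeled with a toy timeline: size analysis kernel launches so they fit inside the gaps between simulation kernels instead of delaying them. Everything below (phase durations, chunk granularity, the timeline model itself) is invented for illustration and is far simpler than real GPU scheduling.

```python
def colocate(sim_phases, analysis_chunks, chunk_time):
    """Fill GPU idle gaps between simulation kernels with small analysis
    kernel launches (toy model of the idea: launches are sized to fit the
    gap, so the next simulation kernel is never delayed).

    sim_phases: list of (sim_kernel_time, idle_gap) pairs, in seconds.
    Returns (chunks completed, [(time of gap start, chunks launched), ...]).
    """
    t, done, timeline = 0.0, 0, []
    for busy, idle in sim_phases:
        t += busy                          # simulation kernel runs
        fit = int(idle // chunk_time)      # analysis chunks that fit the gap
        launched = min(fit, analysis_chunks - done)
        done += launched
        timeline.append((t, launched))
        t += idle                          # gap elapses regardless
    return done, timeline

done, tl = colocate([(5, 2), (5, 3)], analysis_chunks=4, chunk_time=1.0)
print(done)  # 4 -- all analysis finished inside idle gaps
```

In this model the analysis adds zero time-to-answer whenever it fits in the gaps, which is the best case the paper's measured idle cycles make plausible; the hard part Landrush actually solves is enforcing this on hardware that cannot preempt kernels at instruction granularity.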
{"title":"CVSS: A Cost-Efficient and QoS-Aware Video Streaming Using Cloud Services","authors":"Xiangbo Li, M. Salehi, M. Bayoumi, R. Buyya","doi":"10.1109/CCGrid.2016.49","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.49","url":null,"abstract":"Video streams, either in form of on-demand streaming or live streaming, usually have to be converted (i.e., transcoded) based on the characteristics of clients' devices (e.g., spatial resolution, network bandwidth, and supported formats). Transcoding is a computationally expensive and time-consuming operation; therefore, streaming service providers currently store numerous transcoded versions of the same video to serve different types of client devices. Due to the expense of maintaining and upgrading storage and computing infrastructures, many streaming service providers (e.g., Netflix) have recently become reliant on cloud services. However, the challenge in utilizing cloud services for video transcoding is how to deploy cloud resources in a cost-efficient manner without any major impact on the quality of video streams. To address this challenge, in this paper, we present the Cloud-based Video Streaming Service (CVSS) architecture to transcode video streams in an on-demand manner. The architecture provides a platform for streaming service providers to utilize cloud resources in a cost-efficient manner and with respect to the Quality of Service (QoS) demands of video streams. In particular, the architecture includes a QoS-aware scheduling method to efficiently map video streams to cloud resources, and a cost-aware dynamic (i.e., elastic) resource provisioning policy that adapts the resource acquisition with respect to the video streaming QoS demands. 
Simulation results based on realistic cloud traces and with various workload conditions demonstrate that the CVSS architecture can satisfy video streaming QoS demands and reduce the incurred cost for stream providers by up to 70%.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114931376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
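The CVSS abstract names two cooperating policies: a QoS-aware scheduler that maps transcoding tasks to cloud resources, and a cost-aware elastic provisioning policy. The sketch below illustrates one plausible reading of each, not the paper's actual algorithms; the earliest-completion-time heuristic, the thresholds, and all function names are illustrative assumptions.

```python
# Hypothetical sketch of the two CVSS-style policies:
# (1) QoS-aware scheduling: assign each transcoding task to the VM that
#     yields the earliest completion time, tracking deadline misses;
# (2) cost-aware elastic provisioning: acquire a VM when the deadline miss
#     rate crosses a threshold, release one when utilization is low.


def qos_schedule(tasks, vm_free_at):
    """tasks: list of (duration, deadline); vm_free_at: per-VM ready times.

    Returns (assignments, misses): assignments[i] is the VM index chosen
    for task i; misses counts tasks finishing past their deadline.
    """
    assignments, misses = [], 0
    for duration, deadline in tasks:
        # Earliest-completion-time heuristic over the current VM queues.
        vm = min(range(len(vm_free_at)), key=lambda v: vm_free_at[v] + duration)
        finish = vm_free_at[vm] + duration
        vm_free_at[vm] = finish
        assignments.append(vm)
        if finish > deadline:
            misses += 1
    return assignments, misses


def provision(n_vms, miss_rate, utilization,
              miss_threshold=0.1, low_util=0.3, min_vms=1):
    """Elastic policy: scale out under QoS pressure, scale in when idle."""
    if miss_rate > miss_threshold:
        return n_vms + 1      # acquire a VM to protect streaming QoS
    if utilization < low_util and n_vms > min_vms:
        return n_vms - 1      # release a VM to cut cost
    return n_vms
```

The split mirrors the architecture's division of labor: the scheduler optimizes within the current VM pool, while the provisioning policy resizes the pool whenever QoS or cost drifts out of bounds.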