
Latest Publications in IEEE Cloud Computing

Stay at the Helm: secure Kubernetes deployments via graph generation and attack reconstruction
Q1 Computer Science Pub Date : 2022-07-01 DOI: 10.1109/CLOUD55607.2022.00022
Agathe Blaise, Filippo Rebecchi
In recent years, there has been an explosion of attacks directed at microservice-based platforms – a trend that closely follows the massive shift of the digital industries towards these environments. Management and operation of container-based microservices are automation-heavy, leveraging container orchestration engines such as Kubernetes (K8s). Helm is the package manager of choice for K8s and provides Charts, i.e., configuration files that define a programmatic model for application deployments. In this paper, we propose a novel methodology for extracting and evaluating the security model of Helm Charts. Our proposal extracts a topological graph of the Chart, whose nodes and edges are then characterised by security features. We carry out risk assessments that refer to the attack tactics of the MITRE ATT&CK framework. Furthermore, starting from these scores, we extract the riskiest attack paths. We adopt an experimental validation approach by analysing a dataset created from multiple publicly accessible Helm Chart repositories. Our methodology reveals that, in most cases, the analysed Charts have vulnerabilities that can be exploited through complex attack paths.
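The pipeline the abstract describes – derive a topology graph from a Chart, score its nodes by security features, then extract the riskiest path – can be sketched roughly as follows. This is not the authors' code: the resource names, feature set, and scoring rule are invented for illustration.

```python
# Sketch: a toy "Chart" topology with per-workload security features, a
# hypothetical risk score per node, and a DFS that returns the riskiest path
# (a stand-in for the paper's MITRE ATT&CK-based scoring).

chart = {
    "ingress":  {"exposed": True,  "privileged": False, "talks_to": ["web"]},
    "web":      {"exposed": False, "privileged": False, "talks_to": ["db"]},
    "db":       {"exposed": False, "privileged": True,  "talks_to": []},
}

def node_risk(features):
    """Illustrative scoring: exposure and privilege raise the risk."""
    return 1 + 2 * features["exposed"] + 3 * features["privileged"]

def riskiest_path(chart, start):
    """DFS over the topology graph, returning the path with the highest
    cumulative risk score."""
    best = ([], 0)
    def dfs(node, path, score):
        nonlocal best
        path, score = path + [node], score + node_risk(chart[node])
        if score > best[1]:
            best = (path, score)
        for nxt in chart[node]["talks_to"]:
            if nxt not in path:          # avoid revisiting nodes
                dfs(nxt, path, score)
    dfs(start, [], 0)
    return best

path, score = riskiest_path(chart, "ingress")
print(path, score)   # the ingress -> web -> db chain accumulates the most risk
```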
Citations: 6
Towards Practical Privacy-Preserving Solution for Outsourced Neural Network Inference
Q1 Computer Science Pub Date : 2022-06-06 DOI: 10.1109/CLOUD55607.2022.00059
Pinglan Liu, Wensheng Zhang
When a neural network model and data are outsourced to a cloud server for inference, it is desirable to preserve the privacy of the model and data, as the involved parties (i.e., the cloud server and the model/data-providing clients) may not trust each other. Solutions have been proposed based on multi-party computation, trusted execution environments (TEE), and leveled or fully homomorphic encryption (LHE or FHE), but they all have limitations that hamper practical application. We propose a new framework based on the integration of LHE and TEE, which enables collaboration among three mutually-untrusted parties, while minimizing the involvement of the resource-constrained TEE and fully utilizing the untrusted but resource-rich part of the server. We also propose a generic and efficient LHE-based inference scheme, along with optimizations, as an important performance-determining component of the framework. We implemented and evaluated the proposed scheme on a moderate platform, and the evaluations show that our proposed system is applicable and scalable to various settings, and that it has better or comparable performance when compared with state-of-the-art solutions that are more restrictive in applicability and scalability.
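The workload split the abstract describes – heavy linear algebra on the untrusted server, a cheap trusted step to recover the result – can be mimicked with simple additive masking instead of real LHE. This is only a structural sketch under that substitution (a one-time-pad mask, not the paper's encryption scheme); all values are invented.

```python
# Sketch: outsourced linear-layer inference with additive masking.
# The untrusted server computes W(x + r) = Wx + Wr on masked data; the
# trusted side (playing the TEE's role) knows r and removes Wr cheaply.

import random

W = [[2, 0], [1, 3]]          # model weights (one linear layer)
x = [5, 7]                    # client input

def matvec(M, v):
    return [sum(m * u for m, u in zip(row, v)) for row in M]

# Client: mask the input before sending it to the untrusted server.
r = [random.randint(0, 100) for _ in x]
x_masked = [a + b for a, b in zip(x, r)]

# Untrusted, resource-rich server: heavy computation on masked data only.
y_masked = matvec(W, x_masked)            # = Wx + Wr

# Trusted, resource-constrained side: remove the mask with one cheap step.
Wr = matvec(W, r)
y = [a - b for a, b in zip(y_masked, Wr)]

assert y == matvec(W, x)                  # matches plaintext inference
print(y)
```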
Citations: 1
A Continuum Approach for Collaborative Task Processing in UAV MEC Networks
Q1 Computer Science Pub Date : 2022-06-06 DOI: 10.1109/CLOUD55607.2022.00046
Lorson Blair, Carlos A. Varela, S. Patterson
Unmanned aerial vehicles (UAVs) are becoming a viable platform for sensing and estimation in a wide variety of applications including disaster response, search and rescue, and security monitoring. These sensing UAVs have limited battery and computational capabilities, and thus must offload their data so it can be processed to provide actionable intelligence. We consider a compute platform consisting of a limited number of highly-resourced UAVs that act as mobile edge computing (MEC) servers to process the workload on premises. We propose a novel distributed solution to the collaborative processing problem that adaptively positions the MEC UAVs in response to the changing workload that arises both from the sensing UAVs’ mobility and the task generation. Our solution consists of two key building blocks: (1) an efficient workload estimation process by which the UAVs estimate the task field—a continuous approximation of the number of tasks to be processed at each location in the airspace, and (2) a distributed optimization method by which the UAVs partition the task field so as to maximize the system throughput. We evaluate our proposed solution using realistic models of surveillance UAV mobility and show that our method achieves up to 28% improvement in throughput over a non-adaptive baseline approach.
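The two building blocks above – estimating a task field and partitioning it while repositioning the MEC UAVs toward the workload – can be illustrated with a one-dimensional toy version. The field, capacities, and Lloyd-style update below are our own simplification, not the paper's distributed optimization method.

```python
# Sketch: a 1-D "task field" (tasks per location), a Voronoi partition of the
# field among MEC UAVs, and one update step that moves each UAV to the
# task-weighted centroid of its partition, chasing the workload.

def assign(field, positions):
    """Map each field location to the closest MEC UAV (Voronoi partition)."""
    return [min(range(len(positions)), key=lambda i: abs(positions[i] - loc))
            for loc, _ in field]

def recenter(field, positions):
    """One Lloyd-style step: each UAV moves to the task-weighted centroid
    of the cells it currently owns."""
    owners = assign(field, positions)
    new_pos = []
    for i, p in enumerate(positions):
        cells = [(loc, w) for (loc, w), o in zip(field, owners) if o == i]
        total = sum(w for _, w in cells)
        new_pos.append(sum(loc * w for loc, w in cells) / total if total else p)
    return new_pos

# Task field: (location, estimated number of tasks); a hotspot around 8-10.
field = [(0, 1), (2, 1), (8, 5), (10, 5)]
positions = [1.0, 3.0]                    # initial MEC UAV positions
positions = recenter(field, positions)
print(positions)                          # the second UAV moves to the hotspot
```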
Citations: 1
Delivering Document Conversion as a Cloud Service with High Throughput and Responsiveness
Q1 Computer Science Pub Date : 2022-06-01 DOI: 10.1109/CLOUD55607.2022.00060
Christoph Auer, Michele Dolfi, A. Carvalho, Cesar Berrospi Ramis, P. W. J. S. I. Research, SoftINSA Lda.
Document understanding is a key business process in the data-driven economy since documents are central to knowledge discovery and business insights. Converting documents into a machine-processable format is a particular challenge here due to their huge variability in formats and complex structure. Accordingly, many algorithms and machine-learning methods emerged to solve particular tasks such as Optical Character Recognition (OCR), layout analysis, table-structure recovery, figure understanding, etc. We observe the adoption of such methods in document understanding solutions offered by all major cloud providers. Yet, publications outlining how such services are designed and optimized to scale in the cloud are scarce. In this paper, we focus on the case of document conversion to illustrate the particular challenges of scaling a complex data processing pipeline with a strong reliance on machine-learning methods on cloud infrastructure. Our key objective is to achieve high scalability and responsiveness for different workload profiles in a well-defined resource budget. We outline the requirements, design, and implementation choices of our document conversion service and reflect on the challenges we faced. Evidence for the scaling behavior and resource efficiency is provided for two alternative workload distribution strategies and deployment configurations. Our best-performing method achieves sustained throughput of over one million PDF pages per hour on 3072 CPU cores across 192 nodes.
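The abstract compares two workload distribution strategies; a classic way to see why granularity matters is to compare document-level and page-level sharding of the same batch. The page counts and worker count below are invented, and this greedy assignment is a generic illustration, not the paper's scheduler.

```python
# Sketch: makespan of one conversion batch under two sharding granularities.
# A single large document dominates document-level sharding; splitting work
# at page level lets the workers balance it.

import heapq

docs = [120, 8, 6, 4, 2]      # page counts; one large doc dominates

def makespan(tasks, workers):
    """Greedy longest-processing-time assignment: always give the next
    (largest) task to the least-loaded worker; return the busiest load."""
    loads = [0] * workers
    heapq.heapify(loads)
    for t in sorted(tasks, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + t)
    return max(loads)

doc_level  = makespan(docs, 4)                  # whole documents per worker
page_level = makespan([1] * sum(docs), 4)       # documents split into pages
print(doc_level, page_level)                    # page-level finishes sooner
```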
Citations: 2
FELARE: Fair Scheduling of Machine Learning Tasks on Heterogeneous Edge Systems
Q1 Computer Science Pub Date : 2022-05-31 DOI: 10.1109/CLOUD55607.2022.00069
Ali Mokhtari, Md. Abir Hossen, Pooyan Jamshidi, M. Salehi
Edge computing enables smart IoT-based systems via concurrent and continuous execution of latency-sensitive machine learning (ML) applications. These edge-based machine learning systems are often battery-powered (i.e., energy-limited). They use heterogeneous resources with diverse computing performance (e.g., CPU, GPU, and/or FPGA) to fulfill the latency constraints of ML applications. The challenge is to allocate user requests for different ML applications on the Heterogeneous Edge Computing Systems (HEC) with respect to both the energy and latency constraints of these systems. To this end, we study and analyze resource allocation solutions that can increase the on-time task completion rate while considering the energy constraint. Importantly, we investigate edge-friendly (lightweight) multi-objective mapping heuristics that do not become biased toward a particular application type to achieve the objectives; instead, the heuristics consider "fairness" across the concurrent ML applications in their mapping decisions. Performance evaluations demonstrate that the proposed heuristic outperforms widely-used heuristics in heterogeneous systems in terms of the latency and energy objectives, particularly, at low to moderate request arrival rates. We observed 8.9% improvement in on-time task completion rate and 12.6% in energy-saving without imposing any significant overhead on the edge system.
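A lightweight mapping heuristic in the spirit described above – heterogeneous machines, an energy budget, and a fairness nudge so no single application type monopolizes the fast resource – might look like the following. This is not FELARE; the execution times, energy costs, budget, and the simple alternation rule are all our own assumptions.

```python
# Sketch: greedy fairness-aware task mapping on a heterogeneous edge node.
# exec_time[app][machine] models heterogeneous performance (e.g. CPU vs GPU).

exec_time = {"vision": {"cpu": 9, "gpu": 2}, "speech": {"cpu": 3, "gpu": 2}}
energy    = {"cpu": 1, "gpu": 4}          # per-task energy on each machine

def map_tasks(tasks, budget):
    """Map tasks greedily under an energy budget; as a crude "fairness"
    rule, an app that just ran yields the fast (GPU) machine to the other."""
    free_at = {"cpu": 0, "gpu": 0}
    done, last_app = {}, None
    for app in tasks:
        order = ["cpu", "gpu"] if app == last_app else ["gpu", "cpu"]
        m = next((m for m in order if energy[m] <= budget), None)
        if m is None:
            break                          # energy budget exhausted
        budget -= energy[m]
        free_at[m] += exec_time[app][m]    # machine busy until this time
        done[app] = done.get(app, 0) + 1
        last_app = app
    return done, budget

print(map_tasks(["vision", "vision", "speech"], budget=9))
```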
Citations: 1
Towards a Security Stress-Test for Cloud Configurations
Q1 Computer Science Pub Date : 2022-05-28 DOI: 10.1109/CLOUD55607.2022.00038
F. Minna, F. Massacci, Katja Tuma
Securing cloud configurations is an elusive task, which is left to system administrators who must base their decisions on "trial and error" experimentation or on observing good practices (e.g., CIS Benchmarks). We propose a knowledge-based AND/OR graph approach to model cloud deployment security objects and vulnerabilities. In this way, we can capture relationships between configurations, permissions (e.g., CAP_SYS_ADMIN), and security profiles (e.g., AppArmor and SecComp). Such an approach allows us to suggest alternative and safer configurations, support administrators in the study of what-if scenarios, and scale the analysis to large-scale deployments. We present an initial validation and illustrate the approach with three real vulnerabilities from known sources.
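The AND/OR-graph idea reduces to a small recursive evaluation: leaves are concrete configuration facts, AND nodes require every child, OR nodes require any child. The graph and rules below are illustrative, not the paper's actual model of CAP_SYS_ADMIN, AppArmor, or SecComp semantics.

```python
# Sketch: decide whether an attack goal is reachable from a deployment's
# configuration facts via an AND/OR graph, and test a what-if scenario.

graph = {
    "container_escape": ("AND", ["privileged", "host_mount"]),
    "privileged":       ("OR",  ["cap_sys_admin", "no_seccomp_profile"]),
}

def holds(node, facts, graph):
    """Recursively evaluate an attack node against the configuration."""
    if node not in graph:
        return node in facts            # leaf: a concrete config setting
    op, children = graph[node]
    combine = all if op == "AND" else any
    return combine(holds(c, facts, graph) for c in children)

bad_config  = {"cap_sys_admin", "host_mount"}
safe_config = {"host_mount"}            # what-if: drop CAP_SYS_ADMIN
print(holds("container_escape", bad_config, graph))    # True
print(holds("container_escape", safe_config, graph))   # False
```

The what-if study the abstract mentions corresponds to toggling facts in the set and re-evaluating the goal.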
Citations: 3
MetaNet: Automated Dynamic Selection of Scheduling Policies in Cloud Environments
Q1 Computer Science Pub Date : 2022-05-21 DOI: 10.1109/CLOUD55607.2022.00056
Shreshth Tuli, G. Casale, N. Jennings
Task scheduling is a well-studied problem in the context of optimizing the Quality of Service (QoS) of cloud computing environments. In order to sustain the rapid growth of computational demands, one of the most important QoS metrics for cloud schedulers is the execution cost. In this regard, several data-driven deep neural networks (DNNs) based schedulers have been proposed in recent years to allow scalable and efficient resource management in dynamic workload settings. However, optimal scheduling frequently relies on sophisticated DNNs with high computational needs implying higher execution costs. Further, even in non-stationary environments, sophisticated schedulers might not always be required and we could briefly rely on low-cost schedulers in the interest of cost-efficiency. Therefore, this work aims to solve the non-trivial meta problem of online dynamic selection of a scheduling policy using a surrogate model called MetaNet. Unlike traditional solutions with a fixed scheduling policy, MetaNet on-the-fly chooses a scheduler from a large set of DNN based methods to optimize task scheduling and execution costs in tandem. Compared to state-of-the-art DNN schedulers, this allows for improvement in execution costs, energy consumption, response time and service level agreement violations by up to 11, 43, 8 and 13 percent, respectively.
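The core decision loop – a surrogate predicts each candidate scheduler's cost for the current workload, and the cheapest one is used for that interval – can be shown in miniature. The hand-written cost table below stands in for MetaNet's learned surrogate; the overhead and quality numbers are invented.

```python
# Sketch: pick a scheduler per interval using a (toy) surrogate cost model.
# The DNN scheduler produces better schedules under heavy load but carries
# a fixed inference overhead of its own.

def surrogate_cost(scheduler, load):
    """Invented stand-in for a learned surrogate: total expected cost of
    running `scheduler` on a workload of the given intensity."""
    overhead, quality = {"heuristic": (0.0, 1.0), "dnn": (2.0, 0.5)}[scheduler]
    return overhead + quality * load

def pick_scheduler(load):
    return min(["heuristic", "dnn"], key=lambda s: surrogate_cost(s, load))

print(pick_scheduler(1.0))   # light load: the cheap heuristic wins
print(pick_scheduler(10.0))  # heavy load: the DNN pays for its overhead
```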
Citations: 0
Search-based Methods for Multi-Cloud Configuration
Q1 Computer Science Pub Date : 2022-04-20 DOI: 10.1109/CLOUD55607.2022.00067
M. Lazuka, Thomas P. Parnell, Andreea Anghel, Haralambos Pozidis
Multi-cloud computing has become increasingly popular with enterprises looking to avoid vendor lock-in. While most cloud providers offer similar functionality, they may differ significantly in terms of performance and/or cost. A customer looking to benefit from such differences will naturally want to solve the multi-cloud configuration problem: given a workload, which cloud provider should be chosen and how should its nodes be configured in order to minimize runtime or cost? In this work, we consider possible solutions to this multi-cloud optimization problem. We develop and evaluate possible adaptations of state-of-the-art cloud configuration solutions to the multi-cloud domain. Furthermore, we identify an analogy between multi-cloud configuration and the selection-configuration problems that are commonly studied in the automated machine learning (AutoML) field. Inspired by this connection, we utilize popular optimizers from AutoML to solve multi-cloud configuration. Finally, we propose a new algorithm for solving multi-cloud configuration, CloudBandit. It treats the outer problem of cloud provider selection as a best-arm identification problem, in which each arm pull corresponds to running an arbitrary black-box optimizer on the inner problem of node configuration. Our extensive experiments indicate that (a) many state-of-the-art cloud configuration solutions can be adapted to multi-cloud, with best results obtained for adaptations which utilize the hierarchical structure of the multi-cloud configuration domain, (b) hierarchical methods from AutoML can be used for the multi-cloud configuration task and can outperform state-of-the-art cloud configuration solutions and (c) CloudBandit achieves competitive or lower regret relative to other tested algorithms, whilst also identifying configurations that have 65% lower median cost and 20% lower median runtime in production, compared to choosing a random provider and configuration.
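The two-level structure of CloudBandit – best-arm identification over providers in the outer loop, an arbitrary black-box optimizer over node configurations in the inner loop – can be sketched with successive halving and random search. The providers, cost function, and hyperparameters below are made up; this is our loose rendition, not the published algorithm.

```python
# Sketch: successive-halving over cloud providers; each "arm pull" runs a
# few random-search steps over that provider's node configurations.

import random

def run_cost(provider, config):
    """Pretend benchmark: cost of the workload on (provider, node count)."""
    base = {"cloudA": 30, "cloudB": 20, "cloudC": 40}[provider]
    return base + abs(config - 8)          # 8 nodes is the sweet spot

def cloudbandit(providers, rounds=3, pulls=4, seed=0):
    rng = random.Random(seed)
    best = {p: float("inf") for p in providers}
    arms = list(providers)
    for _ in range(rounds):
        for p in arms:                     # pull each surviving arm
            for _ in range(pulls):         # inner black-box optimizer step
                cfg = rng.randint(1, 16)
                best[p] = min(best[p], run_cost(p, cfg))
        arms.sort(key=best.get)            # halve: keep the better half
        arms = arms[: max(1, len(arms) // 2)]
    return arms[0], best[arms[0]]

provider, cost = cloudbandit(["cloudA", "cloudB", "cloudC"])
print(provider, cost)                      # the cheapest provider survives
```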
Cited by: 5
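The two-level search described in the abstract — an outer best-arm identification loop over cloud providers, where each arm pull runs an arbitrary black-box optimizer on that provider's node configurations — can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the successive-halving schedule, the synthetic cost model, and all names (`cloudbandit_sketch`, `make_cost_fn`, the provider labels) are assumptions made for demonstration.

```python
import random

def make_cost_fn(seed):
    # Stand-in for benchmarking the workload on one provider: maps a
    # (node_type, node_count) configuration to a synthetic cost.
    rng = random.Random(seed)
    base = rng.uniform(1.0, 3.0)
    return lambda cfg: base * cfg["node_count"] / (1 + cfg["node_type"])

def random_search_step(cost_fn, best):
    # One pull of the inner black-box optimizer (here: random search):
    # sample a configuration, keep it if it beats the incumbent.
    cfg = {"node_type": random.randrange(4), "node_count": random.randrange(1, 9)}
    return min(best, (cost_fn(cfg), tuple(sorted(cfg.items()))))

def cloudbandit_sketch(providers, budget_per_round=8, seed=0):
    random.seed(seed)
    arms = {p: (float("inf"), None) for p in providers}   # provider -> (best cost, best cfg)
    cost_fns = {p: make_cost_fn(sum(map(ord, p))) for p in providers}
    survivors = list(providers)
    # Outer best-arm identification via successive halving: give every
    # surviving provider an equal pull budget, then drop the worse half.
    while len(survivors) > 1:
        for p in survivors:
            for _ in range(budget_per_round):
                arms[p] = random_search_step(cost_fns[p], arms[p])
        survivors.sort(key=lambda p: arms[p][0])
        survivors = survivors[: max(1, len(survivors) // 2)]
    best = survivors[0]
    return best, arms[best]

best_provider, (best_cost, best_cfg) = cloudbandit_sketch(["aws", "azure", "gcp"])
print(best_provider, best_cost, best_cfg)
```

Successive halving is only one possible choice for the outer loop; any best-arm identification strategy that allocates pull budget across providers fits the same skeleton.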
An Efficient Approach to Move Elements in a Distributed Geo-Replicated Tree
Q1 Computer Science Pub Date : 2022-03-19 DOI: 10.1109/CLOUD55607.2022.00071
Parwat Singh Anjana, Adithya Rajesh Chandrassery, Sathya Peri
Replicated tree data structures are extensively used in collaborative applications and distributed file systems, where clients often perform move operations. Local move operations at different replicas may be safe. However, remote move operations may not be safe. When clients perform arbitrary move operations concurrently on different replicas, it could result in various bugs, making this operation challenging to implement. Previous work has revealed bugs such as data duplication and cycling in replicated trees. In this paper, we present an efficient algorithm to perform move operations on the distributed replicated tree while ensuring eventual consistency. The proposed technique is primarily concerned with resolving conflicts efficiently, requires no interaction between replicas, and works well with network partitions. We use the last write win semantics for conflict resolution based on globally unique timestamps of operations. The proposed solution requires only one compensation operation to avoid cycles being formed when move operations are applied. The proposed approach achieves an effective speedup of 14.6× to 68.19× over the state-of-the-art approach in a geo-replicated setting.
IEEE Cloud Computing, pp. 479–488.
Cited by: 0
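The conflict-resolution rule in the abstract — globally unique operation timestamps with last-write-wins semantics, plus a safeguard against cycle-forming moves — can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the `ReplicatedTree` class, its implicit node creation, and the outright rejection of cycle-forming moves (standing in for the paper's single compensation operation) are assumptions made for demonstration.

```python
class ReplicatedTree:
    def __init__(self):
        self.parent = {"root": None}   # child -> parent
        self.last_write = {}           # node -> (timestamp, replica_id) of last applied move

    def _is_ancestor(self, a, b):
        # True if `a` is an ancestor of (or equal to) `b`; moving `a`
        # under `b` would then create a cycle.
        while b is not None:
            if b == a:
                return True
            b = self.parent.get(b)
        return False

    def apply_move(self, node, new_parent, stamp):
        # stamp = (timestamp, replica_id): globally unique and totally ordered,
        # so every replica resolves concurrent moves of `node` identically.
        if node not in self.parent:
            self.parent[node] = None
        if new_parent not in self.parent:
            self.parent[new_parent] = "root"
        # Last-write-wins: ignore a move older than the one already applied.
        if self.last_write.get(node, (-1, "")) >= stamp:
            return False
        # Safeguard: refuse a move that would form a cycle.
        if self._is_ancestor(node, new_parent):
            return False
        self.parent[node] = new_parent
        self.last_write[node] = stamp
        return True

t = ReplicatedTree()
t.apply_move("a", "root", (1, "r1"))
t.apply_move("b", "a", (2, "r1"))
assert not t.apply_move("a", "b", (3, "r2"))     # would create a cycle: rejected
assert not t.apply_move("b", "root", (1, "r2"))  # older stamp loses under LWW
```

Because every replica orders moves by the same unique stamps and applies the same deterministic checks, all replicas converge to the same tree once they have seen the same set of operations.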
Cloud Computing: 11th EAI International Conference, CloudComp 2021, Virtual Event, December 9–10, 2021, Proceedings
Q1 Computer Science Pub Date : 2022-01-01 DOI: 10.1007/978-3-030-99191-3
Cited by: 0
Copyright © 2023 Book学术 All rights reserved.