
Latest publications from the 2021 IEEE 10th International Conference on Cloud Networking (CloudNet)

Data Analytics Using Two-Stage Intelligent Model Pipelining for Virtual Network Functions
Pub Date : 2021-11-08 DOI: 10.1109/CloudNet53349.2021.9657133
T. Miyazawa, Ved P. Kafle, H. Asaeda
The use of machine learning (ML) technologies to predict server workloads and proactively adjust the amount of computational resources to maximize the quality of services is an enormous challenge. In this study, we introduce an ITU-T Y.3177 compliant framework for autonomous resource control and management of virtualized network infrastructures. Based on this framework, we propose (1) an architecture for a data analytics system consisting of learning and prediction components, and (2) a two-stage intelligent model pipelining mechanism for the learning component that cascades two ML models, namely nonlinear regression and multiple regression, to understand the trends of the fluctuations in CPU usage of a network node and predict the node's peak CPU usage at a time granularity of seconds. We evaluated the proposed mechanism in an experimental network in which in-network caching nodes were installed as network functions. We show that our ML models are capable of performing agile data analytics at a time granularity of seconds and can reduce the prediction errors of peak CPU usage.
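The two-stage cascade described in the abstract can be sketched in a few lines: a nonlinear (here, polynomial) regression captures the CPU-usage trend, and a multiple linear regression over recent samples plus the trend estimate predicts the upcoming peak. This is a minimal illustration under assumed features, window sizes, and model choices, not the authors' implementation.

```python
import numpy as np

def fit_two_stage(t, cpu, degree=3, window=5):
    """Stage 1: a nonlinear (polynomial) regression captures the CPU-usage
    trend. Stage 2: a multiple linear regression over the last `window`
    samples plus the trend estimate predicts the local peak. Both models
    are illustrative stand-ins for the paper's cascaded ML models."""
    trend_coef = np.polyfit(t, cpu, degree)          # stage 1: trend model
    trend = np.polyval(trend_coef, t)

    X, y = [], []
    for i in range(window, len(cpu) - 1):
        X.append(np.concatenate([cpu[i - window:i], [trend[i]]]))
        y.append(cpu[i - 1:i + 2].max())             # peak CPU around step i
    X = np.column_stack([np.ones(len(X)), np.array(X)])
    beta, *_ = np.linalg.lstsq(X, np.array(y), rcond=None)  # stage 2
    return trend_coef, beta, window

def predict_peak(trend_coef, beta, window, t_next, recent):
    """Predict the peak CPU usage around time `t_next` from recent samples."""
    feats = np.concatenate([[1.0], recent[-window:],
                            [np.polyval(trend_coef, t_next)]])
    return float(feats @ beta)
```

Feeding the stage-1 trend estimate into the stage-2 feature vector is what makes the two models a pipeline rather than two independent predictors.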
Citations: 1
A Machine Learning Approach for Service Function Chain Embedding in Cloud Datacenter Networks
Pub Date : 2021-11-08 DOI: 10.1109/CloudNet53349.2021.9657124
T. Wassing, D. D. Vleeschauwer, C. Papagianni
Network Functions Virtualization (NFV) is an industry effort to replace traditional hardware middleboxes with virtualized network functions (VNFs) running on general-purpose hardware platforms, enabling cost reduction, operational efficiency, and service agility. A Service Function Chain (SFC) constitutes an end-to-end network service, formed by chaining together VNFs in a specific order. Infrastructure providers and cloud service providers try to optimally allocate computing and network resources to SFCs, in order to reduce costs and increase profit margins. The corresponding resource allocation problem, known as the SFC embedding problem, is proven to be NP-hard. Traditionally, the problem has been formulated as a Mixed Integer Linear Program (MILP), assuming each SFC’s requirements are known a priori, while the embedding decision is based on a snapshot of the infrastructure’s load at request time. Reinforcement learning (RL) has recently been applied, showing promising results, specifically in dynamic environments, where such assumptions are considered unrealistic. However, standard RL techniques such as Q-learning might not be appropriate for addressing the problem at scale, as they are often ineffective for high-dimensional domains. On the other hand, Deep RL (DRL) algorithms can deal with high-dimensional state spaces. In this paper, a Deep Q-Learning (DQL) approach is proposed to address the SFC resource allocation problem. The DQL agent utilizes a neural network for function approximation in Q-learning with experience replay learning. The simulations demonstrate that the new approach outperforms the linear programming approach. In addition, the DQL agent can perform SFC request admission control in real time.
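A minimal sketch of Q-learning with experience replay on a toy SFC-placement problem (a linear Q-function stands in for the paper's neural-network approximator; the environment, rewards, and dimensions are invented for illustration):

```python
import random
from collections import deque

import numpy as np

class ToySfcEnv:
    """Toy SFC-embedding environment (invented for illustration, not the
    paper's model): place each VNF of a chain on one of `n_nodes`; placing
    on a node with enough spare capacity earns a positive reward."""
    def __init__(self, n_nodes=4, chain_len=3, seed=0):
        self.n_nodes, self.chain_len = n_nodes, chain_len
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.cap = self.rng.uniform(0.5, 1.0, self.n_nodes)
        self.placed = 0
        return self._state()

    def _state(self):
        # state = remaining node capacities + chain progress
        return np.concatenate([self.cap, [self.placed / self.chain_len]])

    def step(self, action):
        demand = 0.2
        reward = 1.0 if self.cap[action] >= demand else -1.0
        self.cap[action] = max(0.0, self.cap[action] - demand)
        self.placed += 1
        return self._state(), reward, self.placed == self.chain_len

def train_dql(episodes=100, gamma=0.9, lr=0.05, eps=0.2, batch=16):
    """Q-learning with experience replay; Q(s, a) = W[a] @ s replaces the
    paper's neural network to keep the sketch dependency-free."""
    env = ToySfcEnv()
    W = np.zeros((env.n_nodes, env.n_nodes + 1))
    replay = deque(maxlen=500)
    rng = random.Random(0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            a = rng.randrange(env.n_nodes) if rng.random() < eps else int(np.argmax(W @ s))
            s2, r, done = env.step(a)
            replay.append((s, a, r, s2, done))
            # experience replay: update on a random minibatch of past transitions
            for ss, aa, rr, ss2, dd in rng.sample(list(replay), min(batch, len(replay))):
                target = rr + (0.0 if dd else gamma * np.max(W @ ss2))
                W[aa] += lr * (target - W[aa] @ ss) * ss
            s = s2
    return W
```

Replaying a random minibatch, rather than learning only from the latest transition, breaks the temporal correlation between consecutive embedding decisions, which is the point of experience replay in DQL.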
Citations: 2
Characterizing network performance of single-node large-scale container deployments
Pub Date : 2021-11-08 DOI: 10.1109/CloudNet53349.2021.9657138
Conrado Boeira, M. Neves, T. Ferreto, I. Haque
Cloud services have shifted from complex monolithic designs to hundreds of loosely coupled microservices over the last few years. These microservices communicate via pre-defined APIs (e.g., RPC) and are usually implemented on top of containers. To make the microservices model profitable, cloud providers often co-locate them on a single (virtual) machine, thus achieving high server utilization. Despite being overlooked by previous work, the challenge of providing high-quality network connectivity to multiple containers running on the same host becomes crucial for the overall cloud service performance in this scenario. For that reason, this paper focuses on identifying the overheads and bottlenecks caused by the increasing number of concurrent containers running on a single node, particularly from a networking perspective. Through an extensive set of experiments, we show that the networking performance is mostly restricted by the CPU capacity (even for I/O-intensive workloads), that containers can suffer significantly from interference originating in packet processing, and that proper core scheduling policies can significantly improve connection throughput. Ultimately, our findings can help to pave the way towards more efficient large-scale microservice deployments.
Citations: 0
Longer Stay Less Priority: Flow Length Approximation Used In Information-Agnostic Traffic Scheduling In Data Center Networks
Pub Date : 2021-11-08 DOI: 10.1109/CloudNet53349.2021.9657148
M. S. Iqbal, Chien Chen
Numerous scheduling approaches have been proposed to improve user experiences in a data center network (DCN) by reducing flow completion time (FCT). Mimicking shortest job first (SJF) scheduling has proven to be a prominent way to improve FCT. To do so, some approaches require flow size or completion time information in advance, which is not possible in scenarios like HTTP chunk transfer or database query response. Some information-agnostic schemes require involving end-hosts to count the number of bytes sent. We present Longer Stay Less Priority (LSLP), an information-agnostic flow scheduling scheme, similar to the Multi-Level Feedback Queue (MLFQ) scheduler in operating systems, that aims to mimic SJF using P4 switches in a DCN. LSLP considers all flows as short flows initially and assigns them to the highest priority queue, and flows get demoted to the lower priority queues over time. LSLP estimates the active time of a flow by leveraging the state-of-the-art P4 switch’s programmable nature. LSLP estimates the active time of a group of new flows that arrive during a time interval and assigns their packets to the highest priority. At the beginning of the next time interval, arriving packets of old flows are placed one priority lower, except for those already in the lowest priority queue. Therefore, short flows can be completed in the few higher priority queues while long flows are demoted to lower priority queues. We have evaluated LSLP via a series of tests and shown that its performance is comparable to the existing scheduling schemes.
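The queue logic can be illustrated with a small control-plane sketch (the paper realizes this in P4 switch hardware; the number of queues and the demotion interval here are assumptions):

```python
class LslpScheduler:
    """Sketch of LSLP's priority logic. Every new flow enters the
    highest-priority queue; at each interval boundary, every known flow
    drops one level, bottoming out at the lowest-priority queue."""
    def __init__(self, levels=4):
        self.levels = levels
        self.flow_prio = {}            # flow id -> priority (0 = highest)

    def classify(self, flow_id):
        """Return the priority queue for a packet of `flow_id`;
        unseen flows are treated as short and start at priority 0."""
        return self.flow_prio.setdefault(flow_id, 0)

    def tick(self):
        """Interval boundary: demote every flow seen so far by one level."""
        for f in self.flow_prio:
            self.flow_prio[f] = min(self.flow_prio[f] + 1, self.levels - 1)
```

Short flows finish while still in the high-priority queues; long-lived flows accumulate demotions, approximating SJF without knowing flow sizes in advance.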
Citations: 1
TPC Chair Address
Pub Date : 2021-11-08 DOI: 10.1109/cloudnet53349.2021.9657147
Citations: 0
GDSim: Benchmarking Geo-Distributed Data Center Schedulers
Pub Date : 2021-11-08 DOI: 10.1109/CloudNet53349.2021.9657143
Daniel S. F. Alves, K. Obraczka, A. Kabbani
As cloud providers scale up their data centers and distribute them around the world to meet demand, proposing new job schedulers that take into account data center geographical distribution has been receiving considerable attention from the data center management research and practitioner community. However, testing and benchmarking new schedulers for geo-distributed data centers is complicated by the lack of a common, easily extensible experimental platform. To address this gap, we propose GDSim, an open-source job scheduling simulation environment for geo-distributed data centers that aims at facilitating development, testing, and evaluation of new geo-distributed schedulers. We showcase GDSim by using it to reproduce experiments and results for recently proposed geo-distributed job schedulers, as well as testing those schedulers under new conditions, which can reveal trends that have not been previously uncovered.
Citations: 0
Using Distributed Tracing to Identify Inefficient Resources Composition in Cloud Applications
Pub Date : 2021-11-08 DOI: 10.1109/CloudNet53349.2021.9657140
Clément Cassé, Pascal Berthou, P. Owezarski, S. Josset
Cloud applications are the new industry-standard way of designing web applications. With cloud computing, applications are usually designed as microservices, and developers can take advantage of thousands of such existing microservices, involving several hundred cross-component communications on different physical resources. Microservices orchestration (e.g., with Kubernetes) is an automatic process, which manages each component's lifecycle, and notably their allocation on the different resources of the cloud infrastructure. Whereas such automatic cloud technologies ease development and deployment, they nevertheless obscure debugging and performance analysis. In order to gain insight into the composition of services, distributed tracing recently emerged as a way to get the decomposition of the activity of each component within a cloud infrastructure. This paper aims at providing methodologies and tools (leveraging state-of-the-art tracing) for getting a wider view of application behaviours, especially focusing on application performance assessment. In this paper, we focus on using distributed traces and allocation information from microservices to model their dependencies as a hierarchical property graph. By applying graph rewriting operations, we managed to project and filter communications observed between microservices at higher abstraction layers like the machine nodes, the zones, or the regions. Finally, we propose an implementation of the model running on a microservices shopping application deployed on a zonal Kubernetes cluster monitored by OpenTelemetry traces. We propose using the flow hierarchy metric on the graph model to pinpoint cycles that reveal inefficient resource composition, inducing possible performance issues and economic waste.
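The flow hierarchy metric mentioned in the abstract is the fraction of a directed graph's edges that do not lie on any cycle (networkx exposes it as `flow_hierarchy`). A self-contained sketch using Kosaraju's strongly-connected-components algorithm, relying on the fact that an edge lies on a cycle exactly when both endpoints share an SCC:

```python
def flow_hierarchy(edges):
    """Fraction of edges that lie on no cycle. Values below 1.0 flag
    cyclic resource compositions worth inspecting."""
    if not edges:
        return 1.0
    nodes = {u for e in edges for u in e}
    adj = {n: [] for n in nodes}
    radj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)

    seen = set()
    def dfs(start, graph, out):
        # iterative DFS appending visited nodes to `out` in postorder
        stack = [(start, iter(graph[start]))]
        seen.add(start)
        while stack:
            node, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                stack.pop()
                out.append(node)
            elif nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, iter(graph[nxt])))

    order = []                      # Kosaraju pass 1: finish order
    for n in nodes:
        if n not in seen:
            dfs(n, adj, order)
    seen.clear()
    comp = {}                       # pass 2: label SCCs on the reverse graph
    for n in reversed(order):
        if n not in seen:
            members = []
            dfs(n, radj, members)
            for m in members:
                comp[m] = n
    cyclic = sum(1 for u, v in edges if comp[u] == comp[v])
    return 1.0 - cyclic / len(edges)
```

For example, a service graph A→B→C→A with an extra edge C→D has three of four edges on the cycle, so its flow hierarchy is 0.25; the A-B-C cycle is exactly the kind of composition the paper suggests inspecting.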
Citations: 4
Understanding and Leveraging Cluster Heterogeneity for Efficient Execution of Cloud Services
Pub Date : 2021-11-08 DOI: 10.1109/CloudNet53349.2021.9657128
S. Shukla, D. Ghosal, M. Farrens
Cloud warehouses are becoming increasingly heterogeneous by introducing different types of processors of varying speed and energy-efficiency. Developing an optimal strategy for distributing latency-critical service (LC-service) requests across multiple instances in a heterogeneous cluster is non-trivial. In this paper, we present a detailed analysis of the impact of cluster heterogeneity on the achieved server utilization and energy footprint to meet the required service-level latency bound (SLO) of LC-services. We develop cluster-level control plane strategies to address two forms of cluster heterogeneity - capacity and energy-efficiency. First, we propose Maximum-SLO-Guaranteed Capacity (MSG-Capacity) proportional load balancing for LC-Services to address the capacity heterogeneity and show that it can achieve higher utilization than naive performance-based heterogeneity awareness. Then, we present Efficient-First (E-First) heuristic-based Instance Scaling to address the efficiency heterogeneity. Finally, to address the bi-dimensional (capacity and energy-efficiency) heterogeneity, we superimpose the two approaches to propose Energy-efficient and MSG-Capacity (E2MC) based control-plane strategy that maximizes utilization while minimizing the energy footprint.
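The MSG-Capacity idea of splitting load in proportion to each instance's maximum SLO-guaranteed capacity can be sketched as follows (the capacities are assumed inputs here; the paper derives them from measured latency behaviour of heterogeneous processors):

```python
def msg_capacity_weights(capacities):
    """Load-balancing weights proportional to each instance's maximum
    SLO-guaranteed capacity, i.e. the highest request rate at which the
    instance still meets its latency SLO."""
    total = sum(capacities.values())
    return {name: cap / total for name, cap in capacities.items()}

def dispatch(weights, n_requests):
    """Split a batch of requests according to the weights, using
    largest-remainder rounding so counts sum exactly to n_requests."""
    raw = {k: w * n_requests for k, w in weights.items()}
    counts = {k: int(v) for k, v in raw.items()}
    leftover = n_requests - sum(counts.values())
    for k in sorted(raw, key=lambda k: raw[k] - counts[k], reverse=True)[:leftover]:
        counts[k] += 1
    return counts
```

Weighting by SLO-guaranteed capacity rather than raw speed is what lets a slower but latency-stable instance carry its full safe share instead of being under- or over-loaded by naive performance-proportional balancing.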
Citations: 2
Cloud for Holography and Augmented Reality
Pub Date : 2021-11-08 DOI: 10.1109/CloudNet53349.2021.9657125
Antonios Makris, Abderrahmane Boudi, M. Coppola, Luís Cordeiro, M. Corsini, Patrizio Dazzi, Ferran Diego Andilla, Yago González Rozas, Manos N. Kamarianakis, M. Pateraki, Thu Le Pham, Antonis I Protopsaltis, Aravindh Raman, Alessandro Romussi, Luis Rosa, Elena Spatafora, T. Taleb, T. Theodoropoulos, K. Tserpes, E. Zschau, U. Herzog
The paper introduces the CHARITY framework, a novel framework which aspires to leverage the benefits of intelligent, autonomous orchestration of cloud, edge, and network resources across the network continuum, to create a symbiotic relationship between low- and high-latency infrastructures. These infrastructures will facilitate the needs of emerging applications such as holographic events, virtual reality training, and mixed reality entertainment. The framework relies on different enablers and technologies related to cloud and edge for offering a suitable environment in order to deliver the promise of ubiquitous computing to NextGen application clients. The paper discusses the main pillars that support the CHARITY vision and provides a description of the use cases planned to demonstrate CHARITY's capabilities.
Citations: 14
The Open Cloud Testbed (OCT): A Platform for Research into new Cloud Technologies
Pub Date: 2021-11-08 | DOI: 10.1109/CloudNet53349.2021.9657109
M. Zink, D. Irwin, E. Cecchet, Hakan Saplakoglu, O. Krieger, Martin C. Herbordt, M. Daitzman, Peter Desnoyers, M. Leeser, Suranga Handagala
The NSF-funded Open Cloud Testbed (OCT) project is building and supporting a testbed for research and experimentation on new cloud platforms, the underlying software that provides cloud services to applications. Testbeds such as OCT are critical for enabling research into new cloud technologies, since such research requires experiments that potentially change the operation of the cloud itself. This paper gives an overview of the Open Cloud Testbed, including the existing components on which OCT is based and a description of the new infrastructure and software extensions. In addition, we present several use cases of OCT, including FPGA-based research enabled by newly deployed resources.
Citations: 6
Journal: 2021 IEEE 10th International Conference on Cloud Networking (CloudNet)