
Latest Publications in IEEE Cloud Computing

Resource Scaling Strategies for Open-Source FaaS Platforms compared to Commercial Cloud Offerings
Q1 Computer Science Pub Date : 2022-07-01 DOI: 10.1109/CLOUD55607.2022.00020
Johannes Manner, G. Wirtz
Open-source offerings are often investigated by comparing their features to commercial cloud offerings. However, performance benchmarking is rarely executed for open-source tools hosted on-premises, nor is a fair cost comparison possible, due to the lack of resource settings equivalent to cloud scaling strategies. Therefore, we first list the resource scaling strategies implemented by public and open-source FaaS platforms. Based on this, we propose a methodology to calculate an abstract performance measure for comparing two platforms with each other. Since all open-source platforms suggest a Kubernetes deployment, we use this measure to configure open-source FaaS platforms via Kubernetes limits. We tested our approach with CPU-intensive functions, considering the difference between single-threaded and multi-threaded functions to avoid wasting resources. In this regard, we also address the noisy-neighbor problem for open-source FaaS platforms by conducting an instance parallelization experiment. Our approach to limiting resources leads to consistent results while avoiding an overbooking of resources.
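A minimal sketch of the kind of comparison the abstract describes, under the assumption (ours, not the paper's) that the abstract performance measure normalizes throughput by the Kubernetes CPU limit assigned to a function; the function names and numbers are illustrative only:

```python
# Hypothetical sketch: compare two FaaS platforms by throughput per unit of
# Kubernetes CPU limit. The paper's exact measure is not given in the abstract.

def performance_measure(executions_per_s: float, cpu_limit_cores: float) -> float:
    """Throughput normalized by the CPU limit granted to the function."""
    return executions_per_s / cpu_limit_cores

def compare_platforms(score_a: float, score_b: float) -> str:
    """Return which platform delivers more work per unit of CPU limit."""
    if score_a > score_b:
        return "A"
    if score_b > score_a:
        return "B"
    return "tie"

# Platform A: 120 req/s under a 2-core limit; platform B: 80 req/s under 1 core.
score_a = performance_measure(120, 2.0)  # 60.0
score_b = performance_measure(80, 1.0)   # 80.0
```

Under this toy measure, platform B wins despite its lower raw throughput, because it does more work per core of limit.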
Pages: 40-48
Citations: 2
SAPPARCHI: an Osmotic Platform to Execute Scalable Applications on Smart City Environments
Q1 Computer Science Pub Date : 2022-07-01 DOI: 10.1109/CLOUD55607.2022.00051
Arthur Souza, N. Cacho, T. Batista, R. Ranjan
In the Smart Cities context, a plethora of middleware platforms have been proposed to support application execution and data processing. Despite all the progress already made, the vast majority of solutions do not meet the scalability requirements of application runtime, development, and deployment. Some studies point out that just 1 of 97 (1%) reported platforms meets this entire set of requirements at the same time. This small number of platforms may be explained by several factors: i) Big Data: the huge amount of processed and stored data, with varied data sources and data types; ii) Multi-domains: the many domains involved (economy, traffic, health, security, agronomy, etc.); iii) Multiple processing methods, such as data flow, batch processing, services, and microservices; and iv) High degree of distribution: the use of multiple IoT and Big Data tools, combined with execution at various computational levels (Edge, Fog, Cloud), leads applications to exhibit a high level of distribution. Aware of those great challenges, we propose Sapparchi, an integrated architectural model for Smart City applications that defines multiple processing levels (Edge, Fog, and Cloud). We also present the Sapparchi middleware platform for developing, deploying, and running applications in the smart city environment with an osmotic multi-processing approach that scales applications from Cloud to Edge. Finally, an experimental evaluation exposes the main advantages of adopting Sapparchi.
Pages: 289-298
Citations: 1
A State-aware Method for Flows with Fairness on NVMe SSDs with Load Balance
Q1 Computer Science Pub Date : 2022-07-01 DOI: 10.1109/CLOUD55607.2022.00017
Chin-Hsien Wu, Liang-Ting Chen
Nowadays, solid-state drives (SSDs) have become the best choice of storage device when compared with hard-disk drives (HDDs). More and more scenarios adopt a multi-SSD architecture to improve performance and expand storage capacity for cloud services, data centers, distributed systems, and virtualized environments. When multiple users (flows) compete concurrently for multiple shared SSDs, and the multi-SSD architecture lacks a fairness strategy among users, a user that takes up more resources can affect the others. Meanwhile, if the architecture lacks a load-balance strategy among the shared SSDs, some SSDs may receive too many I/O requests, which degrades performance and shortens lifespan. Therefore, we propose a state-aware method that provides fairness among flows on NVMe SSDs while maintaining load balance.
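To make the two concerns in the abstract concrete, here is an illustrative sketch (not the paper's state-aware method): per-flow round-robin gives fairness across flows, and least-loaded dispatch balances requests across SSDs. The class and field names are our own:

```python
# Illustrative only: fairness via round-robin over flows, load balance via
# least-outstanding-requests SSD selection.
from collections import deque

class Dispatcher:
    def __init__(self, num_ssds: int):
        self.load = [0] * num_ssds   # outstanding requests per SSD
        self.flows = {}              # per-flow FIFO queues, dict keeps order

    def submit(self, flow: str, request) -> None:
        self.flows.setdefault(flow, deque()).append(request)

    def dispatch_one(self):
        """Pop one request, fairly across flows, onto the least-loaded SSD."""
        for flow in list(self.flows):
            queue = self.flows[flow]
            if queue:
                request = queue.popleft()
                ssd = self.load.index(min(self.load))  # least-loaded device
                self.load[ssd] += 1
                # Move the served flow to the back for round-robin fairness.
                self.flows[flow] = self.flows.pop(flow)
                return flow, ssd, request
        return None

d = Dispatcher(num_ssds=2)
d.submit("flow-1", "read-4k")
d.submit("flow-2", "write-4k")
```

With these two submissions, successive `dispatch_one()` calls alternate between the flows and spread the requests over both SSDs.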
Pages: 11-18
Citations: 0
Localizing and Explaining Faults in Microservices Using Distributed Tracing
Q1 Computer Science Pub Date : 2022-07-01 DOI: 10.1109/CLOUD55607.2022.00072
Jesus Rios, Saurabh Jha, L. Shwartz
Finding the exact location of a fault in a large distributed microservices application running in containerized cloud environments can be very difficult and time-consuming. We present a novel approach that uses distributed tracing to automatically detect, localize, and aid in explaining application-level faults. We demonstrate the effectiveness of the proposed approach by injecting faults into a well-known microservice-based benchmark application. Our experiments demonstrate that the proposed fault localization algorithm correctly detects and localizes the microservice with the injected fault. We also compare our approach with other fault localization methods. In particular, we empirically show that our method outperforms methods in which a graph model of error propagation is used to infer fault locations from error logs. Our work illustrates the value added by distributed tracing for localizing and explaining faults in microservices.
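A hedged sketch of the general idea behind trace-based localization (not the authors' algorithm): given the spans of one distributed trace, blame the services whose spans errored while all of their child spans succeeded, i.e. the deepest points of failure. The span schema here is a simplification we invented:

```python
# Illustrative only: localize faults to the deepest erroring spans in a trace.

def localize_faults(spans):
    """spans: list of dicts with 'id', 'parent', 'service', 'error' (bool)."""
    children = {}
    for span in spans:
        children.setdefault(span["parent"], []).append(span)
    return sorted({
        span["service"]
        for span in spans
        if span["error"]
        and not any(child["error"] for child in children.get(span["id"], []))
    })

# frontend -> cart -> db; the error originates in cart and propagates upward.
trace = [
    {"id": 1, "parent": None, "service": "frontend", "error": True},
    {"id": 2, "parent": 1, "service": "cart", "error": True},
    {"id": 3, "parent": 2, "service": "db", "error": False},
]
```

On this trace the frontend error is explained away by its failing child, so only `cart` is reported.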
Pages: 489-499
Citations: 2
A Study of Contributing Factors to Power Aware Vertical Scaling of Deadline Constrained Applications
Q1 Computer Science Pub Date : 2022-07-01 DOI: 10.1109/CLOUD55607.2022.00073
Pradyumna Kaushik, S. Raghavendra, M. Govindaraju
The adoption of virtualization technologies in datacenters has increased dramatically in the past decade. Clouds have pivoted from being just an infrastructure rental to offering platforms and solutions, made possible by several layers of abstraction that let internal and external users focus on core business logic. Efficient resource management has in turn become salient in ensuring operational efficiency. In this work, we study key factors that can influence vertical scaling decisions, propose a policy to vertically scale deadline-constrained applications, and surface our findings from experimentation. We observe that (a) the duration for which an application is profiled has an almost cyclic influence on the accuracy of behavior predictions and is inversely proportional to the time spent consuming backlog, (b) the duration for which an application is scaled can help achieve up to a 9.6% and 4.2% reduction in the 75th and 95th percentiles of power usage, respectively, (c) reducing the tolerance towards accrual of backlog influences the application execution time and can at times reduce the number of SLA violations by 50% or 100%, and (d) increasing the time to deadline offers power-saving opportunities and can help achieve a 9.3% improvement in the 75th percentile of power usage.
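The core arithmetic of a deadline-aware vertical-scaling decision can be made explicit. This is a sketch under our own assumptions (the paper's policy is not detailed in the abstract): size the CPU allocation so that the remaining work plus any accrued backlog finishes before the deadline:

```python
# Illustrative only: minimum CPU shares needed to clear (work + backlog)
# before the deadline, given the work rate one share sustains.

def cpu_shares_needed(remaining_work: float, backlog: float,
                      time_to_deadline_s: float,
                      work_per_share_per_s: float) -> float:
    """Return the smallest allocation that meets the deadline."""
    if time_to_deadline_s <= 0:
        raise ValueError("deadline already passed")
    required_rate = (remaining_work + backlog) / time_to_deadline_s
    return required_rate / work_per_share_per_s

# 100 units of work plus 20 units of backlog, 60 s left,
# one share processes 0.5 unit/s:
shares = cpu_shares_needed(100, 20, 60, 0.5)  # -> 4.0
```

Note how a lower backlog tolerance (a smaller `backlog` term) directly lowers the allocation, mirroring observation (c) that backlog handling and power usage trade off against each other.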
Pages: 500-510
Citations: 1
Detecting Layered Bottlenecks in Microservices
Q1 Computer Science Pub Date : 2022-07-01 DOI: 10.1109/CLOUD55607.2022.00062
T. Inagaki, Yohei Ueda, Moriyoshi Ohara, Sunyanan Choochotkaew, Marcelo Amaral, Scott Trent, Tatsuhiro Chiba, Qi Zhang
We propose a method to detect both software and hardware bottlenecks in a web service consisting of microservices. A bottleneck is a resource that limits the maximum performance of the entire web service. Bottlenecks include both software resources, such as threads, locks, and channels, and hardware resources, such as processors, memories, and disks. Bottlenecks form a layered structure, since a single request can utilize multiple software resources and a hardware resource simultaneously. The microservice architecture makes the detection of layered bottlenecks challenging due to the lack of a uniform analysis perspective across languages, libraries, frameworks, and middleware. We detect layered bottlenecks in microservices by profiling the number and status of working threads in each microservice, together with the dependencies among microservices via network connections. Our approach can be applied to various programming languages since it relies only on standard debugging tools. Moreover, our approach not only detects which microservice is a bottleneck but also enables us to understand why it becomes one, aided by a novel visualization method that shows layered bottlenecks in microservices at a glance. We demonstrate that our approach successfully detects and visualizes layered bottlenecks in the state-of-the-art microservice benchmarks DeathStarBench and Acme Air. This enables us to optimize the microservices themselves to achieve a higher throughput per resource-utilization rate, compared with simply scaling the number of replicas of microservices.
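A rough sketch of the intuition (not the paper's method): a microservice is a bottleneck candidate when most of its worker threads are busy while the services it calls still have idle capacity, so the limit lies locally rather than downstream. The data shapes and threshold are our own illustration:

```python
# Illustrative only: flag services whose thread pools are saturated while
# every downstream dependency still has headroom.

def bottleneck_candidates(services, calls, busy_threshold=0.8):
    """services: {name: (busy_threads, total_threads)};
    calls: {name: [downstream service names]}."""
    def busy_ratio(name):
        busy, total = services[name]
        return busy / total
    return [
        name for name in services
        if busy_ratio(name) >= busy_threshold
        and all(busy_ratio(dep) < busy_threshold for dep in calls.get(name, []))
    ]

# gateway -> orders -> db; only the orders pool is saturated.
services = {"gateway": (2, 10), "orders": (9, 10), "db": (3, 10)}
calls = {"gateway": ["orders"], "orders": ["db"]}
```

Here `orders` is flagged: it is saturated, but its dependency `db` is not, so adding `orders` capacity (rather than replicas elsewhere) is the plausible fix.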
Pages: 385-396
Citations: 3
Latency-based Vector Scheduling of Many-task Applications for a Hybrid Cloud
Q1 Computer Science Pub Date : 2022-07-01 DOI: 10.1109/CLOUD55607.2022.00047
Shifat P. Mithila, Gerald Baumgartner
A centralized scheduler can become a bottleneck when placing the tasks of a many-task application on heterogeneous cloud resources. We have previously demonstrated that a decentralized vector scheduling approach based on performance measurements can be used successfully for this task placement scenario. In this paper, we extend this approach to task placement based on latency measurements. Each node collects performance measurements from its neighbors on an overlay graph, measures the communication latency, and then makes local decisions on where to move tasks. We present a centralized algorithm for configuring the overlay graph based on latency measurements, and extend the vector scheduling approach to take latency into consideration. Our experiments in CloudLab demonstrate that this approach results in better performance and resource utilization than scheduling without latency information.
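A minimal sketch of a decentralized, latency-aware offloading decision in the spirit of the abstract (the paper's vector scheduler is not reproduced; the cost model below is our assumption): each node compares its own load against its overlay neighbors, discounting each neighbor by the measured latency to it:

```python
# Illustrative only: pick the neighbor with the lowest latency-penalized load,
# or stay local (None) when no neighbor beats the node's own load.

def pick_target(own_load: float, neighbors: dict, latency_penalty: float = 1.0):
    """neighbors: {name: (load, latency_ms)}. Return a neighbor name or None."""
    best, best_cost = None, own_load
    for name, (load, latency_ms) in neighbors.items():
        cost = load + latency_penalty * latency_ms / 100.0
        if cost < best_cost:
            best, best_cost = name, cost
    return best

# Node at load 0.9; A is lightly loaded but far, B moderately loaded but near.
target = pick_target(0.9, {"A": (0.2, 50.0), "B": (0.5, 5.0)})  # -> "B"
```

The nearby neighbor wins despite its higher load, which is exactly the trade-off latency measurements add over a purely load-based vector.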
Pages: 257-262
Citations: 1
Applying Value-Based Deep Reinforcement Learning on KPI Time Series Anomaly Detection
Q1 Computer Science Pub Date : 2022-07-01 DOI: 10.1109/CLOUD55607.2022.00039
Yu Zhang, Tianbo Wang
Time series anomaly detection has become more critical with the rapid development of network technology, especially in cloud monitoring. We focus on applying deep reinforcement learning (DRL) to this problem. Simply using a traditional value-based DRL method is not feasible, because such methods cannot accurately capture the important temporal information in time series. Most existing methods resort to an RNN mechanism, which in turn brings about the problem of sequence learning. In this paper, we conduct progressive research on applying value-based DRL to time series anomaly detection. First, because of the poor performance of the traditional DQN, we propose an improved method, DQN-D, whose performance is 62% better than DQN's. Second, for RNN-based DRL, we propose a method based on an improved experience replay pool (DRQN) to make up for the shortcomings of existing work, achieving excellent performance. Finally, we propose a Transformer-based DRL anomaly detection method to verify the effectiveness of the Transformer structure. Experimental results show that our DQN-D obtains performance close to RNN-based DRL, that DRQN and DTQN perform well on the dataset, and that all methods are proven effective.
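For background only, here is the plain uniform experience-replay buffer that value-based DRL agents such as DQN build on; the paper's improved replay pool and the DQN-D/DRQN/DTQN variants are not reproduced here:

```python
# Background sketch: a standard uniform experience-replay buffer.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity: int):
        # A bounded deque: the oldest transitions are evicted automatically.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Uniform sampling breaks the temporal correlation of transitions.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

RNN- and Transformer-based variants differ mainly in storing and replaying sequences of transitions instead of independent ones, which is what makes the replay-pool design a central question in this line of work.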
Pages: 197-202
Citations: 1
Message from the CLOUD 2022 Chairs
Q1 Computer Science Pub Date : 2022-07-01 DOI: 10.1109/cloud55607.2022.00011
Citations: 0
Guaranteeing Service Level Agreements for Triangle Counting via Observation-based Admission Control Algorithm
Q1 Computer Science Pub Date : 2022-07-01 DOI: 10.1109/CLOUD55607.2022.00050
Chinthaka Weerakkody, Miyuru Dayarathna, Sanath Jayasena, T. Suzumura
Maintaining guaranteed service level agreements for concurrent query execution on distributed graph processing is challenging, because graph processing is by nature an unbalanced problem. In this paper, we investigate maintaining predefined service level agreements for graph-processing workload mixtures, taking triangle counting as an example. We develop a Graph Query Scheduler Mechanism (GQSM) that maintains a guaranteed service level agreement, in terms of overall latency, on top of the JasmineGraph distributed graph database server. The proposed GQSM model is implemented using queuing theory. The main component of GQSM is a job scheduler, which listens to an incoming job queue and schedules the jobs received. The proposed model has a calibration phase in which the Service Level Agreement (SLA) data, the load-average curve data, and the maximum load average that the hosts participating in the cluster can handle without violating the SLA are captured for the graphs in the system. Results show that, for a single-host system, the SLA is successfully maintained when the total number of users is less than 6.
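The observation-based admission rule described above reduces to a simple projection check. This sketch uses our own names and numbers (the GQSM calibration data and exact rule are assumptions): admit a query only if the projected load average stays under the calibrated maximum the hosts can sustain without violating the SLA:

```python
# Illustrative only: admit a query when the projected load average stays
# below the maximum established during the calibration phase.

def admit(current_load: float, query_cost: float,
          max_load_without_sla_violation: float) -> bool:
    """Observation-based admission check against the calibrated cap."""
    projected = current_load + query_cost
    return projected <= max_load_without_sla_violation

# Calibration found the host sustains a load average of 5.0 without SLA
# violations; a triangle-counting query is estimated to add 1.5.
ok = admit(current_load=3.0, query_cost=1.5,
           max_load_without_sla_violation=5.0)
```

Queries that would push the load past the cap are rejected (or queued) instead of being allowed to drag every concurrent query past its latency SLA.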
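The observation-based admission-control idea in the abstract — calibrate, per host, the maximum load average under which the SLA still holds, then admit new jobs only while the observed load stays below that limit — can be sketched as follows. This is an illustrative sketch, not the paper's GQSM implementation; the class name, the additive job-cost model, and the thresholds are assumptions.

```python
from collections import deque


class AdmissionController:
    """Sketch of observation-based admission control: admit a job only if
    the host's observed load average would stay below a calibrated maximum,
    where the maximum was measured offline as the highest load at which the
    SLA latency bound still holds."""

    def __init__(self, max_load_avg, sla_latency_ms):
        self.max_load_avg = max_load_avg      # calibrated per host (assumed given)
        self.sla_latency_ms = sla_latency_ms  # agreed latency bound
        self.queue = deque()                  # incoming job queue
        self.current_load = 0.0               # most recently observed load average

    def observe(self, load_avg):
        # Fed periodically by a monitor sampling the host's load average.
        self.current_load = load_avg

    def submit(self, job_cost):
        """Admit the job if the projected load stays within the calibrated
        limit; otherwise reject it to avoid an SLA violation."""
        if self.current_load + job_cost <= self.max_load_avg:
            self.queue.append(job_cost)
            self.current_load += job_cost
            return True
        return False

    def complete(self, job_cost):
        # Called when the oldest scheduled job finishes.
        self.queue.popleft()
        self.current_load = max(0.0, self.current_load - job_cost)
```

Under this model, a host calibrated to `max_load_avg=2.0` that currently observes a load of 1.5 would admit a job of cost 0.4 but reject a second one, mirroring the paper's finding that the SLA holds only up to a bounded level of concurrency.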
{"title":"Guaranteeing Service Level Agreements for Triangle Counting via Observation-based Admission Control Algorithm","authors":"Chinthaka Weerakkody, Miyuru Dayarathna, Sanath Jayasena, T. Suzumura","doi":"10.1109/CLOUD55607.2022.00050","DOIUrl":"https://doi.org/10.1109/CLOUD55607.2022.00050","url":null,"abstract":"Maintaining guaranteed service level agreements on distributed graph processing for concurrent query execution is challenging because graph processing by nature is an unbalanced problem. In this paper we investigate on maintaining predefined service level agreements for graph processing workload mixtures taking triangle counting as the example. We develop a Graph Query Scheduler Mechanism (GQSM) which maintains a guaranteed service level agreement in terms of overall latency on top of JasmineGraph distributed graph database server. The proposed GQSM model is implemented using the queuing theory. Main component of GQSM is a job scheduler which is responsible for listening to an incoming job queue and scheduling the jobs received. The proposed model has a calibration phase where the Service Level Agreement (SLA) data, load average curve data, and maximum load average which can be handled by the hosts participating in the cluster without violating SLA is captured for the graphs in the system. Results show that for a single host system the SLA is successfully maintained when the total number of users is less than 6.","PeriodicalId":54281,"journal":{"name":"IEEE Cloud Computing","volume":"83 1","pages":"283-288"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75317706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Journal: IEEE Cloud Computing