
Latest publications from the 2019 IEEE International Conference on Autonomic Computing (ICAC)

Chisel: Reshaping Queries to Trim Latency in Key-Value Stores
Pub Date : 2019-09-13 DOI: 10.1109/ICAC.2019.00016
R. Birke, Juan F. Pérez, Sonia Ben Mokhtar, N. Rameshan, L. Chen
It is challenging for key-value data stores to trim user-perceived (tail) latency because workloads exhibit a skewed number of key-value pairs per request, commonly retrieved via a multiget operation, i.e., all keys at the same time. In this paper we present Chisel, a novel client-side solution that efficiently reshapes query sizes at the data store by adaptively splitting big requests into chunks, to reap the benefits of parallelism, and merging small requests into a single query, to amortize the per-request latency overhead. We derive a novel layered queueing model that can quickly and approximately steer Chisel's decisions. We extensively evaluate Chisel on memcached clusters hosted on a testbed, across a large number of scenarios with different workloads and system configurations. Our evaluation results show that Chisel can turn the inherent high variability of requests into a judicious operational region, achieving significant gains in the mean and 95th percentile of user-perceived latency compared to the state-of-the-art query processing policy.
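The split/merge idea in the abstract can be illustrated with a minimal sketch. The function below is a hypothetical client-side reshaper, not Chisel's actual implementation: multigets larger than an assumed `split_size` are split into chunks to be issued in parallel, while small requests are batched until an assumed `merge_size` is reached.

```python
# Hedged sketch of query reshaping: split big multigets, merge small ones.
# `split_size` and `merge_size` are illustrative thresholds; in Chisel such
# decisions are steered by a layered queueing model, which this sketch omits.

def reshape_queries(requests, split_size=8, merge_size=8):
    """requests: list of key lists (one list per multiget).
    Returns the reshaped list of queries to send to the store."""
    reshaped, pending = [], []
    for keys in requests:
        if len(keys) > split_size:
            # Split: issue fixed-size chunks to exploit server parallelism.
            reshaped.extend(keys[i:i + split_size]
                            for i in range(0, len(keys), split_size))
        else:
            # Merge: batch small requests to amortize per-request overhead.
            pending.extend(keys)
            if len(pending) >= merge_size:
                reshaped.append(pending)
                pending = []
    if pending:
        reshaped.append(pending)
    return reshaped
```

For example, a 20-key multiget followed by three small requests yields three parallel chunks plus one merged batch, preserving every key while bounding the size skew the server sees.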
Citations: 1
Characterizing Disk Health Degradation and Proactively Protecting Against Disk Failures for Reliable Storage Systems
Pub Date : 2019-06-16 DOI: 10.1109/ICAC.2019.00027
Song Huang, Shuwen Liang, Song Fu, Weisong Shi, Devesh Tiwari, Hsing-bung Chen
The boom in cloud computing, online services, and big data applications has resulted in a dramatic expansion of storage systems. Meanwhile, disk drives are reported to be the most commonly replaced hardware component. Disk failures cause service downtime and even data loss, costing enterprises trillions of dollars per year. Existing disk failure management approaches are mostly reactive and incur high overheads. To overcome these problems, in this paper we present a proactive, cost-effective solution for managing large-scale production storage systems. We aim to uncover the entire process by which a disk's health deteriorates and to forecast when disk drives will fail in the future. Because diagnostic information about disk failures is commonly lacking, we rely on Self-Monitoring, Analysis and Reporting Technology (SMART) data and explore statistical analysis techniques to identify the start of disk degradation. We then model the disk degradation processes as functions of SMART attributes, which eliminates the dependency on time and thus on I/O workload. Experimental results from over 23,000 enterprise-class disk drives in a production data center show that our derived models can accurately quantify the degradation of disk health, which enables us to proactively protect data against disk failures. We also investigate several types of disk failures and propose remediation mechanisms to prolong disk lifetime.
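One statistical technique that could flag "the start of disk degradation" from a SMART time series is simple change-point detection. The sketch below uses a one-sided CUSUM detector over a single SMART attribute (e.g., a reallocated-sector count); the attribute choice, drift, and threshold are illustrative assumptions, not the paper's exact method.

```python
# Hedged sketch: one-sided CUSUM change detection on a SMART attribute.
# Flags the sample index where sustained upward drift from the initial
# baseline exceeds a threshold; returns None if no onset is detected.

def degradation_onset(samples, drift=0.5, threshold=5.0):
    """samples: chronological readings of one SMART attribute."""
    if not samples:
        return None
    baseline = samples[0]
    cusum = 0.0
    for i, x in enumerate(samples):
        # Accumulate only positive deviations beyond the allowed drift.
        cusum = max(0.0, cusum + (x - baseline - drift))
        if cusum > threshold:
            return i
    return None
```

A flat series yields no onset, while a series that starts climbing is flagged a few samples after the climb begins; tuning `drift` and `threshold` trades detection delay against false alarms.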
Citations: 10
Adaptively Accelerating Map-Reduce/Spark with GPUs: A Case Study
Pub Date : 2019-06-16 DOI: 10.1109/ICAC.2019.00022
K. R. Jayaram, Anshul Gandhi, Hongyi Xin, S. Tao
In this paper, we propose and evaluate a simple mechanism to accelerate iterative machine learning algorithms implemented in Hadoop map-reduce (stock) and Apache Spark. In particular, we describe a technique that enables data parallel tasks in map-reduce and Spark to be dynamically and adaptively scheduled on CPU or GPU, based on availability and load. We examine the extent of performance improvements, and correlate them to various parameters of the algorithms studied. We focus on end-to-end performance impact, including overheads associated with transferring data into and out of the GPU, and conversion between data representations in the JVM and on the GPU. We also present three optimizations that, in our analysis, can be generalized across many iterative machine learning applications. We present a case study where we accelerate four iterative machine learning applications – multinomial logistic regression, multiple linear regression, K-Means clustering and principal components analysis using singular value decomposition – implemented in three data analytics frameworks: Hadoop Map-Reduce (HMR), IBM Main-Memory Map-Reduce (M3R) and Spark. We observe that the use of GPGPUs decreases the execution time of these applications on HMR by up to 8X, M3R by up to 18X, and Spark by up to 25X. Through our empirical analysis, we offer several insights that can be helpful in designing middleware and cluster managers to accelerate map-reduce and Spark applications using GPUs.
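The core scheduling decision the abstract describes, picking CPU or GPU per task based on load and end-to-end cost including transfer overhead, can be sketched as a cost comparison. All throughput constants below are illustrative assumptions, not measurements from the paper.

```python
# Hedged sketch of adaptive CPU/GPU task placement: estimate end-to-end
# time on each device, counting PCIe transfer and queueing behind tasks
# already on the GPU, and pick the cheaper one. Constants are assumed.

def choose_device(task_flops, data_bytes, gpu_queue_len,
                  cpu_gflops=50.0, gpu_gflops=500.0, pcie_gbytes_per_s=8.0):
    """Return "cpu" or "gpu" for one data-parallel task."""
    cpu_time = task_flops / (cpu_gflops * 1e9)
    gpu_compute = task_flops / (gpu_gflops * 1e9)
    transfer = data_bytes / (pcie_gbytes_per_s * 1e9)  # copy in/out of GPU
    queue_wait = gpu_queue_len * gpu_compute           # crude load model
    gpu_time = transfer + queue_wait + gpu_compute
    return "gpu" if gpu_time < cpu_time else "cpu"
```

The sketch captures the abstract's point that GPU wins only when the compute saved outweighs transfer and conversion overheads: a compute-heavy task with little data goes to an idle GPU, while a light task carrying lots of data onto a loaded GPU stays on the CPU.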
Citations: 4
GreenRoute: A Generalizable Fuel-Saving Vehicular Navigation Service
Pub Date : 2019-06-16 DOI: 10.1109/ICAC.2019.00011
Yiran Zhao, Shuochao Yao, Dongxin Liu, Huajie Shao, Shengzhong Liu
This paper presents GreenRoute, a fuel-saving vehicular navigation system whose contribution is motivated by one of the key challenges in the design of autonomic services: namely, designing the service in a manner that reduces operating cost. GreenRoute achieves this end, in the specific context of fuel-saving vehicular navigation, by significantly improving the generalizability of the fuel consumption models it learns (in order to recommend fuel-saving routes to drivers). By learning fuel consumption models that apply seamlessly across vehicles and routes, GreenRoute eliminates one of the key incremental costs unique to fuel-saving navigation: namely, the cost of upkeep with ever-changing fuel-consumption-specific route and vehicle parameters globally. Unlike shortest or fastest routes (that depend only on map topology and traffic), minimum-fuel routes depend additionally on the vehicle engine. This makes fuel-efficient routes harder to compute in a generic fashion, compared to shortest and fastest routes. The difficulty results in two additional costs. First, more route features need to be collected (and updated) for predicting fuel consumption, such as the nature of traffic regulators. Second, fuel prediction remains specific to the individual vehicle type, which requires continual upkeep with new car types and parameters. The contribution of this paper lies in deriving and implementing a fuel consumption model that avoids both of the above two sources of cost. To measure route recommendation quality, we test the system (using 21 vehicles and over 2400 miles driven in seven US cities) by comparing fuel consumption on our routes against both Google Maps' routes and the shortest routes. Results show that, on average, our routes save 10.8% fuel compared to Google Maps' routes and save 8.4% compared to the shortest routes. This is roughly comparable to services that maintain individualized vehicle models, suggesting that our low-cost models do not come at the expense of quality reduction.
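Why a vehicle-independent model can still rank routes correctly can be sketched with a toy argument: if fuel use is approximately a vehicle-specific constant times a physical work proxy over the route, that constant cancels when comparing candidate routes for the same trip. The segment features and weights below are illustrative assumptions, not GreenRoute's actual model.

```python
# Hedged sketch: rank routes by a vehicle-independent work proxy. If
# fuel ≈ k_vehicle * proxy(route), the factor k cancels when comparing
# routes, so no per-vehicle calibration is needed. Weights are assumed.

def route_work_proxy(segments):
    """segments: list of (distance_km, elevation_gain_m, n_stops)."""
    ROLLING, CLIMB, STOP = 1.0, 0.03, 0.2  # assumed relative costs
    return sum(ROLLING * d + CLIMB * e + STOP * s for d, e, s in segments)

def pick_route(routes):
    """routes: list of (name, segments); return the lowest-proxy route."""
    return min(routes, key=lambda r: route_work_proxy(r[1]))
```

In this toy example a flat 10 km route beats a shorter but hilly, stop-heavy alternative, even though no engine parameters are supplied.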
Citations: 5
Capacity-Driven Scaling Schedules Derivation for Coordinated Elasticity of Containers and Virtual Machines
Pub Date : 2019-06-01 DOI: 10.1109/ICAC.2019.00029
Yesika M. Ramirez, Vladimir Podolskiy, M. Gerndt
With the growing complexity of microservice applications and the proliferation of containers, scaling cloud applications has become challenging. Containers enable adapting application capacity to a changing workload at a finer level of granularity than is possible with virtual machines alone. The common way to automate the adaptation of a cloud application is autoscaling, which is provided both at the level of virtual machines and at the level of containers. Its accuracy on dynamic workloads suffers significantly from the reactive nature of available autoscaling solutions. The aim of this paper is to explore potential improvements to autoscaling by designing and evaluating several prediction-based autoscaling policies: naive (used as a baseline), best resource pair, only-Delta-load, always-resize, and resize-when-beneficial. The scaling policies were implemented in the Scaling Policy Derivation Tool (SPDT). SPDT takes a long-term forecast of the workload and the capacity model of the microservices as input and produces a sequence of scaling actions scheduled for future execution, with the aims of meeting the service level objectives and minimizing cost. The policies implemented in SPDT were evaluated on three microservice applications and several workload patterns. The tests demonstrate that combining horizontal and vertical scaling enables more flexibility and reduces costs. Schedule derivation under some policies can be compute-intensive, so the SPDT user must carefully consider the optimization objective (e.g., cost minimization or timeliness of the scaling policy).
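The capacity-driven derivation the abstract describes, turning a workload forecast plus a per-replica capacity model into a schedule of scaling actions, can be sketched in its simplest horizontal-only form. This is a deliberately reduced illustration, not SPDT's actual algorithm or any of its named policies.

```python
# Hedged sketch: derive a scaling schedule from a workload forecast and a
# per-replica capacity model. Horizontal scaling only; real SPDT policies
# also coordinate vertical resizing and VM-level elasticity.
import math

def derive_schedule(forecast, capacity_per_replica, current=1):
    """forecast: expected load (e.g. req/s) per future interval.
    Returns (interval, replicas_before, replicas_after) actions."""
    schedule = []
    for t, load in enumerate(forecast):
        needed = max(1, math.ceil(load / capacity_per_replica))
        if needed != current:
            schedule.append((t, current, needed))  # scheduled scale action
            current = needed
    return schedule
```

Because the schedule is computed ahead of time from the forecast, replicas can be added before the load arrives, which is exactly the advantage over reactive autoscaling that the paper targets.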
Citations: 8
Affine Scalarization of Two-Dimensional Utility Using the Pareto Front
Pub Date : 2019-06-01 DOI: 10.1109/ICAC.2019.00026
G. Horn, M. Rózanska
Cloud computing promises flexibility, allowing applications to dynamically scale or change configuration in response to demand. Autonomic deployment is the best way to manage such applications, and deployment decisions should aim to optimize the application owner's utility. In general, this leads to multi-objective deployment decisions over multiple utility dimensions. Such problems are typically managed by forming a scalar utility as a weighted combination of the various objective dimensions. However, the maximum utility then depends not only on the utility dimensions but also on the weights used in the scalarization. This paper proposes an approach that can reduce the number of deployment configurations to consider, namely to those least sensitive to the weights used in the scalarization, and demonstrates the approach on a small industrial application for the bi-criterion case. This case is of practical importance, as many real Cloud deployments aim to simultaneously minimize the deployment-cost utility dimension and maximize the application-performance utility dimension.
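The bi-criterion setting can be made concrete with a small sketch: only Pareto-optimal configurations can maximize an affine scalarization U(w) = w·u1 + (1-w)·u2, and configurations that win across a wide range of weights are the least weight-sensitive ones the paper is interested in. The sensitivity count below is an illustrative approach, not the paper's exact criterion.

```python
# Hedged sketch: Pareto front of (u1, u2) utility pairs, affine
# scalarization over the front, and a crude weight-sensitivity count.

def pareto_front(points):
    """Keep points not dominated when maximizing both utilities."""
    return [p for p in points
            if not any(q[0] >= p[0] and q[1] >= p[1] and q != p
                       for q in points)]

def best_for_weight(front, w):
    """Maximize the affine scalarization U = w*u1 + (1-w)*u2."""
    return max(front, key=lambda p: w * p[0] + (1 - w) * p[1])

def weight_sensitivity(points, grid=11):
    """For each front point, count how many weights in [0, 1] select it;
    points winning over wide ranges are least sensitive to the weights."""
    front = pareto_front(points)
    wins = {p: 0 for p in front}
    for i in range(grid):
        wins[best_for_weight(front, i / (grid - 1))] += 1
    return wins
```

Dominated configurations drop out before scalarization, and sweeping w exposes which front points are robust choices regardless of how the owner weighs cost against performance.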
Citations: 5
EMU-IoT - A Virtual Internet of Things Lab
Pub Date : 2019-06-01 DOI: 10.1109/ICAC.2019.00019
B. Ramprasad, Marios Fokaefs, Joydeep Mukherjee, Marin Litoiu
Internet-of-Things technologies are rapidly emerging as the cornerstone of modern digital life. IoT is the main driver of increased "intelligence" in most aspects of everyday life: smart transportation, smart buildings, smart energy, smart health. Nevertheless, further progress and research are in danger of being slowed down. One important reason is the cost of infrastructure at scale. The difficulty of setting up very large IoT networks does not permit us to stress-test systems and reason about their performance and durability. To tackle this problem, this work proposes EMU-IoT, a virtual lab for IoT technologies. Using virtualization and container technologies, we demonstrate an experimentation infrastructure that enables researchers and other practitioners to conduct large-scale experiments and test several quality aspects of IoT systems with minimal requirements for devices and other equipment. In this paper, we show how easy it is to set up experiments with EMU-IoT, and we demonstrate its usefulness by conducting experiments in our lab.
Citations: 12
Express-Lane Scheduling and Multithreading to Minimize the Tail Latency of Microservices
Pub Date : 2019-06-01 DOI: 10.1109/ICAC.2019.00031
Amirhossein Mirhosseini, Brendan L. West, G. Blake, T. Wenisch
Managing high-percentile tail latencies is key to designing user-facing cloud microservices. A main contributor to end-to-end tail latency is queuing, wherein nominal tasks are enqueued behind rare, long ones due to head-of-line (HoL) blocking. In this paper, we propose Express-Lane SMT (ESMT), which extends the hardware scheduling of a simultaneous multithreading (SMT) core to provide an "express-lane" execution context for short tasks, protecting them from queuing behind rare, long ones. As tasks reach predefined service cutoffs, ESMT preempts them and migrates them to the subsequent queue to be serviced by the next SMT execution lane, thereby preventing HoL blocking. We further propose an enhanced variant of ESMT that allows execution lanes to work-steal from each other to maximize utilization. Our evaluation shows that ESMT with work stealing reduces tail latency over a conventional SMT core by an average of 56% and 67% under moderate (40%) and high (70%) system loads, respectively.
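The preempt-and-migrate discipline can be illustrated with a tiny two-lane model: every task runs in the express lane for at most a service cutoff, and any remainder is migrated to a slow-lane queue so later short tasks never wait behind a long one. This is a sequential toy simulation under assumed parameters; ESMT's lanes are hardware SMT contexts running concurrently.

```python
# Hedged sketch of express-lane scheduling: run each task up to `cutoff`
# in the express lane, migrating unfinished work to the slow lane. The
# slow lane is drained sequentially here for simplicity; in ESMT the
# lanes are concurrent SMT execution contexts.
from collections import deque

def express_lane(tasks, cutoff):
    """tasks: FIFO list of (task_id, service_time).
    Returns {task_id: completion_time}."""
    slow = deque()
    completion = {}
    clock = 0.0
    for tid, need in tasks:
        clock += min(need, cutoff)
        if need > cutoff:
            slow.append((tid, need - cutoff))  # preempt + migrate
        else:
            completion[tid] = clock
    for tid, rem in slow:                      # slow lane drains the rest
        clock += rem
        completion[tid] = clock
    return completion
```

With a 10-unit task ahead of two 1-unit tasks and a cutoff of 2, the short tasks complete at times 3 and 4 instead of 11 and 12 under plain FIFO, which is exactly the HoL-blocking relief the paper targets.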
{"title":"Express-Lane Scheduling and Multithreading to Minimize the Tail Latency of Microservices","authors":"Amirhossein Mirhosseini, Brendan L. West, G. Blake, T. Wenisch","doi":"10.1109/ICAC.2019.00031","DOIUrl":"https://doi.org/10.1109/ICAC.2019.00031","url":null,"abstract":"Managing high-percentile tail latencies is key to designing user-facing cloud microservices. A main contributor to end-to-end tail latency is queuing, wherein nominal tasks are enqueued behind rare, long ones, due to head-of-line blocking. In this paper, we propose Express-Lane SMT (ESMT), which extends the hardware scheduling of a simultaneously multithreaded (SMT) core to provide an \"express-lane\" execution context for short tasks, protecting them from queuing behind rare, long ones. As tasks reach predefined service cutoffs, ESMT preempts and migrates them to the subsequent queue to be serviced by the next SMT execution lane, thereby preventing Head-of-Line (HoL) blocking. We further propose an enhanced variant of ESMT that allows execution lanes to work-steal from each other to maximize utilization. Our evaluation shows that ESMT with work stealing reduces tail latency over a conventional SMT core by an average of 56% and 67% under moderate (40%) and high (70%) system loads, respectively.","PeriodicalId":442645,"journal":{"name":"2019 IEEE International Conference on Autonomic Computing (ICAC)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124328644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
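The service-cutoff mechanism in the abstract above can be illustrated with a small discrete simulation: a task is served in the first lane only up to a cutoff, and any remainder migrates to a second lane, so short tasks never queue behind a long one. This is a toy sketch of the queueing idea under our own assumptions (two lanes, no work stealing, invented function names and numbers), not the paper's hardware mechanism:

```python
def fifo_latencies(demands):
    """Plain FIFO: each task waits for every task ahead of it (HoL blocking)."""
    t, lat = 0.0, []
    for d in demands:
        t += d
        lat.append(t)
    return lat

def express_lane_latencies(demands, cutoff):
    """Two-lane sketch: lane 1 serves each task for at most `cutoff`;
    the remaining demand migrates to lane 2, which runs in parallel."""
    t1 = 0.0  # clock of the express lane
    t2 = 0.0  # clock of the overflow lane
    lat = []
    for d in demands:
        t1 += min(d, cutoff)           # express lane does the first slice
        if d <= cutoff:
            lat.append(t1)             # short task finishes in the express lane
        else:
            start = max(t1, t2)        # overflow lane picks it up when free
            t2 = start + (d - cutoff)  # serve the remaining demand
            lat.append(t2)
    return lat

demands = [1, 50, 1, 1]                    # one long task amid short ones
print(fifo_latencies(demands))             # [1.0, 51.0, 52.0, 53.0]
print(express_lane_latencies(demands, 2))  # [1.0, 51.0, 4.0, 5.0]
```

In the toy run, the two short tasks behind the long one finish at times 4 and 5 instead of 52 and 53, which is the head-of-line-blocking relief the express lane is after.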
Characterizing Service Level Objectives for Cloud Services: Realities and Myths
Pub Date : 2019-06-01 DOI: 10.1109/ICAC.2019.00032
Jianru Ding, Ruiqi Cao, Indrajeet Saravanan, Nathaniel Morris, Christopher Stewart
Service level objectives (SLOs) stipulate performance goals for cloud applications, microservices, and infrastructure. SLOs are widely used, in part, because system managers can tailor goals to their products, companies, and workloads. Systems research intended to support strong SLOs should target realistic performance goals used by system managers in the field. Evaluations conducted with uncommon SLO goals may not translate to real systems. Some textbooks discuss the structure of SLOs but (1) they only sketch SLO goals and (2) they use outdated examples. We mined real SLOs published on the web, extracted their goals and characterized them. Many web documents discuss SLOs loosely but few provide details and reflect real settings. Systematic literature review (SLR) prunes results and reduces bias by (1) modeling expected SLO structure and (2) detecting and removing outliers. We collected 75 SLOs where response time, query percentile and reporting period were specified. We used these SLOs to confirm and refute common perceptions. For example, we found few SLOs with response time guarantees below 10 ms for 90% or more queries. This reality bolsters perceptions that single digit SLOs face fundamental research challenges.
{"title":"Characterizing Service Level Objectives for Cloud Services: Realities and Myths","authors":"Jianru Ding, Ruiqi Cao, Indrajeet Saravanan, Nathaniel Morris, Christopher Stewart","doi":"10.1109/ICAC.2019.00032","DOIUrl":"https://doi.org/10.1109/ICAC.2019.00032","url":null,"abstract":"Service level objectives (SLOs) stipulate performance goals for cloud applications, microservices, and infrastructure. SLOs are widely used, in part, because system managers can tailor goals to their products, companies, and workloads. Systems research intended to support strong SLOs should target realistic performance goals used by system managers in the field. Evaluations conducted with uncommon SLO goals may not translate to real systems. Some textbooks discuss the structure of SLOs but (1) they only sketch SLO goals and (2) they use outdated examples. We mined real SLOs published on the web, extracted their goals and characterized them. Many web documents discuss SLOs loosely but few provide details and reflect real settings. Systematic literature review (SLR) prunes results and reduces bias by (1) modeling expected SLO structure and (2) detecting and removing outliers. We collected 75 SLOs where response time, query percentile and reporting period were specified. We used these SLOs to confirm and refute common perceptions. For example, we found few SLOs with response time guarantees below 10 ms for 90% or more queries. 
This reality bolsters perceptions that single digit SLOs face fundamental research challenges.","PeriodicalId":442645,"journal":{"name":"2019 IEEE International Conference on Autonomic Computing (ICAC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130713575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
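The kind of filtering used to test a perception such as "few SLOs guarantee under 10 ms for 90% or more of queries" can be sketched as a simple query over mined SLO records with the three fields the paper extracts (response-time bound, query percentile, reporting period). The records below are illustrative stand-ins, not the paper's dataset:

```python
# Each record mimics one mined SLO: response-time bound in ms,
# query percentile, and reporting period in days.
slos = [
    {"bound_ms": 300, "percentile": 99.0, "period_days": 30},
    {"bound_ms": 100, "percentile": 95.0, "period_days": 30},
    {"bound_ms": 9,   "percentile": 90.0, "period_days": 7},
    {"bound_ms": 500, "percentile": 99.9, "period_days": 30},
]

def single_digit_ms(records):
    """SLOs promising < 10 ms for 90% or more of queries."""
    return [r for r in records
            if r["bound_ms"] < 10 and r["percentile"] >= 90.0]

print(len(single_digit_ms(slos)))  # 1 of 4 toy records qualifies
```

Running such a predicate over the 75 mined SLOs is how a perception can be confirmed or refuted against real published goals rather than textbook examples.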
Quality-Elasticity: Improved Resource Utilization, Throughput, and Response Times Via Adjusting Output Quality to Current Operating Conditions
Pub Date : 2019-06-01 DOI: 10.1109/ICAC.2019.00017
L. Larsson, William Tarneberg, C. Klein, E. Elmroth
This work addresses two related problems for online services, namely poor resource utilization during regular operating conditions, and low throughput, long response times, or poor performance under periods of high system load. To address these problems, we introduce our notion of quality-elasticity as a manner of dynamically adapting response qualities from software services along a fine-grained spectrum. When resources are abundant, response quality can be increased, and when resources are scarce, responses are delivered at a lower quality to prioritize throughput and response times. We present an example of how a complex online shopping site can be made quality-elastic. Experiments show that, compared to the state of the art, our quality-elastic approach improves throughput (57% more served queries), lowers response times (an 8× reduction for 95th-percentile responses), and yields an estimated 40% increase in profitability. When resources are abundant, our approach may achieve upwards of twice as high resource utilization as prior work in this field.
{"title":"Quality-Elasticity: Improved Resource Utilization, Throughput, and Response Times Via Adjusting Output Quality to Current Operating Conditions","authors":"L. Larsson, William Tarneberg, C. Klein, E. Elmroth","doi":"10.1109/ICAC.2019.00017","DOIUrl":"https://doi.org/10.1109/ICAC.2019.00017","url":null,"abstract":"This work addresses two related problems for on-line services, namely poor resource utilization during regular operating conditions, and low throughput, long response times, or poor performance under periods of high system load. To address these problems, we introduce our notion of quality-elasticity as a manner of dynamically adapting response qualities from software services along a fine-grained spectrum. When resources are abundant, response quality can be increased, and when resources are scarce, responses are delivered at a lower quality to prioritize throughput and response times. We present an example of how a complex online shopping site can be made quality-elastic. Experiments show that, compared to state of the art, improvements in throughput (57% more served queries), lowered response times (8 time reduction for 95th percentile responses), and an estimated 40% profitability increase can be made using our quality-elastic approach. 
When resources are abundant, our approach may achieve upwards of twice as high resource utilization as prior work in this field.","PeriodicalId":442645,"journal":{"name":"2019 IEEE International Conference on Autonomic Computing (ICAC)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122275743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
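The core control decision described above — degrade output quality as resources grow scarce, raise it when they are abundant — can be sketched as a small threshold policy. The thresholds, level names, and coarse three-step ladder are our own illustrative choices; the paper adapts quality along a much finer-grained spectrum:

```python
def choose_quality(utilization, levels=("low", "medium", "high")):
    """Map current resource utilization (0.0-1.0) to an output-quality level:
    scarce resources -> lower quality, protecting throughput and latency;
    abundant resources -> higher quality, using otherwise idle capacity."""
    if utilization >= 0.85:
        return levels[0]   # overloaded: serve cheapest responses
    if utilization >= 0.60:
        return levels[1]   # busy: serve mid-quality responses
    return levels[2]       # headroom available: serve full quality

print(choose_quality(0.92))  # low
print(choose_quality(0.70))  # medium
print(choose_quality(0.30))  # high
```

A quality-elastic shopping site would consult such a policy per request, e.g. returning fewer (or cached) product recommendations at the "low" level and full personalized results at "high".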
Journal
2019 IEEE International Conference on Autonomic Computing (ICAC)