Application performance management using learning, optimization, and control

Proceedings of the 5th ACM/SPEC International Conference on Performance Engineering. Pub Date: 2014-03-22. DOI: 10.1145/2568088.2576098

Xiaoyun Zhu


Citations: 1

Abstract

In the past decade, the IT industry has experienced a paradigm shift as computing resources became available as a utility through cloud-based services. Despite the wider adoption of cloud computing platforms, some businesses and organizations hesitate to move all their applications to the cloud because of performance concerns. Existing practices in application performance management rely heavily on white-box modeling and diagnosis approaches, or on performance troubleshooting "cookbooks", to find potential bottlenecks and remediation steps. However, the scalability and adaptivity of such approaches remain severely constrained, especially in a highly dynamic, consolidated cloud environment. For performance isolation and differentiation, most modern hypervisors offer powerful resource control primitives such as reservations, limits, and shares for individual virtual machines (VMs). Even so, with the rapid growth of virtual machine sprawl, setting these controls properly so that co-located virtualized applications get enough resources to meet their respective service level objectives (SLOs) becomes a nearly insoluble task. These challenges present unique opportunities to leverage the rich telemetry collected from applications and systems in the cloud, and to apply statistical learning, optimization, and control-based techniques to develop model-based, automated application performance management frameworks. There has been a large body of research in this area in the last several years, but many problems remain. In this talk, I'll highlight some of the automated and data-driven performance management techniques we have developed, along with related technical challenges. I'll then discuss open research problems, in the hope of attracting more innovative ideas and solutions from a larger community of researchers and practitioners.
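
To make the idea of control-based resource management concrete, the following is a minimal illustrative sketch, not the speaker's method. It assumes a single VM whose CPU allocation is adjusted each interval in proportion to the normalized gap between measured response time and the SLO, and is clamped between a reservation and a limit. All names (`control_step`, `toy_response_time_ms`), the gain, and the toy latency model are hypothetical and chosen only for illustration.

```python
def clamp(value, low, high):
    """Keep value within [low, high]."""
    return max(low, min(high, value))


def control_step(alloc_mhz, measured_ms, slo_ms, gain=0.25,
                 reservation_mhz=500.0, limit_mhz=4000.0):
    """One feedback step: grow the allocation when latency exceeds the SLO,
    shrink it when latency is below the SLO, then respect the VM's
    reservation (floor) and limit (ceiling)."""
    error = (measured_ms - slo_ms) / slo_ms  # normalized SLO violation
    proposed = alloc_mhz * (1.0 + gain * error)
    return clamp(proposed, reservation_mhz, limit_mhz)


def toy_response_time_ms(alloc_mhz, demand_mhz=1500.0, base_ms=20.0):
    """Hypothetical workload model (an assumption, not a measurement):
    latency rises sharply as utilization approaches 1."""
    utilization = min(demand_mhz / alloc_mhz, 0.99)
    return base_ms / (1.0 - utilization)


def simulate(steps=30, slo_ms=100.0, alloc_mhz=1600.0):
    """Run the control loop against the toy model and record each interval."""
    history = []
    for _ in range(steps):
        measured = toy_response_time_ms(alloc_mhz)
        history.append((alloc_mhz, measured))
        alloc_mhz = control_step(alloc_mhz, measured, slo_ms)
    return history


if __name__ == "__main__":
    for alloc, latency in simulate()[-3:]:
        print(f"allocation={alloc:.0f} MHz, response time={latency:.1f} ms")
```

In a real system the toy model would be replaced by telemetry from the application, and the gain would need tuning or online adaptation, since too large a gain makes a loop like this oscillate.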

