{"title":"Application performance management using learning, optimization, and control","authors":"Xiaoyun Zhu","doi":"10.1145/2568088.2576098","DOIUrl":null,"url":null,"abstract":"In the past decade, the IT industry has experienced a paradigm shift as computing resources became available as a utility through cloud based services. In spite of the wider adoption of cloud computing platforms, some businesses and organizations hesitate to move all their applications to the cloud due to performance concerns. Existing practices in application performance management rely heavily on white-box modeling and diagnosis approaches or on performance troubleshooting \"cookbooks\" to find potential bottlenecks and remediation steps. However, the scalability and adaptivity of such approaches remain severely constrained, especially in a highly-dynamic, consolidated cloud environment. For performance isolation and differentiation, most modern hypervisors offer powerful resource control primitives such as reservations, limits, and shares for individual virtual machines (VMs). Even so, with the exploding growth of virtual machine sprawl, setting these controls properly such that co-located virtualized applications get enough resources to meet their respective service level objectives (SLOs) becomes a nearly insoluble task. These challenges present unique opportunities in leveraging the rich telemetry collected from applications and systems in the cloud, and in applying statistical learning, optimization, and control based techniques to developing model-based, automated application performance management frameworks. There has been a large body of research in this area in the last several years, but many problems remain. In this talk, I'll highlight some of the automated and data-driven performance management techniques we have developed, along with related technical challenges. I'll then discuss open research problems, in hope to attract more innovative ideas and solutions from a larger community of researchers and practitioners.","PeriodicalId":243233,"journal":{"name":"Proceedings of the 5th ACM/SPEC international conference on Performance engineering","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 5th ACM/SPEC international conference on Performance engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2568088.2576098","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
In the past decade, the IT industry has experienced a paradigm shift as computing resources became available as a utility through cloud based services. In spite of the wider adoption of cloud computing platforms, some businesses and organizations hesitate to move all their applications to the cloud due to performance concerns. Existing practices in application performance management rely heavily on white-box modeling and diagnosis approaches or on performance troubleshooting "cookbooks" to find potential bottlenecks and remediation steps. However, the scalability and adaptivity of such approaches remain severely constrained, especially in a highly-dynamic, consolidated cloud environment. For performance isolation and differentiation, most modern hypervisors offer powerful resource control primitives such as reservations, limits, and shares for individual virtual machines (VMs). Even so, with the exploding growth of virtual machine sprawl, setting these controls properly such that co-located virtualized applications get enough resources to meet their respective service level objectives (SLOs) becomes a nearly insoluble task. These challenges present unique opportunities in leveraging the rich telemetry collected from applications and systems in the cloud, and in applying statistical learning, optimization, and control based techniques to developing model-based, automated application performance management frameworks. There has been a large body of research in this area in the last several years, but many problems remain. In this talk, I'll highlight some of the automated and data-driven performance management techniques we have developed, along with related technical challenges. I'll then discuss open research problems, in hope to attract more innovative ideas and solutions from a larger community of researchers and practitioners.