Jianping Weng, Jessie Hui Wang, Jiahai Yang, Yang Yang
Title: Root cause analysis of anomalies of multitier services in public clouds
Venue: 2017 IEEE/ACM 25th International Symposium on Quality of Service (IWQoS)
Published: 2017-06-01
DOI: 10.1109/IWQoS.2017.7969155
URL: https://doi.org/10.1109/IWQoS.2017.7969155
Citations: 30
Abstract
Anomalies of multitier services running on a cloud platform can be caused by components of the same tenant or by performance interference from other tenants. When the performance of a multitier service degrades, the root causes must be identified precisely so that the service can be recovered as soon as possible. In this paper, we argue that cloud providers are in a better position than tenants to solve this problem, and that the solution should be non-intrusive to tenants' services and applications. Based on these two considerations, we propose a solution that lets cloud providers help tenants localize the root causes of an anomaly. We design a non-intrusive method to capture the dependency relationships among components, which improves the feasibility of a root cause localization system. Our solution can find root causes regardless of whether they lie in the same tenant as the anomaly or in other tenants. Our two-step localization algorithm exploits measurement data from both the application layer and the underlay infrastructure, together with a random walk procedure, to improve its accuracy. Real-world experiments with a three-tier web application running on a small-scale cloud platform show a 38.9% improvement in mean average precision compared to current methods.
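
The abstract reports its result in mean average precision (MAP) without defining the metric. The sketch below shows the standard information-retrieval definition of MAP applied to ranked lists of root-cause candidates. It is an illustration only, not the authors' evaluation code: the function names and the component names (`db-vm`, `web-vm`, `app-vm`) are hypothetical, and each ranked list is assumed to contain no duplicate candidates.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], true_causes: Set[str]) -> float:
    """Average precision of one ranked candidate list against the known root causes."""
    if not true_causes:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in true_causes:
            hits += 1
            # Precision at this rank: fraction of the top `rank` entries that are true causes.
            precision_sum += hits / rank
    # Dividing by the number of true causes penalizes causes missing from the list.
    return precision_sum / len(true_causes)


def mean_average_precision(cases: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-anomaly average precision over all anomaly cases."""
    case_list = list(cases)
    if not case_list:
        return 0.0
    return sum(average_precision(r, t) for r, t in case_list) / len(case_list)


# Hypothetical anomaly cases: (ranked candidates, ground-truth root causes).
example_cases = [
    (["db-vm", "web-vm", "app-vm"], {"db-vm"}),   # true cause ranked first
    (["web-vm", "app-vm", "db-vm"], {"app-vm"}),  # true cause ranked second
]
example_map = mean_average_precision(example_cases)
```

A localization method that ranks the true root causes nearer the top of its candidate list scores closer to 1.0, so a higher MAP means an operator inspects fewer wrong candidates before reaching the real cause.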