Anomaly Detection in Cloud-Native Systems
Francesco Lomio, Sergio Moreschini, Xiaozhou Li, Valentina Lenarduzzi
2022 48th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), August 2022
DOI: 10.1109/SEAA56994.2022.00023
Abstract
Companies develop cloud-native systems deployed on public and private clouds. Since private clouds have limited resources, these systems must run efficiently, keeping performance-related anomalies under control. The goal of this work is to understand whether a set of five performance-related KPIs depends on the metrics collected at runtime by Kafka, Zookeeper, and other tools (168 different metrics). We considered four weeks' worth of runtime data collected from a system running in production. We trained eight Machine Learning algorithms on three weeks' worth of data and tested them on the remaining week's worth to compare their prediction accuracy and their training and testing time. Performance-related anomalies can be detected with a very high level of accuracy (above 95% AUC) and with very limited training time (between 8 and 17 minutes). Machine Learning algorithms can help to identify runtime anomalies and to detect them efficiently. Future work will include the identification of a proactive approach to recognize the root cause of the anomalies and to prevent them as early as possible.
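To make the evaluation protocol described above concrete, the sketch below shows a chronological train/test split (three weeks for training, one week for testing), a single candidate classifier, and AUC plus training-time measurement. It is a minimal illustration only: the file name, column names ("timestamp", "anomaly"), and the choice of RandomForestClassifier are assumptions for the sake of the example, not the paper's actual pipeline or its eight specific algorithms.

```python
# Hypothetical sketch of a time-based anomaly-detection evaluation.
# Assumed inputs: a CSV of runtime metrics (e.g., the 168 Kafka/Zookeeper
# metrics) with a "timestamp" column and a binary "anomaly" label derived
# from the performance-related KPIs. These names are illustrative.
import time

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

df = pd.read_csv("runtime_metrics.csv", parse_dates=["timestamp"])
df = df.sort_values("timestamp")

# Chronological split: first three weeks for training, the last week for testing.
cutoff = df["timestamp"].min() + pd.Timedelta(weeks=3)
train, test = df[df["timestamp"] < cutoff], df[df["timestamp"] >= cutoff]

feature_cols = [c for c in df.columns if c not in ("timestamp", "anomaly")]
X_train, y_train = train[feature_cols], train["anomaly"]
X_test, y_test = test[feature_cols], test["anomaly"]

# One of several possible models; the paper compares eight algorithms.
model = RandomForestClassifier(n_estimators=100, random_state=42)

start = time.perf_counter()
model.fit(X_train, y_train)
train_time = time.perf_counter() - start

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"AUC: {auc:.3f}, training time: {train_time:.1f}s")
```

Splitting by time rather than at random avoids leaking future observations into the training set, which matters when the goal is to predict anomalies on unseen production weeks.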