Predicting Performance Anomalies in Software Systems at Run-time

ACM Transactions on Software Engineering and Methodology (TOSEM), Pub Date: 2021-04-23, Pages: 1-33, DOI: 10.1145/3440757

Guoliang Zhao, Safwat Hassan, Ying Zou, Derek Truong, Toby Corbin


Citations: 15

Abstract



Predicting Performance Anomalies in Software Systems at Run-time

High performance is critical to achieving and sustaining the success of a software system. Performance anomalies are performance degradation issues (e.g., slower system response times) that arise in software systems at run-time, and they can severely reduce users’ satisfaction. Prior studies propose approaches that detect anomalies by analyzing execution logs and resource utilization metrics after the anomalies have happened. However, these detection approaches cannot predict anomalies ahead of time; this limitation inevitably delays the corrective actions that could prevent performance anomalies from happening. We propose an approach that predicts performance anomalies in software systems and raises anomaly warnings in advance. Our approach uses a Long Short-Term Memory (LSTM) neural network to capture the normal behaviors of a software system, and then predicts performance anomalies by identifying early deviations from the captured normal behaviors. We conduct extensive experiments to evaluate our approach using two real-world software systems (i.e., Elasticsearch and Hadoop). We compare our approach with two baselines. The first is a state-of-the-art approach called Unsupervised Behavior Learning. The second predicts performance anomalies by checking whether resource utilization exceeds pre-defined thresholds. Our results show that our approach can predict various performance anomalies with high precision (i.e., 97–100%) and recall (i.e., 80–100%), while the baselines achieve 25–97% precision and 93–100% recall. For a range of performance anomalies, our approach achieves sufficient lead times that vary from 20 to 1,403 s (i.e., 23.4 min). We also demonstrate that our approach can predict performance anomalies caused by real-world performance bugs: for such anomalies, it achieves 95–100% precision and 87–100% recall, while the baselines achieve 49–83% precision and 100% recall. These results show that our approach outperforms the existing anomaly prediction approaches and is able to predict performance anomalies in real-world systems.
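To make the evaluation terms in the abstract concrete, the following Python sketch illustrates the second baseline (a fixed threshold on resource utilization) together with precision, recall, and lead time. It is only a toy illustration under stated assumptions, not the authors' implementation and not their LSTM-based approach: the function names, the 0.80 threshold, the one-sample-per-second traces, and the rule for deciding when a run counts as "predicted anomalous" are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical threshold; the abstract does not state the values actually used.
CPU_THRESHOLD = 0.80


def threshold_warnings(utilization: Sequence[float], threshold: float) -> List[int]:
    """Return every sample index (assumed: one sample per second) above the threshold."""
    return [t for t, u in enumerate(utilization) if u > threshold]


def lead_time(warnings: Sequence[int], anomaly_time: int) -> Optional[int]:
    """Seconds from the earliest warning preceding the anomaly to the anomaly itself."""
    earlier = [t for t in warnings if t < anomaly_time]
    return anomaly_time - min(earlier) if earlier else None


def predicts_anomaly(warnings: Sequence[int], anomaly_time: Optional[int]) -> bool:
    """Assumed rule: a run is 'predicted anomalous' if a warning fires before the
    anomaly, or at any point when the run contains no anomaly."""
    if anomaly_time is None:
        return len(warnings) > 0
    return any(t < anomaly_time for t in warnings)


def precision_recall(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float]:
    """Compute precision and recall over per-run predictions."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy runs: (CPU utilization trace, time of the anomaly or None for a normal run).
runs = [
    ([0.40, 0.45, 0.70, 0.85, 0.92, 0.97], 5),  # anomaly preceded by rising load
    ([0.30, 0.35, 0.32, 0.31], None),           # normal run, no warning
    ([0.50, 0.85, 0.40, 0.45], None),           # normal run with a harmless spike
    ([0.40, 0.50, 0.60, 0.70], 3),              # anomaly that never crosses the threshold
]

warnings_per_run = [threshold_warnings(trace, CPU_THRESHOLD) for trace, _ in runs]
predicted = [predicts_anomaly(w, at) for w, (_, at) in zip(warnings_per_run, runs)]
actual = [at is not None for _, at in runs]
precision, recall = precision_recall(predicted, actual)

print("lead time of first run (s):", lead_time(warnings_per_run[0], runs[0][1]))
print("precision:", precision, "recall:", recall)
```

In this toy data the harmless spike in the third run produces a false warning and the fourth run's anomaly is missed, which shows how a fixed threshold can lose both precision and recall. A model of normal behavior, as the abstract describes, aims to flag deviations earlier and more selectively.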

