Bahareh Afshinpour, Roland Groz, Massih-Reza Amini
2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS), published 2022-12-01. DOI: 10.1109/QRS57517.2022.00030 (https://doi.org/10.1109/QRS57517.2022.00030)
Telemetry-Based Software Failure Prediction by Concept-Space Model Creation
Telemetry data (e.g., CPU and memory usage) is an essential source of information about a software system's health. Anomalies in telemetry data warn system administrators of imminent failure or deteriorating service quality. However, input events to the system (such as service requests) are what cause abnormal system behaviour and, in turn, anomalous telemetry data. By observing input events, one might predict anomalies before they appear in telemetry data, giving the system administrator an even earlier warning of the failure. Finding a correlation between input events and anomalies in telemetry data is challenging in many cases. This paper proposes a machine learning approach to learn the causal relationship between input event sequences and telemetry data. To this end, a Natural Language Processing (NLP) approach is employed to create a concept space model that distinguishes between normal and abnormal test sequences. Based on a vectorized representation of each input sequence, the concept space indicates whether the sequence will cause a system failure. Because the notion of a fault is not established in telemetry-based fault detection, the proposed technique first detects periods of time during which the software system's status is aberrant (Bug-Zones). An extensive study on a real-world database acquired by a telecommunication operator and on open-source microservice software demonstrates that our approach achieves 71% and 90% accuracy as a Bug-Zone predictor.
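The two steps named in the abstract, finding Bug-Zones in telemetry and judging an input sequence from its vectorized representation, can be illustrated with a small sketch. This is not the authors' method: the abstract does not specify how Bug-Zones are detected or how the concept space is built, so the sketch assumes a fixed telemetry threshold for zone detection and a bag-of-events vector compared against class centroids by cosine similarity. All function names, event names, and numbers below are hypothetical.

```python
import math
from collections import Counter


def find_bug_zones(samples, threshold):
    """Return inclusive (start, end) index pairs of consecutive samples
    above the threshold. The fixed threshold is an assumption made for
    this sketch, not the paper's detection rule."""
    zones, start = [], None
    for i, value in enumerate(samples):
        if value > threshold and start is None:
            start = i                      # a new zone opens here
        elif value <= threshold and start is not None:
            zones.append((start, i - 1))   # the zone closed on the previous sample
            start = None
    if start is not None:                  # a zone still open at the end of the data
        zones.append((start, len(samples) - 1))
    return zones


def vectorize(sequence, vocabulary):
    """Bag-of-events vector: how often each known event type occurs."""
    counts = Counter(sequence)
    return [counts[event] for event in vocabulary]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def predicts_bug_zone(sequence, vocabulary, normal_centroid, abnormal_centroid):
    """True when the sequence lies closer to the abnormal centroid."""
    vector = vectorize(sequence, vocabulary)
    return cosine(vector, abnormal_centroid) > cosine(vector, normal_centroid)


# Hypothetical CPU-usage trace (percent) and threshold.
cpu_usage = [20, 25, 95, 97, 30, 22, 99]
zones = find_bug_zones(cpu_usage, threshold=90)

# Hypothetical event vocabulary and labelled training sequences.
vocabulary = ["login", "query", "upload", "bulk_export"]
normal = [["login", "query", "query"], ["login", "upload", "query"]]
abnormal = [["login", "bulk_export", "bulk_export"],
            ["bulk_export", "upload", "bulk_export"]]
normal_centroid = centroid([vectorize(s, vocabulary) for s in normal])
abnormal_centroid = centroid([vectorize(s, vocabulary) for s in abnormal])

risky = predicts_bug_zone(["query", "bulk_export", "bulk_export"],
                          vocabulary, normal_centroid, abnormal_centroid)
safe = predicts_bug_zone(["login", "query"],
                         vocabulary, normal_centroid, abnormal_centroid)
```

In this toy data, `zones` marks the two periods in which CPU usage exceeds the threshold, `risky` is true for the sequence dominated by the hypothetical `bulk_export` event, and `safe` is false. A realistic pipeline would derive the labels for training sequences from the detected Bug-Zones and would likely use a richer sequence embedding than event counts.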