Semi-Supervised Log-Based Anomaly Detection via Probabilistic Label Estimation

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00130

Lin Yang, Junjie Chen, Zan Wang, Weijing Wang, Jiajun Jiang, Xuyuan Dong, Wenbin Zhang


Citations: 71

Abstract

As software systems grow, logs have become an important source of data for system maintenance. Log-based anomaly detection, which aims to detect system anomalies automatically through log analysis, is one of the most important methods for this purpose. However, existing log-based anomaly detection approaches still suffer from practical issues: they either depend on a large amount of manually labeled training data (supervised approaches) or perform unsatisfactorily because they do not learn from historical anomalies (unsupervised and semi-supervised approaches). In this paper, we propose a novel and practical log-based anomaly detection approach, PLELog. It is semi-supervised, which avoids time-consuming manual labeling, and it incorporates knowledge of historical anomalies via probabilistic label estimation, which brings the advantages of supervised approaches into play. In addition, PLELog remains immune to unstable log data via semantic embedding, and it detects anomalies efficiently and effectively with an attention-based GRU neural network. We evaluated PLELog on the two most widely used public datasets, and the results demonstrate its effectiveness: it significantly outperforms the compared approaches, with an average improvement of 181.6% in F1-score. In particular, PLELog has been applied to two real-world systems from our university and a large corporation, further demonstrating its practicability.
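The reported gain is stated in terms of F1-score. Since an F1-score cannot exceed 1, an improvement above 100% can only be a relative one. The sketch below shows how F1 and a relative improvement are typically computed, assuming the improvement is measured against the baseline's F1-score (the abstract does not spell this out). The function names and the confusion counts are made up for illustration and are not results from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new: float, baseline: float) -> float:
    """Percentage improvement of `new` over `baseline` (assumed definition)."""
    return (new - baseline) / baseline * 100.0


# Hypothetical counts, chosen only to illustrate the arithmetic.
baseline_f1 = f1_score(tp=30, fp=40, fn=70)   # weak detector
candidate_f1 = f1_score(tp=90, fp=10, fn=10)  # stronger detector

gain = relative_improvement(candidate_f1, baseline_f1)
print(f"baseline F1={baseline_f1:.3f}, candidate F1={candidate_f1:.3f}, gain={gain:.1f}%")
```

With these made-up counts the candidate's F1-score is more than double the baseline's, which is how a relative gain can exceed 100% even though both scores lie between 0 and 1.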


