A Taxonomy of Anomalies in Log Data
Thorsten Wittkopp, Philipp Wiesner, Dominik Scheinert, Odej Kao
arXiv:2111.13462 (arXiv - CS - General Literature), published 2021-11-26
Abstract
Log data anomaly detection is a core component in the area of artificial
intelligence for IT operations. However, the large number of existing methods
makes it hard to choose the right approach for a specific system. A better
understanding of different kinds of anomalies, and which algorithms are
suitable for detecting them, would support researchers and IT operators.
Although a common taxonomy for anomalies already exists, it has not yet been
applied specifically to log data to point out the characteristics and
peculiarities of this domain. In this paper, we present a taxonomy for
different kinds of log data
anomalies and introduce a method for analyzing such anomalies in labeled
datasets. We applied our taxonomy to the three common benchmark datasets
Thunderbird, Spirit, and BGL, and trained five state-of-the-art unsupervised
anomaly detection algorithms to evaluate their performance in detecting
different kinds of anomalies. Our results show that the most common anomaly
type is also the easiest to predict. Moreover, deep learning-based approaches
outperform data mining-based approaches across all anomaly types, but especially
when it comes to detecting contextual anomalies.
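
The per-type evaluation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `recall_per_anomaly_type`, the label "point", and the toy data are assumptions made for illustration; only "contextual" is an anomaly type named in the abstract. The sketch computes, for each anomaly type in a labeled dataset, the share of anomalies of that type that a detector flagged.

```python
from collections import defaultdict


def recall_per_anomaly_type(anomaly_types, predictions):
    """Return recall per anomaly type.

    anomaly_types: one entry per log line, None for a normal line or a
                   string naming the anomaly type (labels are hypothetical).
    predictions:   one boolean per log line, True if the detector flagged it.
    """
    detected = defaultdict(int)
    total = defaultdict(int)
    for kind, flagged in zip(anomaly_types, predictions):
        if kind is None:
            continue  # normal lines do not count toward recall
        total[kind] += 1
        if flagged:
            detected[kind] += 1
    return {kind: detected[kind] / total[kind] for kind in total}


# Toy data, invented for illustration only.
types = ["point", None, "contextual", "point", None, "contextual"]
preds = [True, False, False, True, True, True]
print(recall_per_anomaly_type(types, preds))
```

Reporting recall separately per type, rather than a single aggregate score, is what makes it possible to see whether a method struggles with one kind of anomaly in particular.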