DIP
Daniel Plaisted, Mengjun Xie
Proceedings of the 2014 ACM Southeast Regional Conference
Published: 2022-04-18
DOI: 10.1145/3476883.3520226 (https://doi.org/10.1145/3476883.3520226)
Citations: 3
Abstract
Certain classes of log analytical models, such as those for log anomaly detection, require as input sequences of parsed log messages in which the tokens belonging to each message's template are marked. For this reason, such models commonly employ a log parser, a program that detects the template of each message in a log file. It has been shown that even the most accurate log parsers in the literature fail to achieve high accuracy at detecting the templates of messages from certain systems' log files. This paper presents DIP, a tree-based log parser. The primary methodological innovation of DIP lies in the mechanism it uses to determine whether two very similar messages have the same template. Whereas many existing parsers consider only the percentage of matching tokens between two similar messages when making this decision, DIP also considers the actual tokens at which the two messages disagree: it deems a pair of similar messages to have the same template if and only if each of those tokens satisfies one of a set of three conditions. Our experimental results show that DIP achieves an average accuracy superior to that of each of the 13 parsers tested in a 2019 survey study on log parsers. Furthermore, we give evidence that it achieves this accuracy without sacrificing runtime performance.
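
The contrast drawn above, between deciding on the matching-token ratio alone and additionally inspecting the disagreeing tokens, can be illustrated with a minimal Python sketch. This is not DIP's implementation: the abstract does not state the three conditions, so `both_contain_digit` and `both_look_like_paths` are hypothetical stand-ins, and the function names, the 0.5 threshold, and the whitespace tokenization are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# A condition inspects one pair of disagreeing tokens.
Condition = Callable[[str, str], bool]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of positions at which two equal-length messages agree."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def disagreements(a: Sequence[str], b: Sequence[str]) -> List[Tuple[str, str]]:
    """Token pairs at the positions where the two messages differ."""
    return [(x, y) for x, y in zip(a, b) if x != y]


# Hypothetical stand-in conditions (NOT the conditions used by DIP).
def both_contain_digit(x: str, y: str) -> bool:
    return any(c.isdigit() for c in x) and any(c.isdigit() for c in y)


def both_look_like_paths(x: str, y: str) -> bool:
    return "/" in x and "/" in y


def same_template_ratio_only(a, b, threshold: float = 0.5) -> bool:
    """Baseline: decide from the percentage of matching tokens alone."""
    return similarity(a, b) >= threshold


def same_template_with_conditions(
    a, b, conditions: Sequence[Condition], threshold: float = 0.5
) -> bool:
    """Similar messages share a template only if every disagreeing
    token pair satisfies at least one of the supplied conditions."""
    if similarity(a, b) < threshold:
        return False
    return all(
        any(cond(x, y) for cond in conditions)
        for x, y in disagreements(a, b)
    )


# Assumed whitespace tokenization of three example messages.
m1 = "Connection from 10.0.0.1 closed".split()
m2 = "Connection from 10.0.0.2 closed".split()
m3 = "Connection from 10.0.0.1 opened".split()
conds = [both_contain_digit, both_look_like_paths]
```

In this sketch, `m1` and `m3` agree on three of four tokens, so the ratio-only check groups them together, while the condition-based check separates them because `closed` and `opened` satisfy neither stand-in condition. `m1` and `m2` differ only in an address-like token and are grouped by both checks.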