Cognition: Accurate and Consistent Linear Log Parsing Using Template Correction

IF 1.3 3区计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Journal of Computer Science and Technology Pub Date : 2023-09-30 DOI:10.1007/s11390-021-1691-3

Ran Tian, Zu-Long Diao, Hai-Yang Jiang, Gao-Gang Xie

{"title":"Cognition: Accurate and Consistent Linear Log Parsing Using Template Correction","authors":"Ran Tian, Zu-Long Diao, Hai-Yang Jiang, Gao-Gang Xie","doi":"10.1007/s11390-021-1691-3","DOIUrl":null,"url":null,"abstract":"<p>Logs contain runtime information for both systems and users. As many of them use natural language, a typical log-based analysis needs to parse logs into the structured format first. Existing parsing approaches often take two steps. The first step is to find similar words (tokens) or sentences. Second, parsers extract log templates by replacing different tokens with variable placeholders. However, we observe that most parsers concentrate on precisely grouping similar tokens or logs. But they do not have a well-designed template extraction process, which leads to inconsistent accuracy on particular datasets. The root cause is the ambiguous definition of variable placeholders and similar templates. The consequences include abuse of variable placeholders, incorrectly divided templates, and an excessive number of templates over time. In this paper, we propose our online log parsing approach Cognition. It redefines variable placeholders via a strict lower bound to avoid ambiguity first. Then, it applies our template correction technique to merge and absorb similar templates. It eliminates the interference of commonly used parameters and thus isolates template quantity. Evaluation through 16 public datasets shows that Cognition has better accuracy and consistency than the state-of-the-art approaches. It also saves up to 52.1% of time cost on average than the others.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":"9 4 1","pages":""},"PeriodicalIF":1.3000,"publicationDate":"2023-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Computer Science and Technology","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11390-021-1691-3","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

Abstract

Logs contain runtime information for both systems and users. As many of them use natural language, a typical log-based analysis needs to parse logs into the structured format first. Existing parsing approaches often take two steps. The first step is to find similar words (tokens) or sentences. Second, parsers extract log templates by replacing different tokens with variable placeholders. However, we observe that most parsers concentrate on precisely grouping similar tokens or logs. But they do not have a well-designed template extraction process, which leads to inconsistent accuracy on particular datasets. The root cause is the ambiguous definition of variable placeholders and similar templates. The consequences include abuse of variable placeholders, incorrectly divided templates, and an excessive number of templates over time. In this paper, we propose our online log parsing approach Cognition. It redefines variable placeholders via a strict lower bound to avoid ambiguity first. Then, it applies our template correction technique to merge and absorb similar templates. It eliminates the interference of commonly used parameters and thus isolates template quantity. Evaluation through 16 public datasets shows that Cognition has better accuracy and consistency than the state-of-the-art approaches. It also saves up to 52.1% of time cost on average than the others.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

认知:使用模板校正进行准确一致的线性日志解析

日志包含系统和用户的运行时信息。由于其中许多使用自然语言，因此典型的基于日志的分析需要首先将日志解析为结构化格式。现有的解析方法通常分为两个步骤。第一步是找到相似的单词(标记)或句子。其次，解析器通过用可变占位符替换不同的令牌来提取日志模板。然而，我们观察到大多数解析器集中于精确分组相似的令牌或日志。但是他们没有一个设计良好的模板提取过程，这导致在特定数据集上的准确性不一致。根本原因是变量占位符和类似模板的定义不明确。其结果包括滥用可变占位符、模板划分不正确以及随着时间的推移模板数量过多。在本文中，我们提出了我们的在线日志解析方法Cognition。它通过严格的下限重新定义变量占位符，以避免歧义。然后，应用模板校正技术对相似模板进行合并和吸收。它消除了常用参数的干扰，从而隔离了模板数量。通过16个公共数据集的评估表明，认知比最先进的方法具有更好的准确性和一致性。与其他方法相比，平均节省52.1%的时间成本。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Journal of Computer Science and Technology 工程技术-计算机：软件工程

CiteScore

4.00

自引率

0.00%

发文量

2255

审稿时长

9.8 months

期刊介绍： Journal of Computer Science and Technology (JCST), the first English language journal in the computer field published in China, is an international forum for scientists and engineers involved in all aspects of computer science and technology to publish high quality and refereed papers. Papers reporting original research and innovative applications from all parts of the world are welcome. Papers for publication in the journal are selected through rigorous peer review, to ensure originality, timeliness, relevance, and readability. While the journal emphasizes the publication of previously unpublished materials, selected conference papers with exceptional merit that require wider exposure are, at the discretion of the editors, also published, provided they meet the journal''s peer review standards. The journal also seeks clearly written survey and review articles from experts in the field, to promote insightful understanding of the state-of-the-art and technology trends. Topics covered by Journal of Computer Science and Technology include but are not limited to: -Computer Architecture and Systems -Artificial Intelligence and Pattern Recognition -Computer Networks and Distributed Computing -Computer Graphics and Multimedia -Software Systems -Data Management and Data Mining -Theory and Algorithms -Emerging Areas