SPINE: a scalable log parser with feedback guidance

软件产业与工程 (Software Industry and Engineering) Pub Date: 2022-11-07 DOI: 10.1145/3540250.3549176

Xuheng Wang, Xu Zhang, Liqun Li, Shilin He, Hongyu Zhang, Yudong Liu, Ling Zheng, Yu Kang, Qingwei Lin, Yingnong Dang, S. Rajmohan, Dongmei Zhang


Citations: 18

Abstract

Log parsing, which extracts log templates and parameters, is a critical prerequisite step for automated log analysis techniques. Though existing log parsers have achieved promising accuracy on public log datasets, they still face many challenges when applied in industry. Through studying the characteristics of real-world log data and analyzing the limitations of existing log parsers, we identify two problems. Firstly, it is non-trivial to scale a log parser to a vast number of logs, especially in real-world scenarios where the log data is extremely imbalanced. Secondly, existing log parsers overlook the importance of user feedback, which is imperative for parser fine-tuning under the continuous evolution of log data. To overcome these challenges, we propose SPINE, a highly scalable log parser with user feedback guidance. Based on our log parser equipped with initial grouping and progressive clustering, we propose a novel log data scheduling algorithm to improve the efficiency of parallelization under large-scale imbalanced log data. Besides, we introduce user feedback to make the parser adapt quickly to evolving logs. We evaluated SPINE on 16 public log datasets. SPINE achieves more than 0.90 parsing accuracy on average with the highest parsing efficiency, outperforming the state-of-the-art log parsers. We also evaluated SPINE in the production environment of Microsoft, in which SPINE can parse 30 million logs in less than 8 minutes with 16 executors, achieving near real-time performance. In addition, our evaluations show that SPINE can consistently achieve good accuracy under log evolution with a moderate amount of user feedback.
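To make the terms in the abstract concrete, the following is a minimal illustrative sketch and not SPINE's actual algorithm, whose details the abstract does not give. It assumes that numeric and hexadecimal tokens are parameters, that an initial grouping can be keyed on token count, and that imbalanced groups can be spread over executors with a greedy largest-first heuristic. The names `to_template`, `initial_groups`, and `schedule` are hypothetical and chosen only for this example.

```python
import heapq
import re
from collections import defaultdict

# Assumption: tokens that look like numbers, dotted numbers, or hex values are parameters.
PARAM_RE = re.compile(r"^(0x[0-9a-fA-F]+|\d+(\.\d+)*)$")


def to_template(line):
    """Split a log line into a template (with <*> placeholders) and its parameters."""
    tokens = line.split()
    template = " ".join("<*>" if PARAM_RE.match(t) else t for t in tokens)
    params = [t for t in tokens if PARAM_RE.match(t)]
    return template, params


def initial_groups(lines):
    """Coarse initial grouping keyed on token count (a hypothetical grouping key)."""
    groups = defaultdict(list)
    for line in lines:
        groups[len(line.split())].append(line)
    return groups


def schedule(group_sizes, n_executors):
    """Greedy largest-first assignment: each group goes to the least-loaded executor."""
    heap = [(0, i) for i in range(n_executors)]  # (current load, executor id)
    heapq.heapify(heap)
    assignment = {i: [] for i in range(n_executors)}
    # Place big groups first so a single huge group does not land on a busy executor.
    for key, size in sorted(group_sizes.items(), key=lambda kv: -kv[1]):
        load, i = heapq.heappop(heap)
        assignment[i].append(key)
        heapq.heappush(heap, (load + size, i))
    return assignment
```

With one dominant group and several small ones, `schedule({"a": 100, "b": 10, "c": 10, "d": 5}, 2)` places `"a"` alone on one executor and the three small groups on the other, which is the kind of imbalance the abstract says a scheduling step has to handle.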


