SPINE: a scalable log parser with feedback guidance

Xuheng Wang, Xu Zhang, Liqun Li, Shilin He, Hongyu Zhang, Yudong Liu, Ling Zheng, Yu Kang, Qingwei Lin, Yingnong Dang, S. Rajmohan, Dongmei Zhang
{"title":"SPINE: a scalable log parser with feedback guidance","authors":"Xuheng Wang, Xu Zhang, Liqun Li, Shilin He, Hongyu Zhang, Yudong Liu, Ling Zheng, Yu Kang, Qingwei Lin, Yingnong Dang, S. Rajmohan, Dongmei Zhang","doi":"10.1145/3540250.3549176","DOIUrl":null,"url":null,"abstract":"Log parsing, which extracts log templates and parameters, is a critical prerequisite step for automated log analysis techniques. Though existing log parsers have achieved promising accuracy on public log datasets, they still face many challenges when applied in the industry. Through studying the characteristics of real-world log data and analyzing the limitations of existing log parsers, we identify two problems. Firstly, it is non-trivial to scale a log parser to a vast number of logs, especially in real-world scenarios where the log data is extremely imbalanced. Secondly, existing log parsers overlook the importance of user feedback, which is imperative for parser fine-tuning under the continuous evolution of log data. To overcome the challenges, we propose SPINE, which is a highly scalable log parser with user feedback guidance. Based on our log parser equipped with initial grouping and progressive clustering,we propose a novel log data scheduling algorithm to improve the efficiency of parallelization under the large-scale imbalanced log data. Besides, we introduce user feedback to make the parser fast adapt to the evolving logs. We evaluated SPINE on 16 public log datasets. SPINE achieves more than 0.90 parsing accuracy on average with the highest parsing efficiency, which outperforms the state-of-the-art log parsers. We also evaluated SPINE in the production environment of Microsoft, in which SPINE can parse 30million logs in less than 8 minutes under 16 executors, achieving near real-time performance. In addition, our evaluations show that SPINE can consistently achieve good accuracy under log evolution with a moderate number of user feedback.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"软件产业与工程","FirstCategoryId":"1089","ListUrlMain":"https://doi.org/10.1145/3540250.3549176","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 18

Abstract

Log parsing, which extracts log templates and parameters, is a critical prerequisite step for automated log analysis techniques. Though existing log parsers have achieved promising accuracy on public log datasets, they still face many challenges when applied in the industry. Through studying the characteristics of real-world log data and analyzing the limitations of existing log parsers, we identify two problems. Firstly, it is non-trivial to scale a log parser to a vast number of logs, especially in real-world scenarios where the log data is extremely imbalanced. Secondly, existing log parsers overlook the importance of user feedback, which is imperative for parser fine-tuning under the continuous evolution of log data. To overcome the challenges, we propose SPINE, which is a highly scalable log parser with user feedback guidance. Based on our log parser equipped with initial grouping and progressive clustering,we propose a novel log data scheduling algorithm to improve the efficiency of parallelization under the large-scale imbalanced log data. Besides, we introduce user feedback to make the parser fast adapt to the evolving logs. We evaluated SPINE on 16 public log datasets. SPINE achieves more than 0.90 parsing accuracy on average with the highest parsing efficiency, which outperforms the state-of-the-art log parsers. We also evaluated SPINE in the production environment of Microsoft, in which SPINE can parse 30million logs in less than 8 minutes under 16 executors, achieving near real-time performance. In addition, our evaluations show that SPINE can consistently achieve good accuracy under log evolution with a moderate number of user feedback.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
SPINE:带有反馈指导的可伸缩日志解析器
日志解析提取日志模板和参数,是自动化日志分析技术的关键先决步骤。尽管现有的日志解析器已经在公共日志数据集上取得了很好的准确性,但在行业应用时仍然面临许多挑战。通过研究真实日志数据的特点和分析现有日志解析器的局限性,我们发现了两个问题。首先,将日志解析器扩展到大量日志是非常重要的,特别是在日志数据极度不平衡的实际场景中。其次,现有的日志解析器忽略了用户反馈的重要性,而用户反馈对于解析器在日志数据不断变化的情况下进行微调是必不可少的。为了克服这些挑战,我们提出了SPINE,它是一个具有用户反馈指导的高度可扩展的日志解析器。基于初始分组和渐进式聚类的日志解析器,提出了一种新的日志数据调度算法,以提高大规模不平衡日志数据下的并行化效率。此外,我们还引入了用户反馈,使解析器能够快速适应不断变化的日志。我们在16个公共日志数据集上评估了SPINE。SPINE平均解析精度超过0.90,解析效率最高,优于最先进的日志解析器。我们还在Microsoft的生产环境中评估了SPINE,在该环境中,SPINE可以在16个执行器下在不到8分钟的时间内解析3000万个日志,实现了接近实时的性能。此外,我们的评估表明,在日志演化和适度的用户反馈下,SPINE可以始终保持良好的准确性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
676
期刊最新文献
Improving Grading Outcomes in Software Engineering Projects Through Automated Contributions Summaries GRADESTYLE: GitHub-Integrated and Automated Assessment of Java Code Style Improving Assessment of Programming Pattern Knowledge through Code Editing and Revision Designing for Real People: Teaching Agility through User-Centric Service Design Using Focus to Personalise Learning and Feedback in Software Engineering Education
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1