An Effective Approach for Parsing Large Log Files

Issam Sedki, A. Hamou-Lhadj, O. Mohamed, M. Shehab
{"title":"An Effective Approach for Parsing Large Log Files","authors":"Issam Sedki, A. Hamou-Lhadj, O. Mohamed, M. Shehab","doi":"10.1109/ICSME55016.2022.00009","DOIUrl":null,"url":null,"abstract":"Because of their contribution to the overall reliability assurance process, software logs have become important data assets for the analysis of software systems. Logs are often the only data points that can shed light on how a software system behaves once deployed. Unfortunately, logs are often unstructured data items, hindering viable analysis of their content. There are studies that aim to automatically parse large log files. The primary goal is to create templates from raw log data samples that can later be used to recognize future logs. In this paper, we propose ULP, a Unified Log Parsing tool, which is highly accurate and efficient. ULP combines string matching and local frequency analysis to parse large log files in an efficient manner. First, log events are organized into groups using a text processing method. Frequency analysis is then applied locally to instances of the same group to identify static and dynamic content of log events. When applied to 10 log datasets of the LogPai benchmark, ULP achieves an average accuracy of 89.2%, which outperforms the accuracy of four leading log parsing tools, namely Drain, Logram, SPELL and AEL. Additionally, ULP can parse up to four million log events in less than 3 minutes. ULP is available online as an open source and can be readily used by practitioners and researchers to parse effectively and efficiently large log files so as to support log analysis tasks.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSME55016.2022.00009","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

Because of their contribution to the overall reliability assurance process, software logs have become important data assets for the analysis of software systems. Logs are often the only data points that can shed light on how a software system behaves once deployed. Unfortunately, logs are often unstructured data items, hindering viable analysis of their content. There are studies that aim to automatically parse large log files. The primary goal is to create templates from raw log data samples that can later be used to recognize future logs. In this paper, we propose ULP, a Unified Log Parsing tool, which is highly accurate and efficient. ULP combines string matching and local frequency analysis to parse large log files in an efficient manner. First, log events are organized into groups using a text processing method. Frequency analysis is then applied locally to instances of the same group to identify static and dynamic content of log events. When applied to 10 log datasets of the LogPai benchmark, ULP achieves an average accuracy of 89.2%, which outperforms the accuracy of four leading log parsing tools, namely Drain, Logram, SPELL and AEL. Additionally, ULP can parse up to four million log events in less than 3 minutes. ULP is available online as an open source and can be readily used by practitioners and researchers to parse effectively and efficiently large log files so as to support log analysis tasks.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
解析大型日志文件的有效方法
由于它们对整个可靠性保证过程的贡献,软件日志已成为软件系统分析的重要数据资产。日志通常是唯一能够揭示软件系统在部署后如何运行的数据点。不幸的是,日志通常是非结构化的数据项,阻碍了对其内容的可行分析。有一些研究旨在自动解析大型日志文件。主要目标是从原始日志数据示例创建模板,这些模板以后可用于识别未来的日志。在本文中,我们提出了一个统一的日志解析工具ULP,它具有很高的准确性和效率。ULP将字符串匹配和本地频率分析相结合,以高效的方式解析大型日志文件。首先,使用文本处理方法将日志事件组织成组。然后将频率分析本地应用于同一组的实例,以识别日志事件的静态和动态内容。当应用于LogPai基准的10个日志数据集时,ULP的平均准确率达到89.2%,优于四种领先的日志解析工具(Drain, Logram, SPELL和AEL)的准确率。此外,ULP可以在不到3分钟的时间内解析多达400万个日志事件。ULP作为开放源代码在线提供,从业者和研究人员可以很容易地使用它来有效和高效地解析大型日志文件,从而支持日志分析任务。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
RestTestGen: An Extensible Framework for Automated Black-box Testing of RESTful APIs COBREX: A Tool for Extracting Business Rules from COBOL On the Security of Python Virtual Machines: An Empirical Study The Phantom Menace: Unmasking Security Issues in Evolving Software Impact of Defect Instances for Successful Deep Learning-based Automatic Program Repair
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1