AML：有效评估日志解析技术的准确度度量模型

IF 3.7 2区计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Journal of Systems and Software Pub Date : 2024-07-06 DOI:10.1016/j.jss.2024.112154

Issam Sedki, Abdelwahab Hamou-Lhadj, Otmane Ait Mohamed

{"title":"AML：有效评估日志解析技术的准确度度量模型","authors":"Issam Sedki, Abdelwahab Hamou-Lhadj, Otmane Ait Mohamed","doi":"10.1016/j.jss.2024.112154","DOIUrl":null,"url":null,"abstract":"<div><p>Logs are essential for the maintenance of large software systems. Software engineers often analyze logs for debugging, root cause analysis, and anomaly detection tasks. Logs, however, are partly structured, making the extraction of useful information from massive log files a challenging task. Recently, many log parsing techniques have been proposed to automatically extract log templates from unstructured log files. These parsers, however, are evaluated using different accuracy metrics. In this paper, we show that these metrics have several drawbacks, making it challenging to understand the strengths and limitations of existing parsers. To address this, we propose a novel accuracy metric, called AML (Accuracy Metric for Log Parsing). AML is a robust accuracy metric that is inspired by research in the field of remote sensing. It is based on measuring omission and commission errors. We use AML to assess the accuracy of 14 log parsing tools applied to the parsing of 16 log datasets. We also show how AML compares to existing accuracy metrics. Our findings demonstrate that AML is a promising accuracy metric for log parsing compared to alternative solutions, which enables a comprehensive evaluation of log parsing tools to help better decision-making in selecting and improving log parsing techniques.</p></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"216 ","pages":"Article 112154"},"PeriodicalIF":3.7000,"publicationDate":"2024-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"AML: An accuracy metric model for effective evaluation of log parsing techniques\",\"authors\":\"Issam Sedki, Abdelwahab Hamou-Lhadj, Otmane Ait Mohamed\",\"doi\":\"10.1016/j.jss.2024.112154\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Logs are essential for the maintenance of large software systems. Software engineers often analyze logs for debugging, root cause analysis, and anomaly detection tasks. Logs, however, are partly structured, making the extraction of useful information from massive log files a challenging task. Recently, many log parsing techniques have been proposed to automatically extract log templates from unstructured log files. These parsers, however, are evaluated using different accuracy metrics. In this paper, we show that these metrics have several drawbacks, making it challenging to understand the strengths and limitations of existing parsers. To address this, we propose a novel accuracy metric, called AML (Accuracy Metric for Log Parsing). AML is a robust accuracy metric that is inspired by research in the field of remote sensing. It is based on measuring omission and commission errors. We use AML to assess the accuracy of 14 log parsing tools applied to the parsing of 16 log datasets. We also show how AML compares to existing accuracy metrics. Our findings demonstrate that AML is a promising accuracy metric for log parsing compared to alternative solutions, which enables a comprehensive evaluation of log parsing tools to help better decision-making in selecting and improving log parsing techniques.</p></div>\",\"PeriodicalId\":51099,\"journal\":{\"name\":\"Journal of Systems and Software\",\"volume\":\"216 \",\"pages\":\"Article 112154\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2024-07-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Systems and Software\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0164121224001997\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Systems and Software","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0164121224001997","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

摘要

日志对于大型软件系统的维护至关重要。软件工程师经常分析日志，以完成调试、根本原因分析和异常检测任务。然而，日志是部分结构化的，因此从海量日志文件中提取有用信息是一项极具挑战性的任务。最近，人们提出了许多日志解析技术，用于从非结构化日志文件中自动提取日志模板。然而，这些解析器使用不同的准确度指标进行评估。在本文中，我们发现这些指标存在一些缺陷，使得了解现有解析器的优势和局限性变得十分困难。为了解决这个问题，我们提出了一种新的准确度指标，称为 AML（日志解析的准确度指标）。AML 是一种稳健的准确度度量，其灵感来自遥感领域的研究。它以测量遗漏和误差为基础。我们使用 AML 来评估 14 种日志解析工具解析 16 个日志数据集的准确性。我们还展示了 AML 与现有准确度指标的比较。我们的研究结果表明，与其他解决方案相比，AML 是一种很有前途的日志解析准确度指标，它可以对日志解析工具进行全面评估，从而帮助在选择和改进日志解析技术时做出更好的决策。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

AML: An accuracy metric model for effective evaluation of log parsing techniques

Logs are essential for the maintenance of large software systems. Software engineers often analyze logs for debugging, root cause analysis, and anomaly detection tasks. Logs, however, are partly structured, making the extraction of useful information from massive log files a challenging task. Recently, many log parsing techniques have been proposed to automatically extract log templates from unstructured log files. These parsers, however, are evaluated using different accuracy metrics. In this paper, we show that these metrics have several drawbacks, making it challenging to understand the strengths and limitations of existing parsers. To address this, we propose a novel accuracy metric, called AML (Accuracy Metric for Log Parsing). AML is a robust accuracy metric that is inspired by research in the field of remote sensing. It is based on measuring omission and commission errors. We use AML to assess the accuracy of 14 log parsing tools applied to the parsing of 16 log datasets. We also show how AML compares to existing accuracy metrics. Our findings demonstrate that AML is a promising accuracy metric for log parsing compared to alternative solutions, which enables a comprehensive evaluation of log parsing tools to help better decision-making in selecting and improving log parsing techniques.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Systems and Software 工程技术-计算机：理论方法

CiteScore

8.60

自引率

5.70%

发文量

193

审稿时长

16 weeks

期刊介绍： The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to: •Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution •Agile, model-driven, service-oriented, open source and global software development •Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems •Human factors and management concerns of software development •Data management and big data issues of software systems •Metrics and evaluation, data mining of software development resources •Business and economic aspects of software development processes The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.

期刊最新文献

Harnessing heap analysis for the synthesis of superoptimized bytecode A bot identification model and tool based on GitHub activity sequences Editorial Board OSCAR-P and aMLLibrary: Profiling and predicting the performance of FaaS-based applications in computing continua Integrating neural mutation into mutation-based fault localization: A hybrid approach