Efficiently Cleaning Structured Event Logs: A Graph Repair Approach

IF 2.2 2区计算机科学 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS ACM Transactions on Database Systems Pub Date : 2022-11-17 DOI:10.1145/3571281

Ruihong Huang, Jianmin Wang, Shaoxu Song, Xuemin Lin, Xiaochen Zhu, Jian Pei

{"title":"Efficiently Cleaning Structured Event Logs: A Graph Repair Approach","authors":"Ruihong Huang, Jianmin Wang, Shaoxu Song, Xuemin Lin, Xiaochen Zhu, Jian Pei","doi":"10.1145/3571281","DOIUrl":null,"url":null,"abstract":"Event data are often dirty owing to various recording conventions or simply system errors. These errors may cause serious damage to real applications, such as inaccurate provenance answers, poor profiling results, or concealing interesting patterns from event data. Cleaning dirty event data is strongly demanded. While existing event data cleaning techniques view event logs as sequences, structural information does exist among events, such as the task passing relationships between staffs in workflow or the invocation relationships among different micro-services in monitoring application performance. We argue that such structural information enhances not only the accuracy of repairing inconsistent events but also the computation efficiency. It is notable that both the structure and the names (labeling) of events could be inconsistent. In real applications, while an unsound structure is not repaired automatically (which requires manual effort from business actors to handle the structure error), it is highly desirable to repair the inconsistent event names introduced by recording mistakes. In this article, we first prove that the inconsistent label repairing problem is NP-complete. Then, we propose a graph repair approach for (1) detecting unsound structures, and (2) repairing inconsistent event names. Efficient pruning techniques together with two heuristic solutions are also presented. Extensive experiments over real and synthetic datasets demonstrate both the effectiveness and efficiency of our proposal.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":"31 1","pages":"1 - 44"},"PeriodicalIF":2.2000,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Database Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3571281","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

Event data are often dirty owing to various recording conventions or simply system errors. These errors may cause serious damage to real applications, such as inaccurate provenance answers, poor profiling results, or concealing interesting patterns from event data. Cleaning dirty event data is strongly demanded. While existing event data cleaning techniques view event logs as sequences, structural information does exist among events, such as the task passing relationships between staffs in workflow or the invocation relationships among different micro-services in monitoring application performance. We argue that such structural information enhances not only the accuracy of repairing inconsistent events but also the computation efficiency. It is notable that both the structure and the names (labeling) of events could be inconsistent. In real applications, while an unsound structure is not repaired automatically (which requires manual effort from business actors to handle the structure error), it is highly desirable to repair the inconsistent event names introduced by recording mistakes. In this article, we first prove that the inconsistent label repairing problem is NP-complete. Then, we propose a graph repair approach for (1) detecting unsound structures, and (2) repairing inconsistent event names. Efficient pruning techniques together with two heuristic solutions are also presented. Extensive experiments over real and synthetic datasets demonstrate both the effectiveness and efficiency of our proposal.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

有效地清理结构化事件日志:一种图修复方法

由于各种记录约定或简单的系统错误，事件数据通常是脏的。这些错误可能会对实际应用程序造成严重损害，例如不准确的来源答案、糟糕的分析结果，或者从事件数据中隐藏有趣的模式。强烈要求清理脏事件数据。虽然现有的事件数据清理技术将事件日志视为序列，但事件之间确实存在结构化信息，例如工作流中人员之间的任务传递关系，或者监视应用程序性能时不同微服务之间的调用关系。我们认为这种结构信息不仅提高了修复不一致事件的准确性，而且提高了计算效率。值得注意的是，事件的结构和名称(标记)可能不一致。在实际的应用程序中，虽然不会自动修复不健全的结构(这需要业务参与者手动处理结构错误)，但非常希望修复由记录错误引入的不一致的事件名称。在本文中，我们首先证明了不一致标签修复问题是np完全的。然后，我们提出了一种图修复方法，用于(1)检测不健全的结构，(2)修复不一致的事件名称。提出了有效的剪枝技术和两种启发式解决方案。在真实和合成数据集上进行的大量实验证明了我们的建议的有效性和效率。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

ACM Transactions on Database Systems 工程技术-计算机：软件工程

CiteScore

5.60

自引率

0.00%

发文量

审稿时长

>12 weeks

期刊介绍： Heavily used in both academic and corporate R&D settings, ACM Transactions on Database Systems (TODS) is a key publication for computer scientists working in data abstraction, data modeling, and designing data management systems. Topics include storage and retrieval, transaction management, distributed and federated databases, semantics of data, intelligent databases, and operations and algorithms relating to these areas. In this rapidly changing field, TODS provides insights into the thoughts of the best minds in database R&D.

期刊最新文献

Automated Category Tree Construction: Hardness Bounds and Algorithms Database Repairing with Soft Functional Dependencies Sharing Queries with Nonequivalent User-Defined Aggregate Functions A family of centrality measures for graph data based on subgraphs GraphZeppelin: How to Find Connected Components (Even When Graphs Are Dense, Dynamic, and Massive)