An empirical evaluation of unsupervised event log abstraction techniques in process mining

Information Systems. Published: 2023-11-25. DOI: 10.1016/j.is.2023.102320. Impact factor: 3; CAS Zone 2 (Computer Science); JCR Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS).

Greg Van Houdt, Massimiliano de Leoni, Niels Martin, Benoît Depaire

Information Systems, Volume 121, Article 102320. https://www.sciencedirect.com/science/article/pii/S0306437923001564

Citations: 0

Abstract

These days, businesses record more and more data in their information systems. Moreover, these data are more fine-grained than ever, tracking clicks and database mutations at the lowest possible level. Faced with such data, process discovery algorithms often struggle to produce comprehensible models and instead return spaghetti-like models. Such fine-grained models do not match the business user's mental model of the process under investigation. To tackle this, event log abstraction (ELA) techniques can transform the underlying event log to a coarser level of granularity. However, the literature offers little insight into how well these techniques perform, as reported results are based only on small-scale experiments and are often inconclusive. Against this background, this paper evaluates state-of-the-art abstraction techniques on 400 event logs. The results show that ELA sacrifices fitness for precision, but the complexity reduction achieved depends heavily on the ELA technique used. This study also illustrates the importance of a larger-scale experiment, as sub-sampling the results leads to contradictory conclusions.
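To make the idea of abstraction concrete, here is a minimal sketch in Python. It assumes a hand-written, hypothetical mapping (`LOW_TO_HIGH`) from low-level event labels to high-level activities, whereas the unsupervised techniques evaluated in the paper would derive such groupings from the data. The sketch relabels each event, merges consecutive events mapped to the same activity, and counts distinct directly-follows pairs as a rough stand-in for model complexity. The function names and the toy log are illustrative assumptions, not taken from the paper.

```python
from collections.abc import Iterable

# Hypothetical mapping from fine-grained (low-level) event labels to
# business-level activities. Unsupervised ELA techniques would infer such
# a grouping from the log instead of receiving it as input.
LOW_TO_HIGH = {
    "click_login": "Log in",
    "enter_password": "Log in",
    "open_form": "Fill in claim",
    "type_field": "Fill in claim",
    "upload_file": "Fill in claim",
    "click_submit": "Submit claim",
}


def abstract_trace(trace: list[str], mapping: dict[str, str]) -> list[str]:
    """Relabel each low-level event and merge consecutive events that map
    to the same high-level activity into a single high-level event."""
    result: list[str] = []
    for event in trace:
        label = mapping.get(event, event)  # unmapped events keep their label
        if not result or result[-1] != label:
            result.append(label)
    return result


def directly_follows(log: Iterable[list[str]]) -> set[tuple[str, str]]:
    """Distinct directly-follows pairs (a, b) across all traces: a crude
    proxy for the size of a discovered process model."""
    pairs: set[tuple[str, str]] = set()
    for trace in log:
        pairs.update(zip(trace, trace[1:]))
    return pairs


low_level_log = [
    ["click_login", "enter_password", "open_form", "type_field",
     "type_field", "upload_file", "click_submit"],
    ["click_login", "enter_password", "open_form", "upload_file",
     "type_field", "click_submit"],
]
high_level_log = [abstract_trace(t, LOW_TO_HIGH) for t in low_level_log]

print(high_level_log[0])  # ['Log in', 'Fill in claim', 'Submit claim']
print(len(directly_follows(low_level_log)), len(directly_follows(high_level_log)))  # 9 2
```

On this toy log, the number of distinct directly-follows pairs drops from 9 to 2. Merging consecutive identical labels is a simplifying choice: it cannot distinguish two genuinely repeated executions of the same high-level activity, and it discards the ordering of low-level steps within a group.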

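Source journal

Information Systems (Engineering & Technology, Computer Science: Information Systems)

CiteScore: 9.40
Self-citation rate: 2.70%
Articles published: 112
Review time: 53 days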

Journal description: Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems. Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers dealing with massively parallel data management, fault tolerance in practice, and special-purpose hardware for data-intensive systems are also welcome, as are manuscripts from application domains such as urban informatics, social and natural science, and the Internet of Things. All papers should highlight innovative solutions to data management problems, such as new data models and performance enhancements, and show how those innovations contribute to the goals of the application.