Integrated detection and localization of concept drifts in process mining with batch and stream trace clustering support

IF 2.7 · Zone 3 (Computer Science) · Q3 (Computer Science, Artificial Intelligence) · Data & Knowledge Engineering · Pub Date: 2023-12-02 · DOI: 10.1016/j.datak.2023.102253
Rafael Gaspar de Sousa, Antonio Carlos Meira Neto, Marcelo Fantinato, Sarajane Marques Peres, Hajo Alexander Reijers
Data & Knowledge Engineering, volume 149, Article 102253 (journal article). Full text: https://www.sciencedirect.com/science/article/pii/S0169023X23001131
Citations: 0

Abstract

Process mining can help organizations by extracting knowledge from event logs. However, process mining techniques often assume that business processes are stationary, while actual business processes are constantly subject to change because of the complexity of organizations and their external environment. Thus, addressing process changes over time, known as concept drifts, allows for a better understanding of process behavior and can provide a competitive edge for organizations, especially in an online data stream scenario. Current approaches to handling process concept drift focus primarily on detecting and locating concept drifts, often through an integrated, albeit offline, approach. However, some of these integrated approaches rely on complex data structures related to tree-based process models, usually discovered through algorithms whose results are influenced by specific heuristic rules. Moreover, most of the proposed approaches have not been tested on the public event logs with labeled true concept drifts that are commonly used as benchmarks, making comparative analysis difficult. In this article, we propose an online approach to detect and localize concept drifts in an integrated way using batch and stream trace clustering support. In our approach, cluster models provide input information for both the concept drift detection and the localization methods. Each cluster abstracts a behavior profile underlying the process and reveals descriptive information about the discovered concept drifts. Experiments with benchmark synthetic event logs with different control-flow changes, as well as with real-world event logs, showed that our approach, when relying on the same clustering model, is competitive with baseline concept drift detection methods. In addition, the experiments showed that our approach is able to correctly locate the detected concept drifts and allows the analysis of such concept drifts through different process behavior profiles.
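The abstract does not give algorithmic details, so the following is only a toy sketch of the general idea (cluster the traces as they arrive, detect a drift from a shift in cluster membership, then describe the drift by contrasting the affected clusters). It is not the authors' method. Everything in it is an assumption made for illustration: the directly-follows feature set, the Jaccard radius, the first-trace cluster prototypes, the two adjacent windows compared by total variation distance, and all function, class, and parameter names.

```python
from collections import Counter


def dfg(trace):
    """Directly-follows pairs of one trace: ('a','b','c') -> {('a','b'), ('b','c')}."""
    return frozenset(zip(trace, trace[1:]))


def jaccard_distance(a, b):
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


class OnlineTraceClusterer:
    """Toy stream clusterer (assumption): a cluster is represented by the
    directly-follows pairs of the first trace that opened it."""

    def __init__(self, radius=0.5):
        self.radius = radius
        self.prototypes = []  # one frozenset of pairs per cluster

    def assign(self, trace):
        feats = dfg(trace)
        for idx, proto in enumerate(self.prototypes):
            if jaccard_distance(feats, proto) <= self.radius:
                return idx
        self.prototypes.append(feats)  # no close cluster: open a new one
        return len(self.prototypes) - 1


def total_variation(p, q):
    """Total variation distance between two label histograms (Counters)."""
    n_p, n_q = sum(p.values()), sum(q.values())
    return 0.5 * sum(abs(p[k] / n_p - q[k] / n_q) for k in set(p) | set(q))


def detect_and_localize(stream, window=20, threshold=0.5, radius=0.5):
    """Flag a drift when cluster membership differs strongly between two
    adjacent windows, and describe it by the relations gained and lost."""
    clusterer = OnlineTraceClusterer(radius)
    labels = [clusterer.assign(t) for t in stream]  # assigned in arrival order
    drifts = []
    i = window
    while i + window <= len(labels):
        ref = Counter(labels[i - window:i])
        cur = Counter(labels[i:i + window])
        if total_variation(ref, cur) > threshold:
            keys = set(ref) | set(cur)
            # Clusters whose share grew / shrank the most between the windows.
            gained = max(keys, key=lambda k: cur[k] - ref[k])
            lost = max(keys, key=lambda k: ref[k] - cur[k])
            new, old = clusterer.prototypes[gained], clusterer.prototypes[lost]
            drifts.append({
                "detected_after": i + window,  # traces seen when the alarm fires
                "removed": sorted(old - new),
                "added": sorted(new - old),
            })
            i += window  # skip ahead so one change is not reported repeatedly
        else:
            i += 1
    return drifts


# Hypothetical stream: activities b and c swap order after 40 traces.
stream = [("a", "b", "c", "d")] * 40 + [("a", "c", "b", "d")] * 40
for drift in detect_and_localize(stream):
    print(drift)
```

In this sketch the same cluster model feeds both steps: the label histogram drives detection, and the cluster prototypes supply the "added" and "removed" directly-follows relations that describe where the behavior changed. The alarm fires some traces after the true change point, since the detector needs enough post-change traces in its current window.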

Source journal
Data & Knowledge Engineering (Engineering & Technology; Computer Science: Artificial Intelligence)
CiteScore: 5.00
Self-citation rate: 0.00%
Articles published: 66
Review time: 6 months
About the journal: Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a worldwide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.