Foundations and practice of binary process discovery

Impact factor: 3.0 | CAS Zone 2 (Computer Science) | JCR Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS) | Journal: Information Systems | Pub Date: 2023-12-20 | DOI: 10.1016/j.is.2023.102339

Tijs Slaats, Søren Debois, Christoffer Olling Back, Axel Kjeld Fjelrad Christfort

Information Systems, volume 121, Article 102339. https://www.sciencedirect.com/science/article/pii/S0306437923001758


Abstract

Most contemporary process discovery methods take only positive examples of process executions as input, which makes them one-class classification algorithms. However, we have found that negative examples are also available in industry, so we build on earlier work that treats process discovery as a binary classification problem. This approach opens the door to many well-established methods and metrics from machine learning, in particular for sharpening the distinction between what the output model should and should not allow. Concretely, we (1) present a verified formalisation of process discovery as a binary classification problem; (2) provide cases with negative examples from industry, including real-life logs; (3) propose the Rejection Miner binary classification procedure, applicable to any process notation that has a suitable syntactic composition operator; (4) implement two concrete binary miners, one outputting Declare patterns, the other Dynamic Condition Response (DCR) graphs; and (5) apply these miners to real-world and synthetic logs obtained from our industry partners and the Process Discovery Contest, showing increased output model quality in terms of accuracy and model size.
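
To make the binary-classification view concrete, the following is a minimal Python sketch, not the authors' Rejection Miner. It assumes a toy setting in which a model is a conjunction of Declare-style constraints (conjunction standing in for the composition operator), keeps only candidates that accept every positive trace, and greedily adds the candidate that rejects the most remaining negative traces. All names here (response, precedence, mine, accuracy) and the example log are hypothetical and chosen only for illustration.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

Trace = Tuple[str, ...]
Constraint = Tuple[str, Callable[[Trace], bool]]  # (name, accepts-predicate)


def response(a: str, b: str) -> Constraint:
    """Declare-style response(a, b): every a is eventually followed by b."""
    def accepts(trace: Trace) -> bool:
        pending = False
        for event in trace:
            if event == a:
                pending = True
            elif event == b:
                pending = False
        return not pending
    return (f"response({a},{b})", accepts)


def precedence(a: str, b: str) -> Constraint:
    """Declare-style precedence(a, b): b may occur only after some a."""
    def accepts(trace: Trace) -> bool:
        seen_a = False
        for event in trace:
            if event == a:
                seen_a = True
            elif event == b and not seen_a:
                return False
        return True
    return (f"precedence({a},{b})", accepts)


def model_accepts(model: Sequence[Constraint], trace: Trace) -> bool:
    """A composed model accepts a trace iff every constraint accepts it."""
    return all(check(trace) for _, check in model)


def mine(positives: Sequence[Trace], negatives: Sequence[Trace],
         candidates: Sequence[Constraint]) -> List[Constraint]:
    """Greedy sketch (an assumption, not the published procedure)."""
    # Keep only constraints that accept every positive example.
    sound = [c for c in candidates if all(c[1](t) for t in positives)]
    remaining = list(negatives)
    model: List[Constraint] = []
    while remaining and sound:
        # Pick the constraint rejecting the most still-accepted negatives.
        best = max(sound, key=lambda c: sum(not c[1](t) for t in remaining))
        if all(best[1](t) for t in remaining):
            break  # no candidate rejects any remaining negative trace
        model.append(best)
        remaining = [t for t in remaining if best[1](t)]
    return model


def accuracy(model: Sequence[Constraint], positives: Sequence[Trace],
             negatives: Sequence[Trace]) -> float:
    """Fraction of traces classified correctly (accept positives, reject negatives)."""
    true_pos = sum(model_accepts(model, t) for t in positives)
    true_neg = sum(not model_accepts(model, t) for t in negatives)
    return (true_pos + true_neg) / (len(positives) + len(negatives))


# Hypothetical toy log: traces are tuples of activity names.
positives = [("a", "b"), ("a", "c", "b")]
negatives = [("b", "a"), ("a",)]
candidates = [make(x, y) for make in (response, precedence)
              for x, y in permutations("abc", 2)]

model = mine(positives, negatives, candidates)
print([name for name, _ in model], accuracy(model, positives, negatives))
```

On this toy log a single constraint suffices, which loosely illustrates the trade-off between accuracy and model size mentioned in the abstract: the loop stops as soon as every negative trace is rejected, so no redundant constraints are added.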




Source journal: Information Systems (Engineering and Technology, Computer Science: Information Systems)
CiteScore: 9.40
Self-citation rate: 2.70%
Articles published: 112
Review time: 53 days

About the journal: Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems. Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers on massively parallel data management, fault tolerance in practice, and special-purpose hardware for data-intensive systems are also welcome, as are manuscripts from application domains such as urban informatics, social and natural science, and the Internet of Things. All papers should highlight innovative solutions to data management problems, such as new data models or performance enhancements, and show how those innovations contribute to the goals of the application.
