
Information Systems: Latest Articles

Business process simulation: Probabilistic modeling of intermittent resource availability and multitasking behavior
IF 3; Zone 2 (Computer Science); Q2 (Computer Science, Information Systems); Pub Date: 2024-10-09; DOI: 10.1016/j.is.2024.102471
Orlenys López-Pintado, Marlon Dumas
In business process simulation, resource availability is typically modeled by assigning a calendar to each resource, e.g., Monday–Friday, 9:00–18:00. Resources are assumed to be always available during each time slot in their availability calendar. This assumption often becomes invalid due to interruptions, breaks, or time-sharing across processes. In other words, existing approaches fail to capture intermittent availability. Another limitation of existing approaches is that they either do not consider multitasking behavior, or if they do, they assume that resources always multitask (up to a maximum capacity) whenever available. However, studies have shown that the multitasking patterns vary across days. This paper introduces a probabilistic approach to model resource availability and multitasking behavior for business process simulation. In this approach, each time slot in a resource calendar has an associated availability probability and a multitasking probability per multitasking level. For example, a resource may be available on Fridays between 14:00–15:00 with 90% probability, and given that they are performing one task during this slot, they may take on a second concurrent task with 60% probability. We propose algorithms to discover probabilistic calendars and probabilistic multitasking capacities from event logs. An evaluation shows that, with these enhancements, simulation models discovered from event logs better replicate the distribution of activities and cycle times, relative to approaches with crisp calendars and monotasking assumptions.
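The probabilistic calendar described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary layout, the function name `can_start_task`, and the assumption that the availability draw and the multitasking draw are independent are all hypothetical choices made for illustration. Only the 90% and 60% figures come from the abstract's example.

```python
import random

# Hypothetical layout: (weekday, hour) -> availability probability, plus
# per-slot probabilities of accepting one more task at a given workload.
availability = {("Fri", 14): 0.90}
multitask = {("Fri", 14): {1: 0.60}}  # P(accept another task | 1 task in progress)


def can_start_task(slot, active_tasks, rng):
    """Sample whether a resource starts a new task in the given slot."""
    # Slots missing from the calendar are treated as unavailable.
    if rng.random() >= availability.get(slot, 0.0):
        return False
    if active_tasks == 0:
        return True  # an available, idle resource takes the task
    # Workload levels with no recorded probability are treated as "never".
    return rng.random() < multitask.get(slot, {}).get(active_tasks, 0.0)


rng = random.Random(42)
trials = 100_000
accepted = sum(can_start_task(("Fri", 14), 1, rng) for _ in range(trials))
rate = accepted / trials  # under the independence assumption, near 0.90 * 0.60
```

A crisp calendar corresponds to the special case where every listed slot has availability 1.0 and no multitasking entries.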
Information Systems, Volume 127, Article 102471
Citations: 0
Modeling higher-order social influence using multi-head graph attention autoencoder
IF 3; Zone 2 (Computer Science); Q2 (Computer Science, Information Systems); Pub Date: 2024-10-05; DOI: 10.1016/j.is.2024.102474
Elnaz Meydani, Christoph Duesing, Matthias Trier
Recommender systems are powerful tools developed to mitigate information overload in e-commerce platforms. Social recommender systems leverage social relations among users to predict their preferences. Recently, graph neural networks have been utilized for social recommendations, modeling user-user social relations and user–item interactions as graph-structured data. Despite their improvement over traditional systems, most existing social recommender systems exploit only first-order social relations and overlook the importance of social influence diffusion from higher-order neighbors in social networks. Additionally, these techniques often treat all neighboring nodes equally, without highlighting the most influential ones. To address these challenges, we introduce GATE-SR, a novel model that leverages a multi-head graph attention autoencoder to capture indirect social influence from higher-order neighbors while emphasizing the most relevant users. Moreover, we incorporate implicit social connections derived from coherent communities within the network. While GATE-SR performs comparably to baseline models in rich data environments, its strength lies in excelling at cold-start scenarios—where other models often fall short. This focus on cold-start performance aligns with our goal of building a robust recommender system for real-world challenges. Through extensive experiments on three real-world datasets, we demonstrate that GATE-SR outperforms several state-of-the-art baselines in cold-start scenarios. These results highlight the crucial role of accentuating the most influential neighbors, both explicit and implicit, when modeling higher-order social connections for more accurate recommendations.
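The idea of weighting first- and higher-order neighbours unequally can be sketched as follows. This is not GATE-SR: it uses plain dot-product attention as a stand-in for graph attention, reduces each head to a hypothetical per-dimension scaling vector, and omits the autoencoder and community components entirely. The toy graph, embeddings, and all names are assumptions.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def neighbors_within(graph, user, hops):
    """Users reachable from `user` in at most `hops` steps (excluding `user`)."""
    seen, frontier = {user}, {user}
    for _ in range(hops):
        frontier = {v for u in frontier for v in graph[u]} - seen
        seen |= frontier
    return sorted(seen - {user})


def attend(user, graph, emb, heads, hops=2):
    """Concatenate, over heads, the attention-weighted sum of neighbour embeddings."""
    nbrs = neighbors_within(graph, user, hops)
    out, weights_per_head = [], []
    for scale in heads:  # each head is reduced to a per-dimension scaling vector
        q = [s * x for s, x in zip(scale, emb[user])]
        w = softmax([dot(q, emb[v]) for v in nbrs])
        weights_per_head.append(dict(zip(nbrs, w)))
        out += [sum(wi * emb[v][d] for wi, v in zip(w, nbrs)) for d in range(len(q))]
    return out, weights_per_head


# Toy social graph: "c" is a second-order neighbour of "a" (reached via "b").
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
emb = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 0.5]}
heads = [[1.0, 1.0], [2.0, 0.0]]
vector, weights = attend("a", graph, emb, heads)
```

In this toy setup the second-order neighbour "c" receives more weight than the direct neighbour "b" in both heads, because its embedding is closer to that of "a".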
Information Systems, Volume 128, Article 102474
Citations: 0
Exploiting explicit item–item correlations from knowledge graphs for enhanced sequential recommendation
IF 3; Zone 2 (Computer Science); Q2 (Computer Science, Information Systems); Pub Date: 2024-10-05; DOI: 10.1016/j.is.2024.102470
Yanlin Zhang, Yuchen Shi, Deqing Yang, Xiaodong Gu
In recent years, research on employing knowledge graphs (KGs) in sequential recommendation (SR) has received a lot of attention, since the side information extracted from KGs, especially the information on the correlations between items, helps SR models achieve better performance. However, many previous KG-based SR models tend to introduce noisy information when learning item embeddings, or insufficiently fuse item–item correlations into their sequential modeling, thus limiting their performance improvements. In this paper, we propose a Distance-Aware Knowledge-based Sequential Recommendation model (DAKSR), which exploits the explicit item–item correlations from KGs to achieve enhanced SR. Specifically, as one critical component of DAKSR, the distance score matrix (DSM) is first obtained to indicate the correlations between items, and then leveraged in the following three major modules of DAKSR. First, in the Item-Set Embedding layer (ISE), all item embeddings are learned based on the DSM, which effectively eliminates noisy information. Meanwhile, the Knowledge-Infused Transformer (KIT) incorporates the DSM into its attention mechanism to improve feature extraction. Furthermore, the Knowledge Contrastive Learning module (KCL) also leverages the item–item correlations presented in the DSM to generate two credible sequence views, which are used to refine sample representations through a contrastive learning strategy, and thus improve the model’s robustness. Our extensive experiments on three SR benchmarks clearly demonstrate DAKSR’s superior performance over state-of-the-art (SOTA) KG-based recommendation models. The implementation of DAKSR is available at https://github.com/Easonsi/DAKSR for conveniently reproducing our experimental results.
Information Systems, Volume 128, Article 102470
Citations: 0
Advances on data management systems
IF 3; Zone 2 (Computer Science); Q2 (Computer Science, Information Systems); Pub Date: 2024-10-03; DOI: 10.1016/j.is.2024.102467
Ladjel Bellatreche, Marlon Dumas, Panagiotis Karras, Raimundas Matulevičius, Silvia Chiusano, Tania Cerquitelli, Robert Wrembel
Information Systems, Volume 127, Article 102467
Citations: 0
Special Issue of CAiSE 2023 Best Papers
IF 3; Zone 2 (Computer Science); Q2 (Computer Science, Information Systems); Pub Date: 2024-10-01; DOI: 10.1016/j.is.2024.102469
Iris Reinhartz-Berger, Marta Indulska
Information Systems, Volume 127, Article 102469
Citations: 0
Finding meaningful paths in heterogeneous graphs with PathWays
IF 3; Zone 2 (Computer Science); Q2 (Computer Science, Information Systems); Pub Date: 2024-09-30; DOI: 10.1016/j.is.2024.102463
Nelly Barret, Antoine Gauquier, Jia-Jean Law, Ioana Manolescu
Graphs, and notably RDF graphs, are a prominent way of sharing data. As data usage democratizes, users need help figuring out the useful content of a graph dataset. In particular, journalists with whom we collaborate are interested in identifying, in a graph, the connections between entities, e.g., people, organizations, emails, etc. We present a novel method for exploring data graphs through their data paths connecting Named Entities (NEs, in short); each data path leads to a tabular-looking set of results. NEs are extracted from the data through dedicated Information Extraction modules. Our method builds upon the pre-existing ConnectionLens platform and follow-up work in the Abstra project, which builds simple, visual ER-style summaries of semi-structured data. The contribution of the present work, and its novelty, is twofold. First, we propose a novel analysis of entity-to-entity paths contained in datasets of any nature, and propose a new method for ranking paths, leveraging a novel Information Extraction (IE) module we built on top of ChatGPT. Second, we present an efficient approach to enumerate and compute NE paths, based on an algorithm which automatically recommends subpaths to materialize, and rewrites the path queries using these subpaths. Our experiments demonstrate the value of NE paths and the efficiency of our method for computing and ranking them.
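To make the notion of entity-to-entity paths concrete, here is a minimal sketch that enumerates simple paths between named-entity nodes in a tiny graph. It is a naive depth-first enumeration, not the PathWays algorithm: it performs no ranking, no subpath materialization, and no query rewriting. The toy edges, the entity set, and the function name `entity_paths` are hypothetical.

```python
def entity_paths(edges, entities, max_len=3):
    """Enumerate simple paths that start and end at distinct entity nodes."""
    adj = {}
    for u, v in edges:  # treat the graph as undirected for exploration
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    found = []

    def extend(path):
        last = path[-1]
        if len(path) > 1 and last in entities:
            found.append(tuple(path))
            return  # stop at the first entity reached
        if len(path) - 1 == max_len:
            return  # edge budget exhausted
        for nxt in sorted(adj.get(last, ())):
            if nxt not in path:
                extend(path + [nxt])

    for start in sorted(entities):
        extend([start])
    # Keep one orientation of each undirected path.
    return sorted(p for p in found if p[0] < p[-1])


edges = [
    ("Alice", "email1"), ("email1", "ACME"),
    ("Alice", "doc7"), ("doc7", "Bob"),
    ("Bob", "ACME"),
]
entities = {"Alice", "Bob", "ACME"}
paths = entity_paths(edges, entities)
```

Each returned tuple is one connection between two entities; in a real dataset the number of such paths grows quickly, which is why the abstract emphasizes efficient enumeration and ranking.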
Information Systems, Volume 127, Article 102463
Citations: 0
Using AI explainable models and handwriting/drawing tasks for psychological well-being
IF 3; Zone 2 (Computer Science); Q2 (Computer Science, Information Systems); Pub Date: 2024-09-28; DOI: 10.1016/j.is.2024.102465
Francesco Prinzi, Pietro Barbiero, Claudia Greco, Terry Amorese, Gennaro Cordasco, Pietro Liò, Salvatore Vitabile, Anna Esposito
This study addresses the increasing threat to Psychological Well-Being (PWB) posed by Depression, Anxiety, and Stress conditions. Machine learning methods have shown promising results for several psychological conditions. However, the lack of transparency in existing models impedes practical application. The study aims to develop explainable machine learning models for depression, anxiety and stress prediction, focusing on features extracted from tasks involving handwriting and drawing.
Two hundred patients completed the Depression, Anxiety, and Stress Scale (DASS-21) and performed seven tasks related to handwriting and drawing. Extracted features, encompassing pressure, stroke pattern, time, space, and pen inclination, were used to train the explainable-by-design Entropy-based Logic Explained Network (e-LEN) model, employing first-order logic rules for explanation. Performance comparison was performed with XGBoost, enhanced by the SHAP explanation method.
The trained models achieved notable accuracy in predicting depression (0.749 ± 0.089), anxiety (0.721 ± 0.088), and stress (0.761 ± 0.086) through 10-fold cross-validation (repeated 20 times). The e-LEN model’s logic rules facilitated clinical validation, uncovering correlations with existing clinical literature. While performance remained consistent for depression and anxiety on an independent test dataset, a slight degradation was observed for stress prediction in the test task.
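The evaluation protocol mentioned here, 10-fold cross-validation repeated 20 times and summarized as mean ± standard deviation, can be sketched as below. Everything apart from the fold and repeat counts is an assumption: the data are synthetic, the threshold classifier is a toy stand-in for e-LEN or XGBoost, the folds are not stratified, and the abstract does not state whether the reported deviation is taken over folds or over repeats.

```python
import random
import statistics


def repeated_kfold(n_samples, k=10, repeats=20, seed=0):
    """Yield (train_idx, test_idx) for `repeats` shuffled k-fold partitions."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test


def threshold_classifier(xs, ys, train):
    """Toy stand-in model: threshold halfway between the two class means."""
    pos = [xs[i] for i in train if ys[i] == 1]
    neg = [xs[i] for i in train if ys[i] == 0]
    cut = (statistics.mean(pos) + statistics.mean(neg)) / 2
    return lambda x: int(x > cut)


# Synthetic data (not from the study): 200 samples, one noisy feature.
rng = random.Random(1)
ys = [i % 2 for i in range(200)]
xs = [y + rng.gauss(0, 0.8) for y in ys]

scores = []
for train, test in repeated_kfold(len(xs)):
    predict = threshold_classifier(xs, ys, train)
    scores.append(sum(predict(xs[i]) == ys[i] for i in test) / len(test))

# Here the deviation is computed over all 200 fold-level accuracies.
summary = f"{statistics.mean(scores):.3f} ± {statistics.stdev(scores):.3f}"
```

Repeating the partitioning with different shuffles reduces the dependence of the estimate on any single split, which matters for a dataset of this size.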
Information Systems, Volume 127, Article 102465
Citations: 0
Effective data exploration through clustering of local attributive explanations
IF 3; Zone 2 (Computer Science); Q2 (Computer Science, Information Systems); Pub Date: 2024-09-28; DOI: 10.1016/j.is.2024.102464
Elodie Escriva, Tom Lefrere, Manon Martin, Julien Aligon, Alexandre Chanson, Jean-Baptiste Excoffier, Nicolas Labroche, Chantal Soulé-Dupuy, Paul Monsarrat
Machine Learning (ML) has become an essential tool for modeling complex phenomena, offering robust predictions and comprehensive data analysis. Nevertheless, the lack of interpretability in these predictions often results in a closed-box effect, which the field of eXplainable Machine Learning (XML) aims to address. Local attributive XML methods, in particular, provide explanations by quantifying the contribution of each attribute to individual predictions, referred to as influences. This type of explanation is the most acute as it focuses on each instance of the dataset and allows the detection of individual differences. Additionally, aggregating local explanations allows for a deeper analysis of the underlying data. In this context, influences can be considered as a new data space to reveal and understand complex data patterns. We hypothesize that these influences, derived from ML explanations, are more informative than the original raw data, especially for identifying homogeneous groups within the data. To identify such groups effectively, we utilize a clustering approach. We compare clusters formed using raw data against those formed using influences computed by various local attributive XML methods. Our findings reveal that clusters based on influences consistently outperform those based on raw data, even when using models with low accuracy.
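The comparison between clustering raw data and clustering influences can be illustrated with a small sketch. It makes several assumptions not taken from the paper: the model is a hypothetical linear one, for which the attribution of feature j is taken as w_j * (x_j - mean_j); the clustering is a plain k-means seeded with the first k points; and the data are synthetic, with one informative feature and one high-variance noise feature.

```python
import random


def influences(X, weights):
    """Per-instance attributions of a linear model: w_j * (x_j - mean_j)."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(weights))]
    return [[w * (x - m) for w, x, m in zip(weights, row, means)] for row in X]


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm, seeded with the first k points for determinism."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def agreement(labels, truth):
    """Share of matching labels, allowing the two cluster ids to be swapped."""
    same = sum(a == b for a, b in zip(labels, truth))
    return max(same, len(truth) - same) / len(truth)


rng = random.Random(0)
groups = [i % 2 for i in range(40)]
# Feature 0 separates the groups; feature 1 is high-variance noise.
X = [[g + rng.gauss(0, 0.05), rng.gauss(0, 5.0)] for g in groups]
weights = [3.0, 0.0]  # hypothetical trained model: only feature 0 matters

raw_score = agreement(kmeans(X, 2), groups)
influence_score = agreement(kmeans(influences(X, weights), 2), groups)
```

In the influence space the noise feature contributes nothing, so distances reflect only what the model uses; in the raw space the noise feature dominates the distances. Comparing `raw_score` with `influence_score` shows the effect for this toy case only.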
Information Systems, Volume 127, Article 102464
Citations: 0
Data Lakehouse: A survey and experimental study
IF 3; Zone 2 (Computer Science); Q2 (Computer Science, Information Systems); Pub Date: 2024-09-26; DOI: 10.1016/j.is.2024.102460
Ahmed A. Harby, Farhana Zulkernine
Efficient big data management is a dire necessity to manage the exponential growth in data generated by digital information systems to produce usable knowledge. Structured databases, data lakes, and warehouses have each provided a solution with varying degrees of success. However, a new and superior solution, the data Lakehouse, has emerged to extract actionable insights from unstructured data ingested from distributed sources. By combining the strengths of data warehouses and data lakes, the data Lakehouse can process and merge data quickly while ingesting and storing high-speed unstructured data with post-storage transformation and analytics capabilities. The Lakehouse architecture offers the necessary features for optimal functionality and has gained significant attention in the big data management research community. In this paper, we compare data lake, warehouse, and lakehouse systems, highlight their strengths and shortcomings, identify the desired features to handle the evolving challenges in big data management and analysis, and propose an advanced data Lakehouse architecture. We also demonstrate, through an experimental study, the performance of three state-of-the-art data management systems, namely HDFS data lake, Hive data warehouse, and Delta lakehouse, in managing data for analytical query responses.
Citations: 0
Proactive conformance checking: An approach for predicting deviations in business processes
IF 3; Zone 2, Computer Science; Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-09-23; DOI: 10.1016/j.is.2024.102461
Michael Grohs, Peter Pfeiffer, Jana-Rebecca Rehse
Modern business processes are subject to an increasing number of external and internal regulations. Compliance with these regulations is crucial for the success of organizations. To ensure this compliance, process managers can identify and mitigate deviations between the predefined process behavior and the executed process instances by means of conformance checking techniques. However, these techniques are inherently reactive, meaning that they can only detect deviations after they have occurred. It would be desirable to detect and mitigate deviations before they occur, enabling managers to proactively ensure compliance of running process instances. In this paper, we propose Business Process Deviation Prediction (BPDP), a novel predictive approach that relies on a supervised machine learning model to predict which deviations can be expected in the future of running process instances. BPDP is able to predict individual deviations as well as deviation patterns. Further, it provides the user with a list of potential reasons for predicted deviations. Our evaluation shows that BPDP outperforms existing methods for deviation prediction. Following the idea of action-oriented process mining, BPDP thus enables process managers to prevent deviations in early stages of running process instances.
Citations: 0