Information Systems: latest articles, page 7

Enhancing cross-market recommendations by addressing negative transfer and leveraging item co-occurrences

IF 3.7 Zone 2 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Systems

Pub Date : 2024-04-16 DOI: 10.1016/j.is.2024.102388

Zheng Hu, Satoshi Nakagawa, Shi-Min Cai, Fuji Ren, Jiawen Deng

Multinational e-commerce companies such as Amazon and eBay operate in multiple countries and regions. Some of their markets are data-scarce, while others are data-rich. In recent years, cross-market recommendation (CMR) has been proposed to bolster data-scarce markets by leveraging auxiliary information from data-rich markets. Previous CMR algorithms have employed techniques such as sharing market-agnostic parameters or incorporating inter-market similarity to improve performance. However, existing approaches have two limitations: (1) they do not fully utilize the valuable information on item co-occurrences available in data-rich markets (such as mice and keyboards being consistently purchased together); (2) they ignore the negative transfer that stems from disparities across markets. To address these limitations, we introduce a novel attention-based model that exploits users’ historical behaviors to mine general patterns from item co-occurrences and uses market-specific embeddings to mitigate negative transfer. Specifically, we propose an attention-based user interest mining module that uses items common to several markets as bridges for mining general knowledge from item co-occurrence patterns in the rich data of global markets. To mitigate negative transfer, we decouple the item representations into market-specific embeddings and market-agnostic embeddings. The market-specific embeddings model the inherent biases of individual markets, while the market-agnostic embeddings learn generic representations of the items. Extensive experiments on seven real-world datasets illustrate our model’s effectiveness: it outperforms the second-best model by an average of 4.82%, 6.82%, 3.87%, and 5.34% across four variants of two metrics. Further analysis demonstrates the effectiveness of the proposed model in mining general item co-occurrence patterns and avoiding negative transfer for data-scarce markets.
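The decoupling idea in the abstract above can be illustrated with a small sketch. It is not the authors' model: the additive combination of the two embeddings, the scaled dot-product attention, and every name and number below (`item_vector`, `user_interest`, the toy embeddings) are assumptions made only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def item_vector(item, market, agnostic, specific):
    """Market-agnostic embedding plus a market-specific offset.

    Assumption: the two parts are simply added; an item with no
    market-specific entry gets a zero offset.
    """
    base = agnostic[item]
    offset = specific.get((market, item), [0.0] * len(base))
    return [b + o for b, o in zip(base, offset)]


def user_interest(history, target, market, agnostic, specific):
    """Attention-pool the user's history, with the candidate item as query."""
    query = item_vector(target, market, agnostic, specific)
    keys = [item_vector(i, market, agnostic, specific) for i in history]
    weights = softmax([dot(query, k) / math.sqrt(len(query)) for k in keys])
    pooled = [sum(w * k[d] for w, k in zip(weights, keys))
              for d in range(len(query))]
    return pooled, weights


def score(history, target, market, agnostic, specific):
    """Preference score: pooled interest vector dotted with the candidate."""
    pooled, _ = user_interest(history, target, market, agnostic, specific)
    return dot(pooled, item_vector(target, market, agnostic, specific))


# Toy, hand-written embeddings (hypothetical values, not learned).
agnostic = {"mouse": [1.0, 0.0], "keyboard": [0.9, 0.1], "novel": [0.0, 1.0]}
specific = {("jp", "keyboard"): [0.1, 0.0]}

pooled, weights = user_interest(["mouse", "novel"], "keyboard", "jp",
                                agnostic, specific)
```

In this toy setup the attention weight on "mouse" exceeds the weight on "novel" when the candidate is "keyboard", which mirrors the mice-and-keyboards co-occurrence example. A trained model would learn both embedding tables from interaction data.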

Information Systems, Volume 124, Article 102388.

Citations: 0

Process-related user interaction logs: State of the art, reference model, and object-centric implementation

Pub Date : 2024-04-13 DOI: 10.1016/j.is.2024.102386

Luka Abb, Jana-Rebecca Rehse

User interaction (UI) logs are high-resolution event logs that record low-level activities performed by a user during the execution of a task in an information system. Each event in such a log represents an interaction between the user and the interface, such as clicking a button, ticking a checkbox, or typing into a text field. UI logs are used in many different application contexts for purposes such as usability analysis, task mining, or robotic process automation (RPA). However, UI logs suffer from a lack of standardization. Each research study and processing tool relies on a different conceptualization and implementation of the elements and attributes of user interactions. This complicates or even prevents the integration of UI logs from different sources or the combination of UI data collection tools with downstream analytics or automation solutions. In this paper, our objective is to address this issue and facilitate the exchange and analysis of UI logs in research and practice. To this end, we first review process-related UI logs in scientific publications and industry tools to determine commonalities and differences between them. Based on our findings, we propose a universally applicable reference data model for process-related UI logs, which includes all core attributes but remains flexible regarding the scope, level of abstraction, and case notion. Finally, we provide exemplary implementations of the reference model in XES and OCED.
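To make the notion of a UI log event concrete, here is a minimal sketch of an event record with a pluggable case notion. The abstract does not list the attributes of the reference model, so every class and field name below (`UIElement`, `UIEvent`, `case_of`, and so on) is hypothetical and should not be read as the authors' schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class UIElement:
    """The interface element that was interacted with (hypothetical fields)."""
    element_id: str
    element_type: str   # e.g. "button", "checkbox", "text_field"
    application: str


@dataclass
class UIEvent:
    """One low-level user interaction (hypothetical fields)."""
    timestamp: datetime
    action: str         # e.g. "click", "tick", "type"
    element: UIElement
    user: str
    attributes: dict = field(default_factory=dict)  # open-ended extras


def group_by_case(events, case_of):
    """Split a log into cases; `case_of` encodes the chosen case notion."""
    cases = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        cases.setdefault(case_of(event), []).append(event)
    return cases


t0 = datetime(2024, 4, 13, 9, 0, 0)
events = [
    UIEvent(t0, "click", UIElement("btn_save", "button", "crm"), user="alice"),
    UIEvent(t0 + timedelta(seconds=2), "type",
            UIElement("txt_name", "text_field", "crm"), user="bob"),
    UIEvent(t0 + timedelta(seconds=5), "tick",
            UIElement("chk_terms", "checkbox", "shop"), user="alice"),
]

# Two different case notions applied to the same log.
by_user = group_by_case(events, lambda e: e.user)
by_app = group_by_case(events, lambda e: e.element.application)
```

Passing the case notion as a function keeps the event record itself neutral, which is one simple way to stay flexible about how events are grouped into cases.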

Process-related user interaction logs: State of the art, reference model, and object-centric implementation. Information Systems, Volume 124, Article 102386. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0306437924000449/pdfft?md5=99fcafdb33deb5f863a548bbb4740fc9&pid=1-s2.0-S0306437924000449-main.pdf

Citations: 0

Read-safe snapshots: An abort/wait-free serializable read method for read-only transactions on mixed OLTP/OLAP workloads

Pub Date : 2024-04-09 DOI: 10.1016/j.is.2024.102385

Takamitsu Shioi, Takashi Kambayashi, Suguru Arakawa, Ryoji Kurosawa, Satoshi Hikida, Haruo Yokota

This paper proposes Read-Safe Snapshots (RSS), a concurrency control method that lets read-only transactions read the latest serializable version under multiversion concurrency control (MVCC) without creating any serializability anomaly, thereby improving transaction processing throughput under mixed workloads of online transactional processing (OLTP) and online analytical processing (OLAP). Ensuring serializability for data consistency between OLTP and OLAP is vital to prevent OLAP from obtaining nonserializable results. Existing serializability methods achieve this consistency by making OLTP or OLAP transactions abort or wait, but this can degrade throughput when read-only OLAP transactions have large read sets, as in the mixed workloads of recent real-time analysis applications. To deal with this problem, we present an RSS construction algorithm that does not affect conventional OLTP performance and avoids producing additional aborts and waits. Moreover, because no validation for serializability is required, the RSS construction method can be applied easily to the read-only replica of a multinode system as well as to a single-node system. Our experiments showed that RSS prevented read-only OLAP transactions from creating anomaly cycles in a multinode environment with master-copy replication, achieving serializability with a low overhead of about 15% compared to baseline OLTP/OLAP throughput under snapshot isolation (SI). In a mixed OLTP/OLAP workload, the OLTP throughput under our proposed method was about 45% better than under SafeSnapshots, a serializable snapshot isolation (SSI) method equipped with a read-only optimization, and the OLAP throughput was not degraded.
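For readers unfamiliar with MVCC, the sketch below shows only the generic snapshot read that such methods build on: a reader with a snapshot timestamp sees the newest version committed at or before that timestamp, without blocking writers. It is not the RSS construction algorithm; choosing which snapshot is read-safe is the paper's contribution and is not shown. The class and method names are hypothetical.

```python
import bisect


class VersionedStore:
    """Toy multiversion store: each key keeps an ordered chain of versions."""

    def __init__(self):
        # key -> (sorted commit timestamps, values in the same order)
        self._versions = {}

    def commit(self, key, commit_ts, value):
        """Append a new committed version of `key`."""
        stamps, values = self._versions.setdefault(key, ([], []))
        if stamps and commit_ts <= stamps[-1]:
            raise ValueError("commit timestamps must increase per key")
        stamps.append(commit_ts)
        values.append(value)

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before `snapshot_ts`.

        Returns None when no version is visible to the snapshot.
        """
        stamps, values = self._versions.get(key, ([], []))
        idx = bisect.bisect_right(stamps, snapshot_ts)
        return values[idx - 1] if idx else None


store = VersionedStore()
store.commit("balance", 10, 100)
store.commit("balance", 20, 80)

# A long-running analytical reader pinned at timestamp 15 keeps seeing
# the version from timestamp 10, even after the later commit.
olap_view = store.read("balance", 15)
```

Plain snapshot reads like this give snapshot isolation, which can still admit serializability anomalies. That gap is what the abstract says RSS closes without aborts or waits.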

Information Systems, Volume 124, Article 102385. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0306437924000437/pdfft?md5=44919a1e7ab150e46eaabe4c385782e7&pid=1-s2.0-S0306437924000437-main.pdf

Citations: 0

Blockchain technology for requirement traceability in systems engineering

Pub Date : 2024-04-05 DOI: 10.1016/j.is.2024.102384

Mohan S.R. Elapolu, Rahul Rai, David J. Gorsich, Denise Rizzo, Stephen Rapp, Matthew P. Castanier

Requirement engineering (RE), a systematic process of eliciting, defining, analyzing, and managing requirements, is a vital phase in systems engineering. In RE, requirement traceability establishes the relationships between artifacts and supports requirement validation, change management, and impact analysis. Establishing requirement traceability is challenging, especially in the early stages of a complex system design, as requirements constantly evolve and change. Moreover, the involvement of distributed stakeholders in system development introduces collaboration and trust issues. This paper outlines a novel blockchain-based requirement traceability framework that includes a data acquisition template and graph-based visualization. The template enables dual-level traceability (artifact and object) in the RE processes. The traceability information acquired through the templates is stored in the blockchain, where traces are embedded in the blocks’ metadata and data. Furthermore, the blockchain is represented as a Neo4J property graph from which traces can be retrieved using Cypher queries, providing a mechanism to query and examine the history of requirements. The framework’s efficacy is showcased by documenting the RE process of an autonomous automotive system. Our results indicate that the framework can record the history of artifacts with constantly changing requirements and can yield secure decentralized ledgers of requirement artifacts. The proposed distributed traceability framework shows promise for enhancing stakeholder collaboration and trust. However, additional user studies should be conducted to bolster our results.
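The storage idea described above (traces embedded in the metadata and data of chained blocks) can be sketched as a simple hash chain. This is an illustration under stated assumptions, not the authors' framework: there is no network or consensus, and the block layout and all field names (`artifact`, `author`, `text`, `traces_to`) are hypothetical.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64


def block_hash(block):
    """SHA-256 over the block's content, excluding its own hash field."""
    content = {k: block[k] for k in ("index", "prev_hash", "metadata", "data")}
    payload = json.dumps(content, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, metadata, data):
    """Link a new block to the current tip of the chain."""
    block = {
        "index": len(chain),
        "prev_hash": chain[-1]["hash"] if chain else GENESIS_PREV,
        "metadata": metadata,
        "data": data,
    }
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def is_valid(chain):
    """Check every link and every stored hash."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else GENESIS_PREV
        if block["prev_hash"] != expected_prev:
            return False
        if block["hash"] != block_hash(block):
            return False
    return True


def history(chain, artifact_id):
    """All recorded states of one requirement artifact, oldest first."""
    return [b["data"] for b in chain
            if b["metadata"].get("artifact") == artifact_id]


chain = []
append_block(chain, {"artifact": "REQ-1", "author": "systems"},
             {"text": "Vehicle shall detect obstacles", "traces_to": []})
append_block(chain, {"artifact": "REQ-2", "author": "safety"},
             {"text": "Braking latency below threshold", "traces_to": ["REQ-1"]})
append_block(chain, {"artifact": "REQ-1", "author": "systems"},
             {"text": "Vehicle shall detect obstacles at night",
              "traces_to": []})
```

Because each block's hash covers the previous block's hash, editing an earlier requirement text invalidates the chain, which is what makes the recorded history tamper-evident. The graph view and Cypher queries mentioned in the abstract are not reproduced here.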

Information Systems, Volume 123, Article 102384.

Citations: 0

Enjoy the silence: Analysis of stochastic Petri nets with silent transitions

Pub Date : 2024-04-04 DOI: 10.1016/j.is.2024.102383

Sander J.J. Leemans, Fabrizio Maria Maggi, Marco Montali

Capturing stochastic behaviour in business and work processes is essential to quantitatively understand how nondeterminism is resolved when taking decisions within the process. This is of special interest in process mining, where event data tracking the actual execution of the process are related to process models, and can then provide insights on frequencies and probabilities. Variants of stochastic Petri nets provide a natural formal basis to represent stochastic behaviour and support different data-driven and model-driven analysis tasks in this spectrum. However, when capturing business processes, such nets inherently need a labelling that maps between transitions and activities. In many state-of-the-art process mining techniques, this labelling is not one-to-one, leading to unlabelled transitions and to activities represented by multiple transitions. At the same time, the nets have to be analysed in a finite-trace semantics, matching the fact that each process execution consists of finitely many steps. These two aspects impede the direct application of existing techniques for stochastic Petri nets, calling for a novel characterisation that incorporates labels and silent transitions in a finite-trace semantics. In this article, we provide such a characterisation starting from generalised stochastic Petri nets and obtaining the framework of labelled stochastic processes (LSPs). On top of this framework, we introduce different key analysis tasks on the traces of LSPs and their probabilities. We show that all such analysis tasks can be solved analytically, in particular by reducing them to a single method that combines automata-based techniques, which single out the behaviour of interest within an LSP, with techniques based on absorbing Markov chains, which reason on their probabilities. Finally, we demonstrate the significance of our approach in the context of stochastic conformance checking, illustrating practical feasibility through a proof-of-concept implementation and its application to different datasets.
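The absorbing Markov chain step mentioned above can be illustrated with the standard textbook computation: with transient-to-transient probabilities Q and transient-to-absorbing probabilities R, the absorption probabilities are B = (I - Q)^{-1} R. The sketch below covers only that step, not the authors' automata-based construction, and the two-state chain and its probabilities are made up for illustration.

```python
from fractions import Fraction


def absorption_probabilities(Q, R):
    """Return B = (I - Q)^-1 R, solved exactly by Gauss-Jordan elimination.

    Q[i][j]: probability of moving from transient state i to transient j.
    R[i][k]: probability of moving from transient state i to absorbing k.
    Assumes a proper absorbing chain, so that I - Q is invertible.
    """
    n = len(Q)
    # Augmented matrix [I - Q | R] with exact rational arithmetic.
    aug = [
        [Fraction(int(i == j)) - Fraction(Q[i][j]) for j in range(n)]
        + [Fraction(x) for x in R[i]]
        for i in range(n)
    ]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


half, quarter = Fraction(1, 2), Fraction(1, 4)

# Transient states s0, s1; absorbing outcomes "accept", "reject".
# s0 loops to s1 (think of a silent step) or ends in "accept";
# s1 loops back to s0 or ends in "reject".
Q = [[0, half],
     [quarter, 0]]
R = [[half, 0],
     [0, 3 * quarter]]

B = absorption_probabilities(Q, R)
```

Starting from s0, the chain ends in "accept" with probability 4/7 and in "reject" with probability 3/7. Exact fractions are used so that loops of arbitrary length are summed without rounding error.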

Information Systems, Volume 124, Article 102383. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0306437924000413/pdfft?md5=2011a29e04496e91e304834ecac1b098&pid=1-s2.0-S0306437924000413-main.pdf

Citations: 0

A chance for models to show their quality: Stochastic process model-log dimensions

Pub Date : 2024-04-02 DOI: 10.1016/j.is.2024.102382

Adam T. Burke, Sander J.J. Leemans, Moe T. Wynn, Wil M.P. van der Aalst, Arthur H.M. ter Hofstede

Process models describe the desired or observed behaviour of organisations. In stochastic process mining, computational analysis of trace data yields process models which describe process paths and their probability of execution. To understand the quality of these models, and to compare them, quantitative quality measures are used.

This research investigates model comparison empirically, using stochastic process models built from real-life logs. The experimental design collects a large number of models generated randomly and using process discovery techniques. Twenty-five different metrics are taken on these models, using both existing process model metrics and new, exploratory ones. The results are analysed quantitatively, making particular use of principal component analysis.

Based on this analysis, we suggest three stochastic process model dimensions: adhesion, relevance and simplicity. We also suggest possible metrics for these dimensions, and demonstrate their use on example models.
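As a rough illustration of how principal component analysis can group quality metrics into dimensions, the sketch below standardizes a small metrics table and extracts its first principal component by power iteration. The three metric columns and their values are invented; the paper's twenty-five metrics and its actual analysis pipeline are not reproduced, and power iteration is used here only to keep the example free of third-party libraries.

```python
import math


def standardize(rows):
    """Z-score each column (population standard deviation).

    Assumes no column is constant, otherwise the division fails.
    """
    n, m = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - mu) ** 2 for x in c) / n)
            for c, mu in zip(cols, means)]
    return [[(row[j] - means[j]) / stds[j] for j in range(m)] for row in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, m = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(m)]
            for i in range(m)]


def first_component(matrix, iterations=100):
    """Dominant eigenvector and eigenvalue by power iteration."""
    m = len(matrix)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(m)) for i in range(m)
    )
    return v, eigenvalue


# Rows are models, columns are three hypothetical metrics. The second
# metric is exactly twice the first; the third is uncorrelated with both.
metrics = [
    [1.0, 2.0, 1.0],
    [2.0, 4.0, -1.0],
    [3.0, 6.0, -1.0],
    [4.0, 8.0, 1.0],
]

corr = correlation_matrix(standardize(metrics))
loadings, eigenvalue = first_component(corr)
explained_share = eigenvalue / len(corr)  # trace of a correlation matrix = m
```

Here the first component loads equally on the two redundant metrics and not at all on the third, and it explains two thirds of the total variance. Metrics that load together in this way are candidates for measuring the same underlying dimension.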

Information Systems, Volume 124, Article 102382. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0306437924000401/pdfft?md5=6831ca8dc2e3712e67135ed5946d6e27&pid=1-s2.0-S0306437924000401-main.pdf

Citations: 0

The rise of nonnegative matrix factorization: Algorithms and applications

Pub Date : 2024-03-21 DOI: 10.1016/j.is.2024.102379

Yi-Ting Guo, Qin-Qin Li, Chun-Sheng Liang

Although nonnegative matrix factorization (NMF) is widely used, some matrix factorization methods produce misleading results and waste computing resources because they lack timely optimization and case-by-case consideration. An up-to-date and comprehensive review of NMF algorithms and applications is therefore needed to promote its improvement and use. Here, we start by introducing the background and gathering the principles and formulae of NMF algorithms. Dozens of new algorithms have appeared since NMF emerged in the 1990s. Typically, a single software package, written in R, Python, C/C++, etc., implements several or more of these algorithms. We also analyze the applications of NMF. It is most widely used in modern subjects and techniques such as computer science, telecommunications, imaging science, and remote sensing, and is increasingly used in traditional subjects such as physics, chemistry, biology, medicine, and psychology; it has been adopted by around 130 fields (disciplines) in about 20 years. Finally, the features and performance of different categories of NMF are summarized and evaluated. The summarized advantages and disadvantages and the proposed improvements are expected to guide future efforts to refine the mathematical principles and procedures of NMF and achieve higher accuracy and productivity in practical use.
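As one concrete instance of an NMF algorithm, the sketch below implements the classic multiplicative update rule of Lee and Seung for the squared Frobenius loss, which approximates a nonnegative matrix V by a product W H of nonnegative factors. It is not necessarily an algorithm the review recommends; the toy matrix, the rank, the iteration count, and the small `eps` guard in the denominators are all choices made for this illustration.

```python
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Multiplicative updates for min ||V - W H||_F^2 with W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy nonnegative matrix with an exact rank-2 nonnegative factorization.
V = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0],
     [1.0, 0.0, 1.0]]

W, H = nmf(V, rank=2)
```

Because each update multiplies the current factor by a nonnegative ratio, W and H stay nonnegative without any explicit projection step. The result depends on the random initialization, so different seeds can give different factors of similar quality.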

Citations: 0

Cube query interestingness: Novelty, relevance, peculiarity and surprise

IF 3.7 Zone 2 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Systems

Pub Date : 2024-03-21 DOI: 10.1016/j.is.2024.102381

Dimos Gkitsakis , Spyridon Kaloudis , Eirini Mouselli , Veronika Peralta , Patrick Marcel , Panos Vassiliadis

In this paper, we discuss methods to assess the interestingness of a query in an environment of data cubes. We assume a hierarchical multidimensional database that stores data cubes and level hierarchies. We start with a comprehensive review of related work in the fields of human behavior studies and computer science. We define the interestingness of a query as a vector of scores along different aspects, such as novelty, relevance, surprise and peculiarity, and complement this definition with a taxonomy of the information that can be used to assess each of these aspects. We provide both syntactic (result-independent) and extensional (result-dependent) checks, measures and algorithms for assessing the different aspects of interestingness quantitatively. We also report the findings of a user study that we conducted, analyzing the significance of each aspect, its evolution over time and the behavior of the study’s participants.
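
To make the "vector of scores" idea concrete, the sketch below represents interestingness as a four-component record and adds one illustrative extensional (result-dependent) measure. Everything here is an assumption for illustration: the class `Interestingness`, the function `extensional_novelty`, and the specific formula (the share of result cells not returned by earlier queries) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interestingness:
    """Hypothetical container: one score per aspect named in the abstract."""
    novelty: float
    relevance: float
    peculiarity: float
    surprise: float

    def as_vector(self):
        """Return the scores as a tuple, in a fixed aspect order."""
        return (self.novelty, self.relevance, self.peculiarity, self.surprise)


def extensional_novelty(result_cells, seen_cells):
    """Illustrative measure: fraction of result cells never seen before.

    Both arguments are sets of cell coordinates, e.g. ("2024", "EU").
    The measure is result-dependent because it inspects the query's output.
    """
    if not result_cells:
        return 0.0
    return len(result_cells - seen_cells) / len(result_cells)
```

For example, a query returning two cells, one of which appeared in an earlier result, would score 0.5 under this assumed measure.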

Citations: 0

A graph neural network with topic relation heterogeneous multi-level cross-item information for session-based recommendation

IF 3.7 Zone 2 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Systems

Pub Date : 2024-03-20 DOI: 10.1016/j.is.2024.102380

Fan Yang, Dunlu Peng

Session-based recommendation (SBR) analyzes an anonymous user’s historical behavior records to predict the next item the user is likely to interact with and recommends it to the user. However, because users are anonymous and behavior records are sparse, recommendation results are often inaccurate. Existing SBR models mainly consider the order of items within a session and rarely analyze the complex transition relationships between items; they are also inadequate at mining higher-order hidden relationships between different sessions. To address these issues, we propose a topic relation heterogeneous multi-level cross-item information graph neural network (TRHMCI-GNN) to improve recommendation performance. The model attempts to capture hidden relationships between items through topic classification and builds a topic relation heterogeneous cross-item global graph. The graph contains inter-session cross-item information as well as hidden topic relations among sessions. In addition, a self-loop star graph is established to learn the intra-session cross-item information, and self-connection attributes are added to fuse the information of each item itself. Using a channel-hybrid attention mechanism, the item information of different levels is pooled by two channels, max-pooling and mean-pooling, which effectively fuses the item information of the cross-item global graph and the self-loop star graph. In this way, the model captures the global information of the target item and its individual features, and a label smoothing operation is added for recommendation. Extensive experimental results demonstrate that the recommendation performance of the TRHMCI-GNN model is superior to that of comparable baseline models on three real datasets: Diginetica, Yoochoose1/64 and Tmall. The code is available now.¹
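
The label smoothing step mentioned above can be sketched in a few lines. This shows the standard formulation, in which the one-hot target keeps weight 1 − ε and the remaining ε is spread uniformly over all classes; whether TRHMCI-GNN uses exactly this variant or this ε is an assumption, and the names `smooth_labels` and `cross_entropy` are hypothetical.

```python
import math


def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Return a smoothed target distribution over candidate items.

    Standard label smoothing: (1 - epsilon) * one_hot + epsilon / num_classes.
    """
    if not 0 <= target_index < num_classes:
        raise ValueError("target_index must be in [0, num_classes)")
    dist = [epsilon / num_classes] * num_classes
    dist[target_index] += 1.0 - epsilon
    return dist


def cross_entropy(predicted_probs, target_dist):
    """Cross-entropy between a predicted distribution and a (smoothed) target."""
    return -sum(t * math.log(p) for t, p in zip(target_dist, predicted_probs) if t > 0)
```

With four candidate items and ε = 0.1, the true next item receives probability 0.925 and every other item 0.025, which discourages the model from becoming overconfident on sparse session data.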

Citations: 0

An inter-modal attention-based deep learning framework using unified modality for multimodal fake news, hate speech and offensive language detection

IF 3.7 Zone 2 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Systems

Pub Date : 2024-03-16 DOI: 10.1016/j.is.2024.102378

Eniafe Festus Ayetiran , Özlem Özgöbek

Fake news, hate speech and offensive language are three related ills currently affecting modern societies. The text modality has been widely used for the computational detection of these phenomena. Recently, multimodal studies in this direction have attracted considerable interest because of the potential of other modalities to contribute to detecting these menaces. However, a major problem in multimodal content understanding is how to effectively model the complementarity of the different modalities, given their diverse characteristics and features. From a multimodal point of view, the three tasks have been studied mainly using image and text modalities. Improving the effectiveness of the diverse multimodal approaches is still an open research topic. In addition to the traditional text and image modalities, we consider image–texts, which are rarely used in previous studies but contain useful information for enhancing the effectiveness of a prediction model. To ease multimodal content understanding and enhance prediction, we leverage recent advances in computer vision and deep learning for these tasks. First, we unify the modalities by creating a text representation of the images and image–texts, in addition to the main text. Second, we propose a multi-layer deep neural network with an inter-modal attention mechanism to model the complementarity among these modalities. We conduct extensive experiments involving three standard datasets covering the three tasks. Experimental results show that detection of fake news, hate speech and offensive language can benefit from this approach. Furthermore, we conduct robust ablation experiments to show the effectiveness of our approach. Our model predominantly outperforms prior works across the datasets.
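
One generic way to let one modality attend to another is scaled dot-product cross-attention, sketched below. This is an assumption about what "inter-modal attention" could look like, not the authors' architecture: the function names `softmax` and `cross_attention` are hypothetical, and the inputs stand for already-embedded token vectors (for instance, queries from the main text and keys/values from the text representation of an image).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention with queries from one modality
    and keys/values from another.

    queries: list of d-dimensional vectors (modality A)
    keys, values: equally long lists of vectors (modality B)
    Returns one attended vector per query.
    """
    scale = math.sqrt(len(keys[0]))
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        attended.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return attended
```

Each output row is a weighted average of the other modality's value vectors, so tokens in one modality can pull in complementary evidence from the other; a real model would add learned projections and multiple layers on top of this.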

Citations: 0