New compressed indices for multijoins on graph databases
Diego Arroyuelo, Fabrizio Barisione, Antonio Fariña, Adrián Gómez-Brandón, Gonzalo Navarro
Information Systems, vol. 137, Article 102647. Pub date: 2026-04-01 (Epub: 2025-11-13). DOI: 10.1016/j.is.2025.102647

A recent surprising result in the implementation of worst-case-optimal (wco) multijoins in graph databases (specifically, basic graph patterns) is that they can be supported on graph representations that take even less space than a plain representation, and orders of magnitude less space than classical indices, while offering comparable performance. In this paper we uncover a wide set of new wco space–time tradeoffs: we (1) introduce new compact indices that handle multijoins in wco time, and (2) combine them with new query resolution strategies that offer better times in practice. As a result, we improve the average time of current compact representations to produce the first 1000 results by a factor of up to 13 and, using twice their space, halve their total average query time. Our experiments suggest that there is further room for improvement in generating better query plans for multijoins.
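The wco guarantee hinges on resolving a multijoin one variable at a time, intersecting the candidate sets contributed by each pattern, rather than joining relations pairwise. A minimal sketch of that idea for the triangle pattern, using plain Python sets (illustrative only; the paper's compact indexes and the function name here are not from the article):

```python
from collections import defaultdict

def triangles(edges):
    """Enumerate triangles (a, b, c) with edges a->b, b->c, a->c by binding
    one variable at a time, as wco join algorithms do: candidates for the
    last variable come from intersecting the neighbor sets of a and b."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
    out = []
    for a in sorted(adj):
        for b in sorted(adj[a]):
            # c must be a neighbor of both a and b: set intersection
            for c in sorted(adj[a] & adj[b]):
                out.append((a, b, c))
    return out
```

For example, `triangles([(1, 2), (2, 3), (1, 3)])` yields the single triangle `(1, 2, 3)`. Real wco implementations (e.g., Leapfrog Triejoin) perform this intersection over sorted index iterators instead of materialized sets.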
Enhancing next activity prediction in process mining with Retrieval-Augmented Generation
Angelo Casciani, Mario Luca Bernardi, Marta Cimitile, Andrea Marrella
Information Systems, vol. 137, Article 102642. Pub date: 2026-04-01 (Epub: 2025-11-03). DOI: 10.1016/j.is.2025.102642

Next activity prediction is one of the main tasks of Predictive Process Monitoring (PPM), enabling organizations to forecast the execution of business processes and respond accordingly. Deep learning models are effective at such predictions, but at the price of intensive training and feature engineering, which makes them less generalizable across domains. Large Language Models (LLMs) have recently been suggested as an alternative, but their capabilities in Process Mining tasks have yet to be extensively investigated. This work introduces a framework leveraging LLMs and Retrieval-Augmented Generation to enhance their capabilities for predicting next activities. By leveraging sequential information and data attributes from past execution traces, our framework enables LLMs to make more accurate predictions without additional training. We evaluate the approach on a wide range of event logs and compare it with state-of-the-art techniques. Findings show that our framework achieves competitive performance while being more adaptable across domains. Moreover, we assess early prediction capabilities, validate the significance of observed differences through statistical testing, and explore the impact of fine-tuning. Despite these advantages, we also report the framework's limitations, mainly related to sensitivity to interleaved activities and concept drift. Our findings highlight the potential of retrieval-augmented LLMs in PPM while identifying the need for future research into handling evolving process behaviors and developing standard benchmarks.
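As a rough intuition for how retrieval can support prediction without training, the toy baseline below looks up the historical trace prefixes most similar to the running prefix and votes on the next activity. This is a hypothetical sketch to illustrate the retrieval step, not the LLM-based framework described in the abstract:

```python
from collections import Counter

def predict_next(prefix, log, k=3):
    """Toy retrieval baseline: score every historical prefix by the length
    of the activity suffix it shares with the running prefix, then take a
    majority vote over the next activities of the k best matches."""
    candidates = []
    for trace in log:
        for i in range(1, len(trace)):
            hist_prefix, nxt = trace[:i], trace[i]
            # similarity = length of the common suffix of the two prefixes
            sim = 0
            while (sim < min(len(prefix), len(hist_prefix))
                   and prefix[-1 - sim] == hist_prefix[-1 - sim]):
                sim += 1
            candidates.append((sim, nxt))
    candidates.sort(key=lambda t: -t[0])
    votes = Counter(nxt for _, nxt in candidates[:k])
    return votes.most_common(1)[0][0]
```

In the article's framework, the retrieved traces are instead injected into the LLM prompt together with data attributes, and the LLM performs the prediction.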
Nine years later: Reflecting on our article
Massimiliano de Leoni, Wil M.P. van der Aalst, Marcus Dees
Information Systems, vol. 137, Article 102644. Pub date: 2026-04-01 (Epub: 2025-11-04). DOI: 10.1016/j.is.2025.102644

This contribution revisits our article "A General Process Mining Framework for Correlating, Predicting, and Clustering Dynamic Behavior Based on Event Logs", published in the Information Systems journal in 2016. It reflects on how the proposed general framework for process mining has grown in relevance with the rise of AI, emphasizing its value as an extensible approach to transforming event data into analytical and predictive insights. It also discusses how the framework's relevance and underlying message remain valid, including for emerging research directions such as prescriptive analytics and causal and object-centric process mining.
SOLID-M: An ontology-aware quality framework for conceptual models discovered from event data
Andrei Tour, Artem Polyvyanyy, Anna Kalenkova
Information Systems, vol. 137, Article 102641. Pub date: 2026-04-01 (Epub: 2025-11-04). DOI: 10.1016/j.is.2025.102641

In Process Mining (PM), "high-level" conceptual models of business processes, in the form of directly-follows graphs, Petri nets, and finite-state automata, are discovered from "low-level" event data recorded by information systems. The quality of the discovered models is usually assessed by measures that depend on assumptions made by the discovery algorithms; for example, they often assume that sequences of activities recorded in the event data do not interfere. Models produced by recent discovery algorithms consider domain knowledge and relax these assumptions, making traditional PM measures less suitable for evaluating their quality. This paper proposes an ontology-aware framework, called SOLID-M, for analyzing the quality of conceptual models discovered from event data generated by systems. SOLID-M relies on domain knowledge and provides guidelines for introducing quality measures for models constructed by process discovery algorithms that go beyond the traditional PM assumptions. In addition, the paper describes an instantiation of the framework for assessing the quality of Multi-Agent System models discovered using Agent System Mining techniques, hence addressing a growing demand for data-driven analysis of business processes that emerge from interactions between human and artificial intelligence agents.
Local intrinsic dimensionality and the estimation of convergence order
Michael E. Houle, Vincent Oria, Hamideh Sabaei
Information Systems, vol. 137, Article 102648. Pub date: 2026-04-01 (Epub: 2025-11-21). DOI: 10.1016/j.is.2025.102648

Fixed-point iteration (FPI) is a crucially important technique at the foundation of many scientific and engineering fields, such as numerical analysis, dynamical systems, optimization, and machine learning. In these domains, algorithmic efficiency and stability are often assessed using the notion of convergence order, a quantity whose estimation has typically involved line fitting in log–log space, or finding the limit of an associated function on differences of sequence values. In this paper, we establish a precise equivalence between the convergence order of a fixed-point update function and the local intrinsic dimensionality (LID) of that function once its fixed point is translated to the origin. Building on this insight, we propose a unified framework for repurposing existing distributional estimators of LID to estimate the convergence order. Of the LID estimators considered, we show that two, the MLE (Hill) estimator and a Bayesian estimator, have practical and convenient closed-form expressions. We further investigate how these estimators of convergence order can be enhanced using Aitken's Δ² method for accelerating convergence in slow scenarios, as well as a Bayesian smoothing layer for reducing variance when the number of samples is small. Empirically, we benchmark our LID-based estimators against classical sequence-based and curve-fitting methods in three experimental settings: root-finding, general iteration, and machine learning regression. Results indicate that our approaches frequently match or surpass the classical estimators in accuracy, while offering robust performance over a broader range of convergence scenarios.
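For context, the classical sequence-based estimator that such LID-based methods are benchmarked against can be sketched in a few lines; this is the textbook log-ratio formula applied to the error sequence of a fixed-point iteration, not the LID estimators introduced in the article:

```python
import math

def newton_sqrt2(x0, steps):
    """Fixed-point iteration x <- (x + 2/x)/2, i.e. Newton's method for
    f(x) = x^2 - 2, which converges quadratically to sqrt(2)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(0.5 * (x + 2.0 / x))
    return xs

def convergence_order(xs, limit):
    """Classical sequence-based estimate of the convergence order p,
    using p ~ log(e_{n+1}/e_n) / log(e_n/e_{n-1}) on the last three errors."""
    e = [abs(x - limit) for x in xs]
    return math.log(e[-1] / e[-2]) / math.log(e[-2] / e[-3])
```

Running `convergence_order(newton_sqrt2(1.5, 3), math.sqrt(2))` returns a value close to 2, the known quadratic order of Newton's method. In floating point this estimator degrades once the errors reach machine precision, one of the robustness issues the LID-based approach is designed to address.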
The many facets of fairness in recommender systems: Consumers, providers and items
Reza Shafiloo, Maria Stratigi, Jaakko Peltonen, Thomas Olsson, Kostas Stefanidis
Information Systems, vol. 137, Article 102643. Pub date: 2026-04-01 (Epub: 2025-11-12). DOI: 10.1016/j.is.2025.102643

Autonomous decision-making systems, particularly recommender systems, have received increasing attention concerning fairness, i.e., whether all stakeholders affected by such a system are treated equally as a result of the recommendations. Existing approaches primarily focus on fairness between two stakeholders (consumers and providers, or consumers and items), treating providers and items as the same entity. However, we argue for treating providers and items as distinct stakeholders to obtain more comprehensive models of fairness in recommender systems. To this end, we propose a fairness-aware recommender system, CIPFRS, designed to optimize fairness across all three key stakeholders: consumers, providers, and items. We examine consumer fairness with respect to users' level of interaction with the system: high- and low-activity users should be treated equally. Further, all providers should have an equal opportunity for their products to be recommended. Finally, we propose an approach to enforce item fairness within each provider's inventory. We report an extensive evaluation of the proposed solution on three datasets, demonstrating that considering all three stakeholders yields improved recommendations while minimizing bias.
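Provider-side fairness of the kind described can be quantified, for instance, as each provider's share of position-discounted exposure over a batch of recommendation lists. The following is a generic diagnostic sketch under that common definition, not the CIPFRS optimization itself:

```python
import math
from collections import defaultdict

def provider_exposure(rec_lists, provider_of):
    """Share of position-discounted exposure each provider receives over a
    batch of ranked recommendation lists. Uses the standard 1/log2(rank+1)
    discount; equal shares indicate equal provider opportunity."""
    exposure = defaultdict(float)
    for recs in rec_lists:
        for rank, item in enumerate(recs, start=1):
            exposure[provider_of[item]] += 1.0 / math.log2(rank + 1)
    total = sum(exposure.values())
    return {p: v / total for p, v in exposure.items()}
```

For example, if two providers' items alternate between the top and second slot across two lists, both end up with a 0.5 exposure share, whereas always ranking one provider first skews the shares toward it.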
Unsupervised and semi-supervised clustering via density and distance-based label propagation and assignment
Zhen Jiang, Bolin Niu, Jinxin Gua, Yuping Xing
Information Systems, vol. 136, Article 102639. Pub date: 2026-02-01 (Epub: 2025-10-28). DOI: 10.1016/j.is.2025.102639

Density-based clustering can identify clusters of arbitrary shapes without the need to predefine the number of clusters or their distributions. However, it suffers from varying density and parameter sensitivity. To tackle these challenges, we present the Density and Distance-Based Clustering (DDBC) algorithm, which performs clustering from the backbone to the foliage. Based on the "K_cutoff" neighborhoods of core points, DDBC constructs the cluster backbone through label propagation and subcluster aggregation. Subsequently, we construct cluster prototypes and leverage point-prototype distances to help assign points located outside the backbone. The proposed method effectively mitigates issues related to varying density. Furthermore, we propose a semi-supervised version of DDBC, termed SS-DDBC, which uses a small amount of labeled data to guide label propagation and subcluster aggregation. It provides a safe and adaptive approach to leveraging class information for semi-supervised clustering. Moreover, we propose automated parameter optimization approaches for DDBC and SS-DDBC, thus addressing the issue of parameter sensitivity. In both unsupervised and semi-supervised settings, we conducted experimental comparisons of DDBC and SS-DDBC with ten state-of-the-art algorithms across a range of benchmark datasets. Both algorithms consistently outperform their competitors in terms of average performance and achieve superior results on the majority of datasets. These experimental results demonstrate the effectiveness of our proposed methods. The source code for our algorithms is available at https://github.com/nblnbl/DDBC.
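As a rough illustration of the density criterion underlying such backbone-first methods, the sketch below marks core points by the distance to their k-th nearest neighbor; the quantile threshold here is a generic stand-in, not DDBC's actual "K_cutoff" definition from the paper:

```python
import math

def core_points(points, k, quantile=0.5):
    """Mark as 'core' the points whose k-th nearest-neighbor distance falls
    below the given quantile of all such distances: dense points (small kNN
    radius) form the backbone, sparse outliers are left for later assignment."""
    kth = []
    for p in points:
        ds = sorted(math.dist(p, q) for q in points if q is not p)
        kth.append(ds[k - 1])  # distance to the k-th nearest neighbor
    cut = sorted(kth)[int(quantile * (len(kth) - 1))]
    return [p for p, d in zip(points, kth) if d <= cut]
```

On a tight cluster plus one far-away outlier, the four clustered points are identified as core while the outlier is excluded; in DDBC the remaining points would then be assigned via point-prototype distances.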
Predicting multidimensional cubes through intentional analytics
Matteo Francia, Stefano Rizzi, Matteo Golfarelli, Patrick Marcel
Information Systems, vol. 136, Article 102628. Pub date: 2026-02-01 (Epub: 2025-09-17). DOI: 10.1016/j.is.2025.102628

In an attempt to streamline exploratory data analysis of multidimensional cubes, the Intentional Analytics Model has been proposed as a way to unite OLAP and analytics by allowing users to state their analysis intentions and returning cubes enhanced with models. Five intention operators were envisioned to this end; in this work we focus on the predict operator, whose goal is to estimate the missing values of a cube measure starting from known values of the same measure or of other measures, using different regression models. Although prediction tasks such as forecasting and imputation are routine for analysts, the added value of our approach is (i) to encapsulate them in a declarative, concise, natural-language-like syntax; (ii) to automate the selection of the best measures to be used and the computation of the models; and (iii) to automate the evaluation of the interest of the computed models. First we propose a syntax and a semantics for predict and discuss how enhanced cubes are built by (i) predicting the missing values of a measure based on the available information via one or more models and (ii) highlighting the most interesting prediction. Then we test the operator's implementation, showing that its performance is in line with the interactivity requirements of OLAP sessions and that accurate predictions can be returned.
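Conceptually, the predict operator fills in the missing cells of one measure from the known values of another via regression. A toy stand-in using a single linear model (the article automates model and measure selection, which this sketch does not):

```python
import numpy as np

def impute_measure(known_x, known_y, missing_x):
    """Fit y ~ a*x + b on cube cells where both measures are known, then
    predict the target measure for cells where it is missing. A minimal
    stand-in for the regression models behind a predict-style operator."""
    a, b = np.polyfit(known_x, known_y, 1)  # least-squares line fit
    return a * np.array(missing_x) + b
```

For instance, with known pairs (1, 2), (2, 4), (3, 6) the fitted line is y = 2x, so a cell with x = 4 is imputed as y = 8.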
GPT-5 and open-weight large language models: Advances in reasoning, transparency, and control
Maikel Leon
Information Systems, vol. 136, Article 102620. Pub date: 2026-02-01 (Epub: 2025-09-18). DOI: 10.1016/j.is.2025.102620

The rapid evolution of Generative Pre-trained Transformers (GPTs) has revolutionized natural language processing, enabling models to generate coherent text, solve mathematical problems, write code, and even reason about complex tasks. This paper presents a scientific review of GPT-5, OpenAI's latest flagship model, and examines its innovations in comparison to previous generations of GPT. We summarize the model's architecture and features, including hierarchical routing, expanded context windows, and enhanced tool-use capabilities, and survey empirical evidence of improved performance on academic benchmarks. A dedicated section discusses the release of open-weight mixture-of-experts models (GPT-OSS), describing their technical design, licensing, and comparative performance. Our analysis synthesizes findings from recent literature on long-context evaluation, cognitive biases, medical summarization, and hallucination vulnerability, highlighting where GPT-5 advances the state of the art and where challenges remain. We conclude by discussing the implications of open-weight models for transparency and reproducibility, and propose directions for future research on evaluation, safety, and agentic behavior.
Extended parameterized Burrows–Wheeler transform
Eric M. Osterkamp, Dominik Köppl
Information Systems, vol. 136, Article 102611. Pub date: 2026-02-01 (Epub: 2025-09-11). DOI: 10.1016/j.is.2025.102611

The Burrows–Wheeler transform (BWT) lies at the heart of succinct and compressed full-text indexes for pattern matching queries. Notable variants are (a) the extended BWT (eBWT), capable of indexing multiple circular texts for pattern matching, and (b) the parameterized BWT (pBWT) for parameterized pattern matching. A natural extension combines the virtues of both variants into a new data structure, which we coin the extended parameterized BWT (epBWT). We show that the epBWT supports parameterized pattern matching on multiple circular texts within the same complexities as the known solutions for the pBWT [Kim and Cho, IPL'21], for patterns not longer than the shortest indexed text. Additionally, we show how to compute the epBWT within the same complexities as [Iseri et al., ICALP'24], i.e., in compact space and quasilinear time. As an application, we extend the matching statistics problem to the parameterized pattern matching setting on circular texts.
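For readers new to the underlying transform, the plain BWT of a single sentinel-terminated text can be built naively from sorted rotations; the eBWT, pBWT, and epBWT discussed above generalize this construction (production indexes compute it via suffix arrays rather than explicit rotations):

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations: sort all cyclic
    rotations of the text and concatenate their last characters. Quadratic
    in space/time, so suitable only for small demonstration inputs."""
    assert text.endswith("$"), "append a unique sentinel smaller than all symbols"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)
```

For example, `bwt("banana$")` returns `"annb$aa"`, which groups equal characters together and is what makes the transform both compressible and indexable.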