Pub Date : 2024-08-05 · DOI: 10.1016/j.is.2024.102435
Lingfeng Bian , Weidong Yang , Ting Xu , Zijing Tan
Data repairing algorithms are extensively studied for improving data quality. Denial constraints (DCs) are commonly employed to state quality specifications that data should satisfy, and hence facilitate data repairing, since DCs are general enough to subsume many other dependencies. In practice, data are frequently updated, which motivates the quest for efficient incremental repairing techniques in response to data updates. In this paper, we present the first incremental algorithm for repairing DC violations. Specifically, given a relational instance I consistent with a set Σ of DCs, and a set ΔI of tuple insertions to I, our aim is to find a set ΔI′ of tuple insertions such that Σ is satisfied on I + ΔI′. We first formalize and prove the complexity of the problem of incremental data repairing with DCs. We then present techniques that combine auxiliary indexing structures to efficiently identify the DC violations incurred by ΔI w.r.t. Σ, and further develop an efficient repairing algorithm to compute ΔI′ by resolving these violations. Finally, using both real-life and synthetic datasets, we conduct extensive experiments to demonstrate the effectiveness and efficiency of our approach.
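A denial constraint forbids any pair of tuples that jointly satisfies a conjunction of predicates. As a toy illustration of the incremental setting (not the paper's algorithm), the sketch below checks a hypothetical salary/tax DC and compares only inserted tuples against the rest, relying on the assumption that the existing instance is already consistent:

```python
# Hypothetical example DC: no two tuples t, t' in the same city may have
# t.salary > t'.salary while t.tax < t'.tax. A violation is any tuple
# pair for which every predicate of the DC holds.

def violates(t1, t2):
    """Check one hardcoded example DC over two tuples (dicts)."""
    return (t1["city"] == t2["city"]
            and t1["salary"] > t2["salary"]
            and t1["tax"] < t2["tax"])

def incremental_violations(instance, insertions):
    """Return violating pairs introduced by the insertions only.

    Since `instance` is assumed consistent, it suffices to compare each
    inserted tuple against existing tuples and earlier insertions,
    never existing tuples against each other.
    """
    pairs = []
    seen = list(instance)
    for t in insertions:
        for s in seen:
            if violates(t, s) or violates(s, t):
                pairs.append((t, s))
        seen.append(t)
    return pairs

instance = [{"city": "NY", "salary": 90, "tax": 30}]
delta = [{"city": "NY", "salary": 100, "tax": 20}]
print(len(incremental_violations(instance, delta)))  # prints 1
```

The quadratic pairwise scan is where the paper's auxiliary indexing structures would come in; this sketch only shows why consistency of the base instance shrinks the comparison space.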
"An incremental algorithm for repairing denial constraint violations" (Information Systems, Volume 126, Article 102435)
Pub Date : 2024-08-02 · DOI: 10.1016/j.is.2024.102434
Katsiaryna Lashkevich, Fredrik Milani, David Chapela-Campa, Ihar Suvorau, Marlon Dumas
Waiting times in a business process often arise when a case transitions from one activity to another. Accordingly, analyzing the causes of waiting times in activity transitions can help analysts identify opportunities for reducing the cycle time of a process. This paper proposes a process mining approach to decompose observed waiting times in each activity transition into multiple direct causes and to analyze the impact of each identified cause on the process cycle time efficiency. The approach is implemented as a software tool called Kronos that process analysts can use to upload event logs and obtain analysis results of waiting time causes. The proposed approach was empirically evaluated using synthetic event logs to verify its ability to discover different direct causes of waiting times. The applicability of the approach is demonstrated in a real-life process. Interviews with process mining experts confirm that Kronos is useful and easy to use for identifying improvement opportunities related to waiting times.
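As a rough illustration of the underlying quantity (the paper's approach further decomposes it into direct causes), the following sketch extracts raw waiting times per activity transition from a toy event log; the schema is illustrative, not the tool's input format:

```python
# Observed waiting time per activity transition: the gap between one
# activity's completion and the next activity's start within the same
# case, aggregated over all cases.
from collections import defaultdict

def waiting_per_transition(events):
    """events: list of (case_id, activity, start, end), times as numbers."""
    by_case = defaultdict(list)
    for case, act, start, end in events:
        by_case[case].append((start, end, act))
    totals = defaultdict(float)
    for recs in by_case.values():
        recs.sort()  # order each case's events chronologically
        for (s1, e1, a1), (s2, e2, a2) in zip(recs, recs[1:]):
            totals[(a1, a2)] += max(0.0, s2 - e1)
    return dict(totals)

log = [("c1", "A", 0, 2), ("c1", "B", 5, 6),
       ("c2", "A", 1, 3), ("c2", "B", 4, 7)]
print(waiting_per_transition(log))  # prints {('A', 'B'): 4.0}
```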
"Unveiling the causes of waiting time in business processes from event logs" (Information Systems, Volume 126, Article 102434, open access)
Pub Date : 2024-07-25 · DOI: 10.1016/j.is.2024.102429
Xinghao Wang, Qiang Xing, Huimin Xiao, Ming Ye
Analyzing multivariate time series data is crucial for many real-world issues, such as power forecasting, traffic flow forecasting, industrial anomaly detection, and more. Recently, universal frameworks for time series representation based on representation learning have received widespread attention due to their ability to capture changes in the distribution of time series data. However, existing time series representation learning models, when confronting multivariate time series data, merely apply contrastive learning methods to construct positive and negative samples for each variable at the timestamp level, and then employ a contrastive loss function to encourage the model to learn the similarities among the positive samples and the dissimilarities among the negative samples for each variable. Despite this, they fail to fully exploit the latent space dependencies between pairs of variables. To address this problem, we propose the Contrastive Learning Enhanced by Graph Neural Networks for Universal Multivariate Time Series Representation (COGNet), which has three distinctive features. (1) COGNet is a comprehensive self-supervised learning model that combines autoencoders and contrastive learning methods. (2) We introduce graph feature representation blocks on top of the backbone encoder, which extract adjacency features of each variable with other variables. (3) COGNet uses graph contrastive loss to learn graph feature representations. Experimental results across multiple public datasets indicate that COGNet outperforms existing methods in time series prediction and anomaly detection tasks.
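For readers unfamiliar with the timestamp-level contrastive objective the paper builds on, here is a minimal InfoNCE-style loss in NumPy; COGNet's actual graph contrastive loss is more involved, so this is only a generic sketch:

```python
# InfoNCE-style contrastive loss: pull the anchor towards its positive
# sample and push it away from negatives, via a softmax over cosine
# similarities scaled by a temperature tau.
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """anchor, positive: (d,) vectors; negatives: (k, d) matrix."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()  # numerical stability before exponentiation
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

rng = np.random.default_rng(0)
a = rng.normal(size=8)
loss_close = info_nce(a, a + 0.01 * rng.normal(size=8),
                      rng.normal(size=(5, 8)))
loss_far = info_nce(a, -a, rng.normal(size=(5, 8)))
print(loss_close < loss_far)  # similar positives yield a lower loss
```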
"Contrastive learning enhanced by graph neural networks for Universal Multivariate Time Series Representation" (Information Systems, Volume 125, Article 102429)
Pub Date : 2024-07-20 · DOI: 10.1016/j.is.2024.102432
Björn Rafn Gunnarsson , Seppe vanden Broucke , Jochen De Weerdt
Research on developing techniques for predictive process monitoring has generally relied on feature encoding schemes that extract intra-case features from events to make predictions. In doing so, the processing of cases is assumed to be solely influenced by the attributes of the cases themselves. However, cases are not processed in isolation and can be influenced by the processing of other cases or, more generally, the state of the process under investigation. In this work, we propose the LS-ICE (load state intercase encoding) framework for encoding intercase features that enriches events with a depiction of the state of relevant load points in a business process. To assess the benefits of the intercase features generated using the LS-ICE framework, we compare the performance of predictive process monitoring models constructed using the encoded features against baseline models without these features. The models are evaluated for remaining trace and runtime prediction using five real-life event logs. Across the board, a consistent improvement in performance is noted for models that integrate intercase features encoded through the proposed framework, as opposed to baseline models that lack these encoded features.
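A minimal example of an intercase "load" feature of the kind LS-ICE generalizes: enrich each event with the number of cases active at its timestamp. The schema and the work-in-progress definition below are illustrative assumptions, not the framework's actual encoding:

```python
# Enrich each event with a work-in-progress (WIP) count: the number of
# cases that have started but not yet finished at the event's timestamp.
def enrich_with_load(events):
    """events: list of (case_id, activity, time); returns events with
    a WIP count appended."""
    first, last = {}, {}
    for case, _, t in events:
        first[case] = min(first.get(case, t), t)
        last[case] = max(last.get(case, t), t)
    enriched = []
    for case, act, t in events:
        wip = sum(1 for c in first if first[c] <= t <= last[c])
        enriched.append((case, act, t, wip))
    return enriched

log = [("c1", "A", 0), ("c2", "A", 1), ("c1", "B", 2), ("c2", "B", 3)]
print(enrich_with_load(log))
# e.g. the event ("c1", "B", 2) gets wip=2: both cases are active at t=2
```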
"LS-ICE: A Load State Intercase Encoding framework for improved predictive monitoring of business processes" (Information Systems, Volume 125, Article 102432)
Pub Date : 2024-07-19 · DOI: 10.1016/j.is.2024.102431
Andreas Egger , Arthur H.M. ter Hofstede , Wolfgang Kratsch , Sander J.J. Leemans , Maximilian Röglinger , Moe T. Wynn
Process mining and Robotic Process Automation (RPA) are two technologies of great interest in research and practice. Process mining uses event logs as input, but much of the information available about processes is not yet considered since the data is outside the scope of ordinary event logs. RPA technology can automate tasks by using bots, and the executed steps can be recorded, which could be a valuable data source for process mining. With the use of RPA technology expected to grow, an integrated view of steps performed by bots in business processes is needed. In process mining, various techniques to analyze processes have already been developed. Most RPA software also includes basic measures to monitor bot performance. However, the isolated use of bot-related or process mining measures does not provide an end-to-end view of bot-enabled business processes. To address these issues, we develop an approach that enables using RPA logs for process mining and propose tailored measures to analyze merged bot and process logs. We use the design science research process to structure our work and evaluate the approach by conducting a total of 14 interviews with experts from industry and research. We also implement a software prototype and test it on real-world and artificial data. This approach contributes to prescriptive knowledge by providing a concept on how to use bot logs for process mining and brings the research streams of RPA and process mining further together. It provides new data that expands the possibilities of existing process mining techniques in research and practice, and it enables new analyses that can observe bot-human interaction and show the effects of bots on business processes.
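The core merge step can be sketched as follows: unify the two logs into one chronologically ordered log per case, tagging each event with its origin so bot-human interaction becomes visible. Field names are illustrative, not the paper's schema:

```python
# Merge a process event log and an RPA bot log into one log, ordered by
# case and timestamp, with each event tagged by its origin.
def merge_logs(process_log, bot_log):
    """Each log: list of dicts with 'case', 'activity', 'time' keys."""
    merged = ([dict(e, origin="human") for e in process_log] +
              [dict(e, origin="bot") for e in bot_log])
    return sorted(merged, key=lambda e: (e["case"], e["time"]))

p = [{"case": "c1", "activity": "Review", "time": 3}]
b = [{"case": "c1", "activity": "Extract", "time": 1}]
print([e["origin"] for e in merge_logs(p, b)])  # prints ['bot', 'human']
```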
"Bot log mining: An approach to the integrated analysis of Robotic Process Automation and process mining" (Information Systems, Volume 126, Article 102431)
Open data is a strategy used by governments to promote transparency and accountability in public procurement processes. To reap the benefits of open data, exploring and analyzing the data is necessary to gain meaningful insights into procurement practices. However, accessing, processing, and analyzing open data can be challenging for non-data-savvy users with domain expertise, creating a barrier to leveraging open procurement data. To address this issue, we present the design, development, and implementation of a visual analytics tool. This tool automates data extraction from multiple sources, performs data cleansing, standardization, and database processing, and generates meaningful visualizations to streamline public procurement analysis. In addition, the tool estimates and visualizes corruption risk indicators at different levels (e.g., regions or public entities), providing valuable insights into the integrity of the procurement process. Key contributions of this work include: (1) providing a comprehensive guide to the development of an open procurement data visualization tool; (2) proposing a data pipeline to support processing, corruption risk estimator and data visualization; (3) demonstrating through a case study how visual analytics can effectively use open data to generate insights that promote and enhance transparency.
Key contributions of this work include: (1) providing a comprehensive guide to the development of an open procurement data visualization tool; (2) proposing a data pipeline to support data processing, corruption-risk estimation, and visualization; (3) demonstrating through a case study how visual analytics can effectively use open data to generate insights that promote and enhance transparency.
"Enhancing transparency in public procurement: A data-driven analytics approach" by Heriberto Felizzola, Camilo Gomez, Nicolas Arrieta, Vianey Jerez, Yilber Erazo, Geraldine Camacho (Information Systems, Volume 125, Article 102430; Pub Date : 2024-07-17 · DOI: 10.1016/j.is.2024.102430)
Recommender systems are powerful tools that successfully apply data mining and machine learning techniques. Traditionally, these systems focused on predicting a single interaction, such as a rating between a user and an item. However, this approach overlooks the complexity of user behavior, which often involves multiple interactions over time, such as browsing, adding items to a cart, and more. Recent research has shifted towards leveraging this richer data to build more detailed user profiles and uncover complex patterns of user behavior. Sequential recommendation systems have gained significant attention due to their ability to model users' evolving preferences over time. This survey explores how these systems utilize interaction history to make more accurate and personalized recommendations. We provide an overview of the techniques employed in sequential recommendation systems, discuss evaluation methodologies, and highlight future research directions. We categorize existing approaches based on their underlying principles and evaluate their effectiveness in various application domains.
Additionally, we outline the challenges and opportunities in sequential recommendation systems.
"A survey of sequential recommendation systems: Techniques, evaluation, and future directions" by Tesfaye Fenta Boka, Zhendong Niu, Rama Bastola Neupane (Information Systems, Volume 125, Article 102427; Pub Date : 2024-07-16 · DOI: 10.1016/j.is.2024.102427)
Pub Date : 2024-07-14 · DOI: 10.1016/j.is.2024.102428
Steven Alter
The term smart is often used carelessly in relation to systems, devices, and other entities such as cities that capture or otherwise process or use information. This conceptual paper treats the idea of smartness in a way that suggests directions for making cyber-human systems smarter. Cyber-human systems can be viewed as work systems. This paper defines work system, cyber-human system, algorithmic agent, and smartness of systems and devices. It links those ideas to challenges that can be addressed by applying ideas that managers and IS designers discuss rarely, if at all, such as dimensions of smartness for devices and systems, facets of work, roles and responsibilities of algorithmic agents, different types of engagement and patterns of interaction between people and algorithmic agents, explicit use of various types of knowledge objects, and performance criteria that are often deemphasized. In combination, those ideas reveal many opportunities for IS analysis and design practice to make cyber-human systems smarter.
"Making cyber-human systems smarter" (Information Systems, Volume 127, Article 102428)
Pub Date : 2024-07-11 · DOI: 10.1016/j.is.2024.102426
Md Hasan Anowar , Abdullah Shamail , Xiaoyu Wang , Goce Trajcevski , Sohail Murad , Cynthia J. Jameson , Ashfaq Khokhar
Molecular Dynamics (MD) simulation is often used to study the properties of various chemical interactions in domains such as drug discovery and development, particularly when real experimental studies are costly and/or unsafe. The trajectories of molecules/atoms generated by MD simulations provide a detailed, atomic-level spatial location of every atom in every time frame of the experiment. Analyzing this data leads to an atomic- and molecular-level understanding of interactions among the constituents of the system of interest. However, the data is extremely large and poses storage and processing challenges for querying and analyzing the associated atom-level motion trajectories. We take a first step towards applying domain-specific generalization techniques for the data representation, subsequently applying trajectory compression algorithms to reduce the storage requirements and speed up the processing of within-distance queries over MD simulation data. We demonstrate that this generalization-aware compression, when applied to the dataset used in this case study, yields significant improvements in data reduction and processing time without sacrificing the effectiveness of within-distance queries for threshold-based detection of molecular events of interest, such as the formation of Hydrogen Bonds (H-Bonds).
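A within-distance query of the kind used for threshold-based H-bond detection can be sketched naively as follows (real MD pipelines use spatial indexing; the ~3.5 Å cutoff is a commonly used convention, not a value taken from the paper):

```python
# Report donor-acceptor pairs closer than a distance cutoff in a single
# time frame; a naive O(n*m) scan over atom coordinates.
import math

def within_distance(donors, acceptors, cutoff):
    """donors, acceptors: lists of (x, y, z); returns index pairs."""
    hits = []
    for i, d in enumerate(donors):
        for j, a in enumerate(acceptors):
            if math.dist(d, a) <= cutoff:
                hits.append((i, j))
    return hits

donors = [(0.0, 0.0, 0.0)]
acceptors = [(3.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(within_distance(donors, acceptors, 3.5))  # prints [(0, 0)]
```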
"Compressing generalized trajectories of molecular motion for efficient detection of chemical interactions" (Information Systems, Volume 125, Article 102426)
Pub Date: 2024-07-06 DOI: 10.1016/j.is.2024.102422
Michele Chiari , Bin Xiang , Sergio Canzoneri , Galia Novakova Nedeltcheva , Elisabetta Di Nitto , Lorenzo Blasi , Debora Benedetto , Laurentiu Niculut , Igor Škof
One of the main DevOps practices is the automation of resource provisioning and deployment of complex software. This automation is enabled by the explicit definition of Infrastructure-as-Code (IaC), i.e., a set of scripts, often written in different modeling languages, which defines the infrastructure to be provisioned and applications to be deployed.
We introduce the DevOps Modeling Language (DOML), a new Cloud modeling language for infrastructure deployments. DOML is a modeling approach that can be mapped into multiple IaC languages, addressing infrastructure provisioning, application deployment and configuration.
The idea behind DOML is to use a single modeling paradigm that can reduce the need for deep technical expertise in several different specialized IaC languages.
We present the DOML’s principles and discuss related work on IaC languages. Furthermore, we demonstrate the advantages of the DOML for the end user in comparison with state-of-the-art IaC languages such as Ansible, Terraform, and Cloudify, and evaluate its effectiveness through several examples and a case study.
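The core idea of mapping one provider-neutral model into multiple IaC dialects can be sketched as follows. This is a hypothetical illustration, not actual DOML syntax: the `VirtualMachine` fields and the emitted resource/module names are invented for the example, and the output strings only approximate Terraform and Ansible style.

```python
from dataclasses import dataclass

@dataclass
class VirtualMachine:
    """A single provider-neutral model of an infrastructure element."""
    name: str
    image: str
    size: str

def to_terraform(vm: VirtualMachine) -> str:
    # Render the same model as a Terraform-style resource block.
    return (f'resource "example_vm" "{vm.name}" {{\n'
            f'  image = "{vm.image}"\n'
            f'  size  = "{vm.size}"\n'
            f'}}')

def to_ansible(vm: VirtualMachine) -> str:
    # Render the same model as an Ansible-style playbook task.
    return (f"- name: Provision {vm.name}\n"
            f"  example.vm:\n"
            f"    image: {vm.image}\n"
            f"    size: {vm.size}")

vm = VirtualMachine(name="web1", image="ubuntu-22.04", size="small")
print(to_terraform(vm))
print(to_ansible(vm))
```

The point of the single-paradigm approach is that the user edits only the one model; knowledge of each target language's syntax lives in the mapping, not in the user's head.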
{"title":"DOML: A new modeling approach to Infrastructure-as-Code","authors":"Michele Chiari , Bin Xiang , Sergio Canzoneri , Galia Novakova Nedeltcheva , Elisabetta Di Nitto , Lorenzo Blasi , Debora Benedetto , Laurentiu Niculut , Igor Škof","doi":"10.1016/j.is.2024.102422","DOIUrl":"https://doi.org/10.1016/j.is.2024.102422","url":null,"abstract":"<div><p>One of the main DevOps practices is the automation of resource provisioning and deployment of complex software. This automation is enabled by the explicit definition of <em>Infrastructure-as-Code</em> (IaC), i.e., a set of scripts, often written in different modeling languages, which defines the infrastructure to be provisioned and applications to be deployed.</p><p>We introduce the DevOps Modeling Language (DOML), a new Cloud modeling language for infrastructure deployments. DOML is a modeling approach that can be mapped into multiple IaC languages, addressing infrastructure provisioning, application deployment and configuration.</p><p>The idea behind DOML is to use a single modeling paradigm which can help to reduce the need of deep technical expertise in using different specialized IaC languages.</p><p>We present the DOML’s principles and discuss the related work on IaC languages. 
Furthermore, the advantages of the DOML for the end-user are demonstrated in comparison with some state-of-the-art IaC languages such as Ansible, Terraform, and Cloudify, and an evaluation of its effectiveness through several examples and a case study is provided.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102422"},"PeriodicalIF":3.0,"publicationDate":"2024-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0306437924000802/pdfft?md5=c405d21d1f83737d4493eb269ebc2006&pid=1-s2.0-S0306437924000802-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141606273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}