Pub Date : 2024-08-02 | DOI: 10.1016/j.is.2024.102434
Waiting times in a business process often arise when a case transitions from one activity to another. Accordingly, analyzing the causes of waiting times in activity transitions can help analysts identify opportunities for reducing the cycle time of a process. This paper proposes a process mining approach to decompose the observed waiting time of each activity transition into multiple direct causes and to analyze the impact of each identified cause on the cycle time efficiency of the process. The approach is implemented as a software tool, Kronos, which process analysts can use to upload event logs and obtain an analysis of waiting time causes. The approach was empirically evaluated on synthetic event logs to verify its ability to discover different direct causes of waiting times, and its applicability is demonstrated on a real-life process. Interviews with process mining experts confirm that Kronos is useful and easy to use for identifying improvement opportunities related to waiting times.
Title: Unveiling the causes of waiting time in business processes from event logs
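The quantity the approach decomposes can be illustrated with a minimal sketch: computing the observed waiting time of each activity transition from an event log. The field names (`case_id`, `activity`, `start_time`, `end_time`) are assumptions about the log schema, and Kronos's decomposition of the raw gap into direct causes (e.g., batching or resource contention) is not reproduced here.

```python
from datetime import datetime
from collections import defaultdict

def transition_waiting_times(event_log):
    """Average observed waiting time per activity transition.

    event_log: list of events, each a dict with case_id, activity,
    start_time and end_time (ISO-8601 strings). The waiting time of a
    transition A -> B is the gap between A's end and B's start.
    """
    by_case = defaultdict(list)
    for e in event_log:
        by_case[e["case_id"]].append(e)
    waits = defaultdict(list)
    for events in by_case.values():
        events.sort(key=lambda e: e["start_time"])  # ISO strings sort chronologically
        for prev, curr in zip(events, events[1:]):
            gap = (datetime.fromisoformat(curr["start_time"])
                   - datetime.fromisoformat(prev["end_time"])).total_seconds()
            waits[(prev["activity"], curr["activity"])].append(max(gap, 0.0))
    return {t: sum(v) / len(v) for t, v in waits.items()}
```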
Pub Date : 2024-07-25 | DOI: 10.1016/j.is.2024.102429
Analyzing multivariate time series data is crucial for many real-world tasks, such as power forecasting, traffic flow forecasting, and industrial anomaly detection. Recently, universal frameworks for time series representation based on representation learning have received widespread attention due to their ability to capture changes in the distribution of time series data. However, existing time series representation learning models, when confronted with multivariate time series data, merely apply contrastive learning to construct positive and negative samples for each variable at the timestamp level, and then employ a contrastive loss to encourage the model to learn the similarities among positive samples and the dissimilarities among negative samples for each variable. In doing so, they fail to fully exploit the latent space dependencies between pairs of variables. To address this problem, we propose Contrastive Learning Enhanced by Graph Neural Networks for Universal Multivariate Time Series Representation (COGNet), which has three distinctive features. (1) COGNet is a comprehensive self-supervised learning model that combines autoencoders and contrastive learning. (2) We introduce graph feature representation blocks on top of the backbone encoder, which extract the adjacency features of each variable with respect to the other variables. (3) COGNet uses a graph contrastive loss to learn graph feature representations. Experimental results across multiple public datasets indicate that COGNet outperforms existing methods in time series prediction and anomaly detection tasks.
Title: Contrastive learning enhanced by graph neural networks for Universal Multivariate Time Series Representation
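The timestamp-level contrastive objective that the abstract describes as the starting point of such methods can be sketched in a few lines as an InfoNCE-style loss over one positive and several negatives. COGNet's graph feature blocks and graph contrastive loss are not reproduced; this only illustrates the per-variable contrast the paper builds on.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss: low when the anchor is closer to the
    positive sample than to the negative samples."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    exps = [math.exp(s / temperature) for s in sims]
    return -math.log(exps[0] / sum(exps))
```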
Pub Date : 2024-07-20 | DOI: 10.1016/j.is.2024.102432
Research on developing techniques for predictive process monitoring has generally relied on feature encoding schemes that extract intra-case features from events to make predictions. In doing so, the processing of cases is assumed to be solely influenced by the attributes of the cases themselves. However, cases are not processed in isolation and can be influenced by the processing of other cases or, more generally, the state of the process under investigation. In this work, we propose the LS-ICE (load state intercase encoding) framework for encoding intercase features that enriches events with a depiction of the state of relevant load points in a business process. To assess the benefits of the intercase features generated using the LS-ICE framework, we compare the performance of predictive process monitoring models constructed using the encoded features against baseline models without these features. The models are evaluated for remaining trace and runtime prediction using five real-life event logs. Across the board, a consistent improvement in performance is noted for models that integrate intercase features encoded through the proposed framework, as opposed to baseline models that lack these encoded features.
Title: LS-ICE: A Load State Intercase Encoding framework for improved predictive monitoring of business processes
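A minimal sketch of an intercase load feature, assuming events carry a `case_id` and a comparable `time` field: each event is enriched with the number of cases active at its timestamp, a simple proxy for the load state the framework encodes (the paper's notion of load points and its exact encoding are not reproduced).

```python
def enrich_with_load(event_log):
    """Annotate each event with a simple load-state feature: the number
    of cases active (started, not yet finished) at the event's time."""
    starts, ends = {}, {}
    for e in event_log:
        c, t = e["case_id"], e["time"]
        starts[c] = min(starts.get(c, t), t)
        ends[c] = max(ends.get(c, t), t)
    for e in event_log:
        # A case is active if the event's time falls in its lifespan.
        e["load"] = sum(1 for c in starts
                        if starts[c] <= e["time"] <= ends[c])
    return event_log
```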
Pub Date : 2024-07-19 | DOI: 10.1016/j.is.2024.102431
Process mining and Robotic Process Automation (RPA) are two technologies of great interest in research and practice. Process mining uses event logs as input, yet much of the information available about processes goes unconsidered because it falls outside the scope of ordinary event logs. RPA technology automates tasks using bots, and the executed steps can be recorded, making them a valuable data source for process mining. With the use of RPA expected to grow, an integrated view of the steps performed by bots in business processes is needed. In process mining, various techniques to analyze processes have already been developed, and most RPA software includes basic measures to monitor bot performance. However, the isolated use of bot-related or process mining measures does not provide an end-to-end view of bot-enabled business processes. To address these issues, we develop an approach that enables using RPA logs for process mining and propose tailored measures to analyze merged bot and process logs. We use the design science research process to structure our work and evaluate the approach through 14 interviews with experts from industry and research. We also implement a software prototype and test it on real-world and artificial data. This approach contributes to prescriptive knowledge by providing a concept for using bot logs in process mining and brings the research streams of RPA and process mining closer together. It provides new data that expands the possibilities of existing process mining techniques in research and practice, and it enables new analyses that can observe bot-human interaction and show the effects of bots on business processes.
Title: Bot log mining: An approach to the integrated analysis of Robotic Process Automation and process mining
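The merge step can be sketched as follows, under two simplifying assumptions: bot events already carry the same `case_id` as process events (correlating the two logs is itself part of the approach), and every event has a comparable `time` field. Tagging each event with its origin is what lets downstream analyses observe bot-human interaction.

```python
def merge_logs(process_log, bot_log):
    """Merge a conventional process event log with an RPA bot log into
    one case-centric log, tagging each event with its origin so that
    process mining can distinguish bot steps from human steps."""
    merged = ([dict(e, origin="human") for e in process_log]
              + [dict(e, origin="bot") for e in bot_log])
    merged.sort(key=lambda e: (e["case_id"], e["time"]))
    return merged
```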
Pub Date : 2024-07-17 | DOI: 10.1016/j.is.2024.102430
Open data is a strategy used by governments to promote transparency and accountability in public procurement processes. To reap the benefits of open data, exploring and analyzing the data is necessary to gain meaningful insights into procurement practices. However, accessing, processing, and analyzing open data can be challenging for non-data-savvy users with domain expertise, creating a barrier to leveraging open procurement data. To address this issue, we present the design, development, and implementation of a visual analytics tool. This tool automates data extraction from multiple sources; performs data cleansing, standardization, and database processing; and generates meaningful visualizations to streamline public procurement analysis. In addition, the tool estimates and visualizes corruption risk indicators at different levels (e.g., regions or public entities), providing valuable insights into the integrity of the procurement process. Key contributions of this work include: (1) a comprehensive guide to the development of an open procurement data visualization tool; (2) a data pipeline supporting data processing, corruption risk estimation, and data visualization; (3) a case study demonstrating how visual analytics can effectively use open data to generate insights that promote and enhance transparency.
Title: Enhancing transparency in public procurement: A data-driven analytics approach
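The abstract does not enumerate the tool's specific risk indicators, so as an illustrative sketch here is a standard red-flag indicator from procurement analytics: the share of tenders per public entity that attracted only one bidder. The field names (`entity`, `num_bidders`) are assumptions about the data schema.

```python
from collections import defaultdict

def single_bidder_ratio(tenders):
    """Per public entity, the share of tenders with exactly one bidder,
    a commonly used corruption risk red flag."""
    total = defaultdict(int)
    single = defaultdict(int)
    for t in tenders:
        total[t["entity"]] += 1
        if t["num_bidders"] == 1:
            single[t["entity"]] += 1
    return {e: single[e] / total[e] for e in total}
```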
Pub Date : 2024-07-16 | DOI: 10.1016/j.is.2024.102427
Recommender systems are powerful tools that successfully apply data mining and machine learning techniques. Traditionally, these systems focused on predicting a single interaction, such as a rating between a user and an item. However, this approach overlooks the complexity of user interactions, which often involve multiple interactions over time, such as browsing, adding items to a cart, and more. Recent research has shifted towards leveraging this richer data to build more detailed user profiles and uncover complex user behavior patterns. Sequential recommendation systems have gained significant attention recently due to their ability to model users’ evolving preferences over time. This survey explores how these systems utilize interaction history to make more accurate and personalized recommendations. We provide an overview of the techniques employed in sequential recommendation systems, discuss evaluation methodologies, and highlight future research directions. We categorize existing approaches based on their underlying principles and evaluate their effectiveness in various application domains. Additionally, we outline the challenges and opportunities in sequential recommendation systems.
Title: A survey of sequential recommendation systems: Techniques, evaluation, and future directions
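The simplest member of the family such surveys cover is a first-order Markov (bigram) recommender; neural sequential models generalize the same idea of predicting the next item from a user's interaction history. A minimal sketch:

```python
from collections import defaultdict

def fit_transitions(sequences):
    """Count item-to-item transitions across user interaction
    sequences (a first-order Markov baseline)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def recommend_next(counts, last_item, k=3):
    """Rank candidate next items by how often they followed the
    user's most recent item in the training sequences."""
    ranked = sorted(counts.get(last_item, {}).items(),
                    key=lambda kv: -kv[1])
    return [item for item, _ in ranked[:k]]
```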
Pub Date : 2024-07-14 | DOI: 10.1016/j.is.2024.102428
The term smart is often used carelessly in relation to systems, devices, and other entities such as cities that capture or otherwise process or use information. This conceptual paper treats the idea of smartness in a way that suggests directions for making cyber-human systems smarter. Cyber-human systems can be viewed as work systems. This paper defines work system, cyber-human system, algorithmic agent, and smartness of systems and devices. It links those ideas to challenges that can be addressed by applying ideas that managers and IS designers discuss rarely, if at all, such as dimensions of smartness for devices and systems, facets of work, roles and responsibilities of algorithmic agents, different types of engagement and patterns of interaction between people and algorithmic agents, explicit use of various types of knowledge objects, and performance criteria that are often deemphasized. In combination, those ideas reveal many opportunities for IS analysis and design practice to make cyber-human systems smarter.
Title: Making cyber-human systems smarter
Pub Date : 2024-07-11 | DOI: 10.1016/j.is.2024.102426
Molecular Dynamics (MD) simulation is often used to study the properties of chemical interactions in domains such as drug discovery and development, particularly when real experimental studies are costly and/or unsafe. The motion trajectories of molecules/atoms generated by MD simulations provide the spatial location of every atom at every time frame of the experiment, and analyzing this data leads to an atomic- and molecular-level understanding of the interactions among the constituents of the system of interest. However, the data is extremely large and poses storage and processing challenges for querying and analyzing the associated atom-level motion trajectories. We take a first step towards applying domain-specific generalization techniques to the data representation, subsequently applying trajectory compression algorithms to reduce storage requirements and speed up the processing of within-distance queries over MD simulation data. We demonstrate that this generalization-aware compression, when applied to the dataset used in this case study, yields significant improvements in data reduction and processing time without sacrificing the effectiveness of within-distance queries for threshold-based detection of molecular events of interest, such as the formation of Hydrogen Bonds (H-Bonds).
Title: Compressing generalized trajectories of molecular motion for efficient detection of chemical interactions
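A minimal sketch of the two ingredients, under the assumption that a trajectory is a list of 3D coordinates per frame: generalizing coordinates onto a coarser grid (a lossy, domain-aware compression of the trajectory) and a within-distance query that flags frames where two atoms fall under a cutoff, as in threshold-based H-bond detection. The case study's actual generalization and compression algorithms are not reproduced.

```python
import math

def quantize(traj, step=0.5):
    """Generalize coordinates onto a grid of the given step size,
    trading spatial precision for compressibility."""
    return [tuple(round(c / step) * step for c in p) for p in traj]

def within_distance(traj_a, traj_b, threshold):
    """Return the frame indices where two atoms are within the
    distance threshold (e.g., a donor-acceptor cutoff)."""
    hits = []
    for i, (p, q) in enumerate(zip(traj_a, traj_b)):
        if math.dist(p, q) <= threshold:
            hits.append(i)
    return hits
```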
Pub Date : 2024-07-06 | DOI: 10.1016/j.is.2024.102422
Michele Chiari , Bin Xiang , Sergio Canzoneri , Galia Novakova Nedeltcheva , Elisabetta Di Nitto , Lorenzo Blasi , Debora Benedetto , Laurentiu Niculut , Igor Škof
One of the main DevOps practices is the automation of resource provisioning and deployment of complex software. This automation is enabled by the explicit definition of Infrastructure-as-Code (IaC), i.e., a set of scripts, often written in different modeling languages, which defines the infrastructure to be provisioned and applications to be deployed.
We introduce the DevOps Modeling Language (DOML), a new Cloud modeling language for infrastructure deployments. DOML is a modeling approach that can be mapped into multiple IaC languages, addressing infrastructure provisioning, application deployment and configuration.
The idea behind DOML is to use a single modeling paradigm, which can reduce the need for deep technical expertise in multiple specialized IaC languages.
We present the DOML’s principles and discuss the related work on IaC languages. Furthermore, the advantages of the DOML for the end-user are demonstrated in comparison with some state-of-the-art IaC languages such as Ansible, Terraform, and Cloudify, and an evaluation of its effectiveness through several examples and a case study is provided.
Title: DOML: A new modeling approach to Infrastructure-as-Code
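DOML's concrete syntax is not shown here, so the following hypothetical Python sketch only illustrates the underlying idea of mapping one deployment model into multiple IaC targets. The resource and module names in the emitted text are invented for illustration, not real Terraform or Ansible modules.

```python
def to_terraform(vm):
    """Render a Terraform-style resource block (illustrative subset
    with a made-up resource type)."""
    return (f'resource "example_vm" "{vm["name"]}" {{\n'
            f'  image = "{vm["image"]}"\n'
            f'  cpus  = {vm["cpus"]}\n'
            f'}}\n')

def to_ansible(vm):
    """Render an Ansible-style task (illustrative subset with a
    made-up module name)."""
    return ("- name: provision {name}\n"
            "  example.vm:\n"
            "    image: {image}\n"
            "    cpus: {cpus}\n").format(**vm)

# One abstract model, two concrete IaC renderings.
model = {"name": "web", "image": "ubuntu-22.04", "cpus": 2}
```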
Pub Date : 2024-07-06 | DOI: 10.1016/j.is.2024.102425
Adeel Aslam, Giovanni Simonini, Luca Gagliardelli, Luca Zecchini, Sonia Bergamaschi
Inequality join is an operator that joins data on inequality conditions; it is a fundamental building block for many applications. While methods and optimizations exist for efficient inequality join in batch processing, little attention has been given to its streaming version, particularly for large-scale data-intensive applications that run on Distributed Stream Processing Systems (DSPSs). Designing an inequality join in streaming and distributed settings is not an easy task: (i) indexes have to be employed to efficiently support inequality-based comparisons, but the continuous stream of data imposes continuous insertions, updates, and deletions of elements in the indexes, hence a huge overhead for the DSPSs; (ii) real data is often skewed, which makes indexing even more challenging.
To address these challenges, we propose the Stream-Aware inequality join (STA), an indexing method that reduces redundancy and index update overhead. STA builds a separate in-memory index structure for hot keys, i.e., the most frequently occurring keys, which are identified automatically with an efficient data sketch. Cold keys, in contrast, are handled by a linked set of index structures. In this way, STA avoids many superfluous index updates for frequent items. Finally, we implement four state-of-the-art inequality join solutions on a widely used DSPS (Apache Storm) and compare their performance with STA on four real-world data sets and one synthetic data set. The results of our experimental evaluation show that our stream-aware approach outperforms existing solutions.
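The hot/cold routing idea can be sketched as follows. This is not STA's actual implementation (the paper's index structures are not reproduced here); it is a minimal illustration under two assumptions: a Count-Min sketch as the frequency estimator, and a fixed promotion threshold. A real implementation would also migrate a key's earlier cold entries when it is promoted.

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counter over a stream of keys."""
    def __init__(self, width=64, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _hashes(self, key):
        for i in range(self.depth):
            digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.width

    def add(self, key):
        for row, col in enumerate(self._hashes(key)):
            self.table[row][col] += 1

    def estimate(self, key):
        # Minimum over rows bounds the overestimate from hash collisions.
        return min(self.table[row][col]
                   for row, col in enumerate(self._hashes(key)))

class HotColdRouter:
    """Route stream elements: keys whose estimated frequency crosses a
    threshold get a dedicated in-memory index; the rest share one."""
    def __init__(self, threshold=3):
        self.sketch = CountMinSketch()
        self.threshold = threshold
        self.hot_index = {}   # per-key structures for frequent keys
        self.cold_index = {}  # shared structure for infrequent keys

    def insert(self, key, value):
        self.sketch.add(key)
        if self.sketch.estimate(key) >= self.threshold:
            self.hot_index.setdefault(key, []).append(value)
        else:
            self.cold_index.setdefault(key, []).append(value)

router = HotColdRouter(threshold=3)
for k in ["a", "a", "a", "b", "a"]:
    router.insert(k, 1)
# "a" crosses the threshold on its third arrival and is routed to the
# hot index from then on; "b" stays in the shared cold index.
```

The design point this illustrates: a sketch gives constant-time, constant-memory frequency estimates, so the router can decide hot vs. cold per element without maintaining exact counts for every key in the stream.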
{"title":"Stream-aware indexing for distributed inequality join processing","authors":"Adeel Aslam, Giovanni Simonini, Luca Gagliardelli, Luca Zecchini, Sonia Bergamaschi","doi":"10.1016/j.is.2024.102425","DOIUrl":"https://doi.org/10.1016/j.is.2024.102425","url":null,"abstract":"<div><p>Inequality join is an operator to join data on inequality conditions and it is a fundamental building block for applications. While methods and optimizations exist for efficient inequality join in batch processing, little attention has been given to its streaming version, particularly to large-scale data-intensive applications that run on <em>Distributed Stream Processing Systems</em> (DSPSs). Designing an inequality join in streaming and distributed settings is not an easy task: <em>(i)</em> indexes have to be employed to efficiently support inequality-based comparisons, but the continuous stream of data imposes continuous insertions, updates, and deletions of elements in the indexes—hence a huge overhead for the DSPSs; <em>(ii)</em> oftentimes real data is skewed, which makes indexing even more challenging.</p><p>To address these challenges, we propose the <em>Stream-Aware inequality join</em> (STA), an indexing method that can reduce redundancy and index update overhead. STA builds a separate in-memory index structure for hotkeys, i.e., the most frequently used keys, which are automatically identified with an efficient data sketch. On the other hand, the cold keys are treated using a linked set of index structures. In this way, STA avoids many superfluous index updates for frequent items. Finally, we implement four state-of-the-art inequality join solutions for a widely employed DSPS (Apache Storm) and compare their performance with STA on four real-world data sets and a synthetic one. 
The results of our experimental evaluation reveal that our stream-aware approach outperforms existing solutions.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":null,"pages":null},"PeriodicalIF":3.0,"publicationDate":"2024-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141605957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}