Pub Date: 2025-09-29
DOI: 10.1016/j.is.2025.102632
Alexander Stahl, Ingo Schmitt
This article provides a detailed explanation of the BBQ-Tree, a unified logic-based model that integrates both classical Decision Trees and Quantum-Logic Decision Trees into a generalized framework for classification and regression. By combining these paradigms, the BBQ-Tree effectively addresses problems with both linear and curved decision boundaries while prioritizing interpretability. We provide a detailed description of the underlying concepts, a possible training algorithm, experimental evaluations, and the incorporation of regression functionality, broadening its applicability beyond classification tasks. Strategies for efficient training and model optimization are also presented. Experimental results demonstrate that the BBQ-Tree produces compact, interpretable models capable of revealing data trends, while achieving accuracy comparable to Decision Trees. Furthermore, its new regression capabilities highlight its versatility and performance across a wider range of tasks.
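The two node types can be pictured with a toy sketch: a Boolean threshold test yields an axis-parallel (linear) boundary, while a graded, quantum-logic-style condition thresholded at 0.5 yields a curved one. All names and the particular graded condition below are illustrative assumptions, not the BBQ-Tree's actual definitions.

```python
# Hypothetical illustration of mixing crisp (Boolean) and graded
# (quantum-logic-style) node conditions in one decision tree.
# Names and the specific graded condition are assumptions for
# illustration only, not the BBQ-Tree's actual node semantics.

def boolean_node(x, feature, threshold):
    """Crisp test: an axis-parallel (linear) decision boundary."""
    return x[feature] <= threshold

def graded_node(x, center, radius):
    """Graded test in [0, 1]: a quadratic condition whose 0.5 level
    set is a curved (circular) decision boundary."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return max(0.0, 1.0 - d2 / radius ** 2)

def classify(x):
    # Root: crisp split on feature 0 -> linear boundary.
    if boolean_node(x, feature=0, threshold=0.5):
        return "A"
    # Inner node: graded condition thresholded at 0.5 -> curved boundary.
    return "B" if graded_node(x, center=(1.0, 1.0), radius=0.6) >= 0.5 else "C"
```

A point on the "A" side is separated by a straight line, while "B" is separated from "C" by a circle, showing how one tree can carry both boundary shapes.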
Title: "BBQ-Tree: A unified classifier and regressor combining Boolean and quantum logic decisions" (Information Systems, vol. 136, Article 102632)
Pub Date: 2025-09-25
DOI: 10.1016/j.is.2025.102631
Dominique Sommers, Natalia Sidorova, Boudewijn van Dongen
Alignments are a well-known process mining technique for reconciling system logs and normative process models. Evidence of certain behaviors in a real system may only be present in one representation – either a log or a model – but not in the other. Since processes involve multiple entities, such as objects and resources performing different tasks with objects, the interaction of these entities must be taken into account in the alignments. Additionally, both logged and modeled representations of reality may be imprecise and only partially represent some of these entities, but not all. In this paper, we introduce the concept of “relaxations” through projections for alignments to deal with partially correct models and logs. Relaxed alignments help to distinguish between trustworthy and untrustworthy content of the two representations (the log and the model) to achieve a better understanding of the underlying process and expose quality issues.
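The core alignment idea, pairing log moves with model moves at minimal cost, can be sketched for the simplest possible case: a model that allows exactly one trace, with unit costs for asynchronous moves. The paper's relaxed, multi-entity alignments are far more general; this is only the textbook baseline.

```python
def align(log_trace, model_trace):
    """Optimal alignment of two activity sequences by dynamic programming:
    synchronous moves cost 0, log-only or model-only moves cost 1.
    Returns (cost, moves), where moves is a list of (log_step, model_step)
    pairs and '>>' marks the skipped side of an asynchronous move."""
    n, m = len(log_trace), len(model_trace)
    # dp[i][j] = cheapest cost aligning log_trace[:i] with model_trace[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sync = dp[i - 1][j - 1] if log_trace[i - 1] == model_trace[j - 1] else float("inf")
            dp[i][j] = min(sync, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    # Backtrack to recover the moves.
    moves, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and log_trace[i - 1] == model_trace[j - 1]
                and dp[i][j] == dp[i - 1][j - 1]):
            moves.append((log_trace[i - 1], model_trace[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            moves.append((log_trace[i - 1], ">>"))  # move on log only
            i -= 1
        else:
            moves.append((">>", model_trace[j - 1]))  # move on model only
            j -= 1
    return dp[n][m], moves[::-1]
```

For the log trace `a, x, c` against the modeled trace `a, b, c`, the optimal alignment keeps `a` and `c` synchronous and flags `x` (log only) and `b` (model only) as deviations.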
Title: "In system alignments we trust! Explainable alignments via projections" (Information Systems, vol. 136, Article 102631)
Pub Date: 2025-09-24
DOI: 10.1016/j.is.2025.102634
Eliseo Curcio
As artificial intelligence (AI) becomes foundational to enterprise infrastructure, organizations face growing challenges in accurately assessing the full economic implications of AI deployment. Existing metrics such as API token costs, GPU-hour billing, or Total Cost of Ownership (TCO) fail to capture the complete lifecycle costs of AI systems and provide limited comparability across deployment models. This paper introduces the Levelized Cost of Artificial Intelligence (LCOAI), a standardized economic metric designed to quantify the total capital (CAPEX) and operational (OPEX) expenditures per unit of productive AI output, normalized by valid inference volume. Analogous to established metrics like the Levelized Cost of Electricity (LCOE) and the Levelized Cost of Hydrogen (LCOH) in the energy sector, LCOAI provides a rigorous, transparent framework for evaluating and comparing AI deployment strategies. We define the LCOAI methodology in detail and apply it to four representative scenarios: the OpenAI GPT-4.1 API, the Anthropic Claude Haiku API, a self-hosted LLaMA-2–13B deployment, and a cloud-hosted LLaMA-2–13B deployment, demonstrating how LCOAI captures critical trade-offs in scalability, investment planning, and cost optimization. Extensive sensitivity analyses further explore the impact of inference volume, CAPEX, and OPEX variability on lifecycle economics. The results illustrate the practical utility of LCOAI in procurement, infrastructure planning, and automation strategy, and establish it as a foundational benchmark for AI economic analysis. Policy implications and directions for future refinement, including integration of environmental and performance-adjusted cost metrics, are also discussed.
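Reading the definition literally, amortized CAPEX plus OPEX divided by valid inference volume, a back-of-the-envelope calculator might look like the following. All figures and the discounting convention are invented for illustration; the paper's exact methodology may differ.

```python
def lcoai(capex, opex_per_year, valid_inferences_per_year,
          lifetime_years, discount_rate=0.0):
    """Levelized cost per valid inference: discounted lifetime
    (CAPEX + OPEX) divided by discounted lifetime valid inference
    volume. Mirrors the structure of LCOE-style levelized metrics;
    the discounting convention here is an assumption, not the paper's."""
    cost = float(capex)      # upfront capital expenditure at year 0
    volume = 0.0             # discounted valid inference volume
    for t in range(1, lifetime_years + 1):
        df = (1 + discount_rate) ** t
        cost += opex_per_year / df
        volume += valid_inferences_per_year / df
    return cost / volume
```

For example, $100k CAPEX, $50k/year OPEX, and 1M valid inferences/year over 4 years (no discounting) gives $300k / 4M = $0.075 per valid inference.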
Title: "Evaluating the lifecycle economics of AI: The levelized cost of artificial intelligence (LCOAI)" (Information Systems, vol. 136, Article 102634)
Pub Date: 2025-09-23
DOI: 10.1016/j.is.2025.102629
Georgios Panagiotis Kalfakis, Nikos Giatrakos
The use of data synopses in Big streaming Data analytics can offer three types of scalability: (i) horizontal scalability, for scaling with the volume and velocity of Big streaming Data, (ii) vertical scalability, for scaling with the number of processed streams, and (iii) federated scalability, i.e., reducing the communication cost for performing global analytics across a number of geo-distributed data centers or devices in IoT settings. Despite the aforementioned virtues of synopses, no state-of-the-art Big Data framework or IoT platform provides a native API for stream synopses supporting all three types of required scalability. In this work, we fill this gap by introducing a novel system and architectural paradigm, namely Synopses-as-a-MicroService (SaaMS), for both parallel and geo-distributed stream summarization at scale. SaaMS is developed on Apache Kafka and Kafka Streams and can provide all the required types of scalability together with (i) the ability to seamlessly perform adaptive resource allocation with zero downtime for the running analytics and (ii) the ability to run both across powerful computer clusters and Java-enabled IoT devices. Therefore, SaaMS is directly deployable from applications that either operate on powerful clouds or across the cloud to edge continuum.
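For a flavor of the kind of stream synopsis such an API would expose, here is a generic Count-Min sketch, one classic synopsis that approximates per-item counts in sub-linear space. This is an illustration of the concept only, not SaaMS code or its API.

```python
import hashlib

class CountMinSketch:
    """Classic stream synopsis: approximate per-item frequency counts
    in fixed, sub-linear space. Estimates never undercount; the min
    over rows bounds the overcount. Generic illustration, not SaaMS."""

    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One independent-looking hash per row, derived from a salted digest.
        h = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8)
        return int.from_bytes(h.digest(), "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Overestimates only; take the minimum across rows.
        return min(self.table[row][self._index(row, item)]
                   for row in range(self.depth))
```

Because a sketch like this is a small fixed-size array, shipping it (rather than raw events) between geo-distributed sites is what yields the federated scalability described above.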
Title: "SaaMS: The synopses-as-a-microservice paradigm for scalable adaptive streaming analytics across the cloud to edge continuum" (Information Systems, vol. 136, Article 102629)
Pub Date: 2025-09-21
DOI: 10.1016/j.is.2025.102630
Jianxin Li , Taotao Cai , Ke Deng , Timos Sellis , Feng Xia
To celebrate the 50th Anniversary of the Information Systems Journal, we are delighted to share our research reflections on the article “Community-diversified influence maximization in social networks”, published in Information Systems in 2020. Our reflections will highlight the impact of this article on the authors’ research trajectories, its influence on the broader research community, and its contributions to industry practice.
Title: "Reflection on community-diversified influence maximization in social networks" (Information Systems, vol. 136, Article 102630)
Pub Date: 2025-09-20
DOI: 10.1016/j.is.2025.102626
Rik Eshuis , Aditya Ghose
Knowledge-intensive processes progress towards the achievement of operational goals. These processes typically rely on data to enable data-driven decision making, but also require substantial flexibility to deal with the complex and dynamic environments in which they operate. Consequently, declarative data-centric process modeling languages such as the Case Management Model and Notation (CMMN) have been proposed to model knowledge-intensive processes. However, while such process models allow goals to be expressed, they specify dependencies between the goals only implicitly. This makes the goal-oriented behavior of declarative data-centric process models hard to understand, and therefore obfuscates the goal-oriented behavior of knowledge-intensive processes. This paper defines a structural, semi-automated approach to explicate the goal-oriented aspects of declarative data-centric process models. The approach first derives goal relations from a declarative data-centric process model and next synthesizes these goal relations into a goal model using an algorithm. The approach is supported by a tool and has been evaluated in case studies. Using the approach, implicit goal dependencies in declarative data-centric process models are expressed in goal models. This supports the understanding of goal-oriented aspects of declarative data-centric process models.
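The derivation step can be pictured with a toy example: dependencies between model elements are lifted to precedence relations between the goals those elements achieve. The structure below is hypothetical and deliberately simple; the paper's derivation rules over CMMN models are richer.

```python
def derive_goal_relations(element_dependencies, goal_of):
    """Lift dependencies between model elements (e.g. CMMN stages or
    tasks) to precedence relations between the goals those elements
    achieve. Toy illustration of the derivation idea only; names and
    the mapping `goal_of` are invented for this example."""
    relations = set()
    for src, dst in element_dependencies:
        g1, g2 = goal_of[src], goal_of[dst]
        if g1 != g2:  # drop relations internal to a single goal
            relations.add((g1, g2))
    return relations
```

For instance, if `collect` precedes `assess`, and those elements contribute to goals `DataReady` and `RiskKnown` respectively, the derived relation is `DataReady` before `RiskKnown`; the synthesis step would then assemble all such relations into one goal model.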
Title: "Synthesizing goal models from declarative data-centric process models" (Information Systems, vol. 136, Article 102626)
Pub Date: 2025-09-18
DOI: 10.1016/j.is.2025.102620
Maikel Leon
The rapid evolution of Generative Pre-trained Transformers (GPTs) has revolutionized natural language processing, enabling models to generate coherent text, solve mathematical problems, write code, and even reason about complex tasks. This paper presents a scientific review of GPT-5, OpenAI’s latest flagship model, and examines its innovations in comparison to previous generations of GPT. We summarize the model’s architecture and features, including hierarchical routing, expanded context windows, and enhanced tool-use capabilities, and survey empirical evidence of improved performance on academic benchmarks. A dedicated section discusses the release of open-weight mixture-of-experts models (GPT-OSS), describing their technical design, licensing, and comparative performance. Our analysis synthesizes findings from recent literature on long-context evaluation, cognitive biases, medical summarization, and hallucination vulnerability, highlighting where GPT-5 advances the state of the art and where challenges remain. We conclude by discussing the implications of open-weight models for transparency and reproducibility and propose directions for future research on evaluation, safety, and agentic behavior.
Title: "GPT-5 and open-weight large language models: Advances in reasoning, transparency, and control" (Information Systems, vol. 136, Article 102620)
Pub Date: 2025-09-18
DOI: 10.1016/j.is.2025.102627
Radek Ošlejšek, Radoslav Chudovský, Martin Macak
Hands-on training sessions have become a standard way to develop and increase knowledge in cybersecurity. As practical cybersecurity exercises are strongly process-oriented with knowledge-intensive processes, process mining techniques and models can help enhance learning analytics tools. The design of our open-source analytical dashboard is backed by guidelines for visualizing multivariate networks complemented with temporal views and clustering. The design aligns with the requirements for post-training analysis of a special subset of cybersecurity exercises — supervised Capture the Flag games. Usability is demonstrated in a case study using trainees’ engagement measurement to reveal potential flaws in training design or organization.
Title: "Process-driven visual analysis of cybersecurity capture the flag exercises" (Information Systems, vol. 136, Article 102627)
Pub Date: 2025-09-17
DOI: 10.1016/j.is.2025.102628
Matteo Francia , Stefano Rizzi , Matteo Golfarelli , Patrick Marcel
In an attempt to streamline exploratory data analysis of multidimensional cubes, the Intentional Analytics Model has been proposed as a way to unite OLAP and analytics by allowing users to indicate their analysis intentions and returning cubes enhanced with models. Five intention operators were envisioned to this end; in this work we focus on the predict operator, whose goal is to estimate the missing values of a cube measure starting from known values of the same measure or other measures using different regression models. Although prediction tasks such as forecasting and imputation are routine for analysts, the added value of our approach is (i) to encapsulate them in a declarative, concise, natural language-like syntax; (ii) to automate the selection of the best measures to be used and the computation of the models, and (iii) to automate the evaluation of the interest of the models computed. First, we propose a syntax and a semantics for predict and discuss how enhanced cubes are built by (i) predicting the missing values for a measure based on the available information via one or more models and (ii) highlighting the most interesting prediction. Then we test the operator implementation, proving that its performance is in line with the interactivity requirements of OLAP sessions and that accurate predictions can be returned.
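As a minimal sketch of what such a predict step might do, assume a flat cell layout and plain least-squares regression of one measure on another; both the layout and the model choice are assumptions for illustration, not the operator's actual implementation.

```python
def predict_measure(cells, target, predictor):
    """Fill missing values of measure `target` from measure `predictor`
    with ordinary least squares, as one regression model a predict-style
    operator could apply. `cells` maps cell coordinates to dicts of
    measures; this data layout is invented for the example."""
    known = [(c[predictor], c[target]) for c in cells.values()
             if c.get(target) is not None]
    n = len(known)
    mx = sum(x for x, _ in known) / n
    my = sum(y for _, y in known) / n
    sxx = sum((x - mx) ** 2 for x, _ in known)
    slope = sum((x - mx) * (y - my) for x, y in known) / sxx
    intercept = my - slope * mx
    for c in cells.values():
        if c.get(target) is None:
            c[target] = intercept + slope * c[predictor]
            c["predicted"] = True  # flag the enhanced (predicted) cells
    return cells
```

The flagged cells correspond to the "enhancement" of the cube: known values pass through untouched, while missing ones carry a model-derived estimate that the interestingness evaluation can then rank.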
Title: "Predicting multidimensional cubes through intentional analytics" (Information Systems, vol. 136, Article 102628)
Pub Date: 2025-09-13
DOI: 10.1016/j.is.2025.102625
Yongqing Li , Qimeng Yang , Long Yu , ShengWei Tian , Xin Fan
Graph Contrastive Learning (GCL) enhances recommender systems by leveraging Graph Neural Networks (GNNs) and self-supervised learning (SSL). However, existing methods struggle with data sparsity and noise. We propose Robust Graph Contrastive Learning (RoGCL), a novel framework that generates high-quality contrastive views through dual-perspective generators. The local generator employs Variational Graph Autoencoders (VGAE) to capture micro-level collaborative patterns by sampling from user–item interaction distributions. The global generator utilizes Singular Value Decomposition (SVD) to reconstruct macro-level structures while filtering noise through low-rank approximation. By incorporating Information Bottleneck (InfoBN) to minimize redundancy between views, RoGCL learns robust representations combining local and global collaborative signals. Extensive experiments on Last.FM, Yelp, and BeerAdvocate datasets demonstrate that RoGCL significantly outperforms state-of-the-art methods including Self-supervised Graph Learning (SGL), Neural Collaborative Learning (NCL), and Adaptive Graph Contrastive Learning (AdaGCL). Results show improved Recall@20 by up to 8.7% and NDCG@20 by 5.8% compared to best baselines. Notably, RoGCL exhibits exceptional robustness, maintaining over 90% relative performance with 25% noise injection and showing 37.7% improvement for sparse user groups, making it particularly suitable for real-world applications with imperfect data.
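The global generator's SVD step can be illustrated generically: keeping only the top singular directions of the interaction matrix reconstructs its macro-level structure while discarding low-energy components that tend to carry noise. This is a textbook truncated SVD, not RoGCL's implementation.

```python
import numpy as np

def lowrank_denoise(interactions, rank):
    """Global view via truncated SVD: keep the top-`rank` singular
    directions of a user-item interaction matrix and drop the rest,
    filtering noise through low-rank approximation. Generic
    illustration of the SVD step, not RoGCL's code."""
    u, s, vt = np.linalg.svd(interactions, full_matrices=False)
    return u[:, :rank] @ np.diag(s[:rank]) @ vt[:rank, :]
```

By the Eckart–Young theorem this truncation is the best rank-`rank` approximation in Frobenius norm, which is why it serves as a denoised, macro-level view of the interactions.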
Title: "Robust Graph Contrastive Learning for recommender systems: Addressing data sparsity and noise" (Information Systems, vol. 136, Article 102625)