Information Systems: latest articles

Substring compression variations and LZ78-Derivates
IF 3 · Zone 2 (Computer Science) · Q2 (Computer Science, Information Systems) · Pub Date: 2025-04-25 · DOI: 10.1016/j.is.2025.102553
Dominik Köppl
We propose algorithms computing the semi-greedy Lempel–Ziv 78 (LZ78), the Lempel–Ziv Double (LZD), and the Lempel–Ziv–Miller–Wegman (LZMW) factorizations in linear time for integer alphabets. For LZD and LZMW, we additionally propose data structures that can be constructed in linear time, which can solve the substring compression problems for these factorizations in time linear in the output size. For substring compression, we give the first results for lexparse and closed factorizations.
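For orientation, the classic greedy LZ78 factorization that these variants build on can be sketched as follows. This is a minimal dictionary-based illustration and not the linear-time algorithm of the paper; the function name `lz78_factorize` and the (reference, character) output format are assumptions made for this example.

```python
def lz78_factorize(text):
    """Greedy LZ78: each factor is the longest earlier factor plus one new character.

    Returns a list of (reference, char) pairs, where reference is the 1-based
    index of an earlier factor (0 stands for the empty factor). The last pair
    has char == "" if the text ends inside an already known phrase.
    """
    dictionary = {}  # phrase -> 1-based factor index
    factors = []
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            # Keep extending while the phrase is already a known factor.
            phrase = candidate
        else:
            factors.append((dictionary.get(phrase, 0), ch))
            dictionary[candidate] = len(factors)
            phrase = ""
    if phrase:
        # Text ended in the middle of a known phrase.
        factors.append((dictionary[phrase], ""))
    return factors
```

Calling `lz78_factorize("abababa")` yields four factors: `a`, `b`, `ab`, and `aba`. Because phrases are stored as strings in a hash map, this sketch is not linear time; it only shows what the factorization looks like.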
Information Systems, vol. 133, Article 102553
Citations: 0
Learning to resolve inconsistencies in qualitative constraint networks
IF 3 · Zone 2 (Computer Science) · Q2 (Computer Science, Information Systems) · Pub Date: 2025-04-18 · DOI: 10.1016/j.is.2025.102557
Anastasia Paparrizou, Michael Sioutis
In this paper, we present a reinforcement learning approach for resolving inconsistencies in qualitative constraint networks (QCNs). QCNs are typically used in constraint programming to represent and reason about intuitive spatial or temporal relations like x {is inside of ∨ overlaps} y. Naturally, QCNs are not immune to uncertainty, noise, or imperfect data that may be present in information, and thus, more often than not, they are hampered by inconsistencies. We propose a multi-armed bandit approach that defines a well-suited ordering of constraints for finding a maximal satisfiable subset of them. Specifically, our learning approach interacts with a solver, and after each trial a reward is returned to measure the performance of the selected action (constraint addition). The reward function is based on the reduction of the solution space of a consistent reconstruction of the input QCN. Experimental results with different bandit policies and various rewards that are obtained by our algorithm suggest that we can do better than the state of the art in terms of both effectiveness, viz., lower number of repairs obtained for an inconsistent QCN, and efficiency, viz., faster runtime.
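The select-act-reward loop described in the abstract can be illustrated with a standard UCB1 bandit. This is only a sketch under stated assumptions: the abstract does not say which bandit policies are used, the arms here stand for hypothetical candidate constraints, and the rewards are fixed made-up numbers rather than the solver-based solution-space reduction of the paper.

```python
import math


class UCB1:
    """UCB1 bandit over a fixed set of arms (here: hypothetical candidate constraints)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # how often each arm was played
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select(self):
        # Play every arm once before relying on the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        scores = [
            value + math.sqrt(2.0 * math.log(total) / count)
            for value, count in zip(self.values, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental update of the mean reward of the played arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def run_trials(rewards_per_arm, n_trials):
    """Play n_trials rounds with a deterministic, hypothetical reward per arm."""
    bandit = UCB1(len(rewards_per_arm))
    for _ in range(n_trials):
        arm = bandit.select()
        bandit.update(arm, rewards_per_arm[arm])
    return bandit.counts
```

In a real setting, `update` would receive the reward returned after the solver has tried adding the selected constraint; the fixed reward list only keeps the example deterministic.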
Information Systems, vol. 133, Article 102557
Citations: 0
Incremental checking of SQL assertions in an RDBMS
IF 3 · Zone 2 (Computer Science) · Q2 (Computer Science, Information Systems) · Pub Date: 2025-04-16 · DOI: 10.1016/j.is.2025.102550
Xavier Oriol, Ernest Teniente
The notion of SQL assertion was introduced in the SQL-92 standard to define general constraints over a relational database. Assertions can be used, for instance, to specify cross-row constraints or multitable check constraints. However, up to now, none of the current relational database management systems (RDBMSs) support SQL assertions due to the difficulty of providing an efficient solution.
To implement SQL assertions efficiently, the RDBMS requires an incremental checking mechanism. That is, given an assertion, the RDBMS should revalidate it only when a transaction changes data in a manner that could violate it, and only for the affected data. Some years ago, the deductive database community provided several incremental checking methods; however, their results did not make it into practice in RDBMSs.
In this paper, we propose an approach to efficiently implement SQL assertions in an RDBMS through an incremental revalidation technique. Such an approach is compatible with any RDBMS since it is fully based on standard SQL concepts (tables, triggers, and procedures). Our proposal uses and extends the Event Rules, an existing proposal for incremental checking in deductive databases. This extension is required to handle distributive aggregates, which pushes the expressiveness of the handled SQL assertions beyond first-order constraints. Moreover, we exploit this extension to improve the treatment of constraints involving existential variables, which are a very common kind of constraint that is difficult and expensive to handle. Finally, we show the efficiency of our approach through some experiments, and we formally prove its soundness and completeness.
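The idea of emulating an assertion with triggers that check only the affected rows can be illustrated with SQLite from Python. This is a hand-written sketch, not the Event Rules translation proposed in the paper; the schema, the assertion ("no employee earns more than the salary cap of their department"), and all names are hypothetical.

```python
import sqlite3


def make_db():
    """Create a toy schema plus one trigger that emulates a multitable assertion."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE department (
            id INTEGER PRIMARY KEY,
            salary_cap INTEGER NOT NULL
        );
        CREATE TABLE employee (
            id INTEGER PRIMARY KEY,
            dept_id INTEGER NOT NULL REFERENCES department(id),
            salary INTEGER NOT NULL
        );

        -- Incremental check: only the inserted row is validated,
        -- the rest of the employee table is never rescanned.
        CREATE TRIGGER employee_salary_cap_ins
        BEFORE INSERT ON employee
        BEGIN
            SELECT RAISE(ABORT, 'assertion salary_cap violated')
            WHERE NEW.salary >
                  (SELECT salary_cap FROM department WHERE id = NEW.dept_id);
        END;
        """
    )
    conn.execute("INSERT INTO department VALUES (1, 5000)")
    return conn


def try_insert(conn, emp_id, dept_id, salary):
    """Return True if the insert is accepted, False if the trigger rejects it."""
    try:
        conn.execute(
            "INSERT INTO employee VALUES (?, ?, ?)", (emp_id, dept_id, salary)
        )
        return True
    except sqlite3.DatabaseError:
        return False
```

A complete emulation of this assertion would also need triggers for updates of `employee.salary`, `employee.dept_id`, and `department.salary_cap`, since each of those events could violate it; deletions cannot, so they need no check.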
Information Systems, vol. 133, Article 102550
Citations: 0
A high-accuracy unsupervised statistical learning method for joint dangling entity detection and entity alignment
IF 3 · Zone 2 (Computer Science) · Q2 (Computer Science, Information Systems) · Pub Date: 2025-04-11 · DOI: 10.1016/j.is.2025.102554
Cong Xu, Mengxin Shi, Xiang Gao, Zhongkang Yin, Xiujuan Yao, Wei Li, Jiasen Yang
Dangling entities are common in knowledge graphs, but there is a lack of research on entity alignment involving them. Most existing studies leverage neural network methods through supervised learning. However, these data-driven methods suffer from poor interpretability and high computation overhead. In this paper, we propose a Simple Unsupervised Dangling entity detection and entity Alignment method (SUDA) without employing neural networks. Our method consists of three modules: entity embedding, dangling entity detection, and entity alignment. While the state-of-the-art Simple but Effective Unsupervised entity alignment method (SEU) is incapable of dealing with dangling entities, SUDA further extends it and addresses the bilateral dangling entities problem. A theoretical proof of our method is given. We also design a new adjacency matrix for incorporating richer entity relations. Then we construct entity similarity outlier intervals to detect dangling entities and, after removing them, align the remaining entities by solving an assignment problem. Extensive experiments demonstrate that our method outperforms existing supervised and unsupervised methods. Additionally, in the entity alignment tasks, SUDA consumes less runtime compared to neural network methods, while maintaining high efficiency, interpretability, and stability. Code is available at https://github.com/skyccong/SUDA.git.
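The two-stage pipeline in the abstract (detect dangling entities from similarity scores, then align the rest via an assignment problem) can be sketched on a toy similarity matrix. This is not the SUDA implementation: a plain threshold on the best similarity stands in for the paper's outlier intervals, the assignment is solved by brute force instead of a dedicated solver, and all names are hypothetical.

```python
from itertools import permutations


def detect_dangling(sim, threshold):
    """Return (rows, cols) whose best similarity is below the threshold.

    sim[i][j] is the similarity between source entity i and target entity j.
    A fixed threshold is an assumption standing in for outlier intervals.
    """
    rows = [i for i, row in enumerate(sim) if max(row) < threshold]
    cols = [
        j for j in range(len(sim[0]))
        if max(row[j] for row in sim) < threshold
    ]
    return rows, cols


def align(sim, threshold):
    """Align non-dangling entities by maximising total similarity (brute force)."""
    d_rows, d_cols = detect_dangling(sim, threshold)
    rows = [i for i in range(len(sim)) if i not in d_rows]
    cols = [j for j in range(len(sim[0])) if j not in d_cols]
    if len(rows) > len(cols):
        raise ValueError("sketch assumes at most as many sources as targets remain")
    best_score, best_pairs = float("-inf"), []
    # Enumerate every injective mapping of remaining sources to targets.
    for perm in permutations(cols, len(rows)):
        score = sum(sim[i][j] for i, j in zip(rows, perm))
        if score > best_score:
            best_score, best_pairs = score, list(zip(rows, perm))
    return best_pairs, d_rows, d_cols
```

Brute-force enumeration is only viable for a handful of entities; a polynomial-time assignment solver would replace it at realistic scale.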
Information Systems, vol. 133, Article 102554
Citations: 0
Hierarchical aspect-based sentiment analysis using semantic capsuled multi-granular networks
IF 3 · Zone 2 (Computer Science) · Q2 (Computer Science, Information Systems) · Pub Date: 2025-04-03 · DOI: 10.1016/j.is.2025.102556
Jeffin Gracewell, A. Arul Edwin Raj, C.T. Kalaivani, Renugadevi R
In the ever-evolving domain of sentiment analysis, discerning intricate sentiments towards specific aspects and their sub-components within textual data has become pivotal. This paper introduces the Semantic Capsuled Hierarchical Multi-Granular Network (SCH-MGN) model, an innovative approach explicitly designed for aspect-based sentiment analysis (ABSA) challenges. The SCH-MGN model is primed to evaluate sentiments at both macro (broader topics) and micro (detailed sub-aspects) hierarchical levels, offering a comprehensive sentiment evaluation spectrum. By integrating mechanisms like the Semantic Knowledge Graph Attention Network (SKG-AN) for targeted aspect extraction, Hierarchical Embedding Layers leveraging Multilingual BERT (mBERT), and advanced neural architectures including Recurrent Neural Networks (RNNs) and Temporal Convolutional Networks (TCNs), the model ensures a nuanced sentiment interpretation. The paper provides a meticulous dissection of the model's methodology, from tokenization and embedding to detailed sentiment extraction, accentuating its capability to offer granular sentiment interpretations. Empirical illustrations validate the model's proficiency in handling compound sentiments, cementing its potential as an indispensable tool for businesses, reviewers, and analysts. This groundbreaking approach to ABSA promises to redefine the granularity with which we understand and evaluate textual sentiments in diverse domains.
Information Systems, vol. 132, Article 102556
Citations: 0
Special Issue: Best Papers of the BPM 2023 Conference
IF 3 · Zone 2 (Computer Science) · Q2 (Computer Science, Information Systems) · Pub Date: 2025-03-27 · DOI: 10.1016/j.is.2025.102552
Manfred Reichert, Andrea Burattin, Chiara Di Francescomarino, Christian Janiesch
Information Systems, vol. 132, Article 102552
Citations: 0
When GDD meets GNN: A knowledge-driven neural connection for effective entity resolution in property graphs
IF 3 · Zone 2 (Computer Science) · Q2 (Computer Science, Information Systems) · Pub Date: 2025-03-22 · DOI: 10.1016/j.is.2025.102551
Junwei Hu, Michael Bewong, Selasi Kwashie, Yidi Zhang, Vincent Nofong, John Wondoh, Zaiwen Feng
This paper studies the entity resolution (ER) problem in property graphs. ER is the task of identifying and linking different records that refer to the same real-world entity. It is commonly used in data integration, data cleansing, and other applications where it is important to have accurate and consistent data. In general, two predominant approaches exist in the literature: rule-based and learning-based methods. On the one hand, rule-based techniques are often desired due to their explainability and ability to encode domain knowledge. Learning-based methods, on the other hand, are preferred due to their effectiveness in spite of their black-box nature. In this work, we devise a hybrid ER solution, GraphER, that leverages the strengths of both systems for property graphs. In particular, we adopt graph differential dependency (GDD) for encoding the so-called record-matching rules, and employ them to guide a graph neural network (GNN) based representation learning for the task. We conduct extensive empirical evaluation of our proposal on benchmark ER datasets including 17 graph datasets and 7 relational datasets in comparison with 10 state-of-the-art (SOTA) techniques. The results show that our approach provides a significantly better solution to addressing ER in graph data, both quantitatively and qualitatively, while attaining highly competitive results on the benchmark relational datasets w.r.t. the SOTA solutions.
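To make the rule-based side of the hybrid concrete, a record-matching rule with a distance constraint can be sketched as below. This only illustrates the general flavour of such rules; it is not the GDD formalism or the GraphER system, and the node attributes (`label`, `city`, `name`) and the distance bound are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def same_entity(node_a, node_b, max_name_distance=2):
    """Hypothetical rule: same label, same city, and names within a small edit distance."""
    return (
        node_a["label"] == node_b["label"]
        and node_a["city"] == node_b["city"]
        and edit_distance(node_a["name"], node_b["name"]) <= max_name_distance
    )
```

A rule like this is explainable because every match can be traced back to the attribute comparisons that fired, which is the property the abstract attributes to rule-based techniques.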
Information Systems, vol. 132, Article 102551
Citations: 0
A JSON document algebra for query optimization
IF 3 · Zone 2 (Computer Science) · Q2 (Computer Science, Information Systems) · Pub Date: 2025-03-19 · DOI: 10.1016/j.is.2025.102537
Tomas Llano-Rios, Mohamed Khalefa, Antonio Badia
Due to the popularity of JSON, several systems have been developed that store data in collections of JSON documents. Each system has developed its own query language, sometimes in an ad-hoc manner. This makes it difficult to formally define and analyze query optimization techniques. We propose an algebra tailored to JSON documents. First, we argue that JSON is different from nested relations and XML and therefore requires its own solution. Then, we propose an algebra on three levels: the first level defines operators to manipulate individual documents, providing an abstraction over different serializations. The second level provides operators over collections of JSON documents, while the third level also defines collection operators that are not primitive but enable direct and efficient implementation of data manipulation operations. We provide a number of properties of the algebraic operators, which give a solid basis for query optimization.
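The split between document-level and collection-level operators, and the kind of algebraic property an optimizer can exploit, can be sketched with plain Python dictionaries. The operator names and semantics below are assumptions for illustration and are not the operators defined in the paper.

```python
def project(doc, keys):
    """Document-level operator: keep only the given top-level keys."""
    return {k: doc[k] for k in keys if k in doc}


def select(collection, predicate):
    """Collection-level operator: keep the documents satisfying the predicate."""
    return [doc for doc in collection if predicate(doc)]


def map_docs(collection, fn):
    """Collection-level operator: apply a document-level operator to every document."""
    return [fn(doc) for doc in collection]
```

If a predicate reads only keys that survive the projection, selecting before projecting gives the same result as projecting before selecting. A rewrite rule of this form lets an optimizer choose the cheaper order, for example filtering first so that fewer documents are projected.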
Information Systems, vol. 132, Article 102537
Citations: 0
The effects of data quality on machine learning performance on tabular data
IF 3 · Zone 2 (Computer Science) · Q2 (Computer Science, Information Systems) · Pub Date: 2025-03-14 · DOI: 10.1016/j.is.2025.102549
Sedir Mohammed, Lukas Budach, Moritz Feuerpfeil, Nina Ihde, Andrea Nathansen, Nele Noack, Hendrik Patzlaff, Felix Naumann, Hazar Harmouch
Modern artificial intelligence (AI) applications require large quantities of training and test data. This need creates critical challenges not only concerning the availability of such data, but also regarding its quality. For example, incomplete, erroneous, or inappropriate training data can lead to unreliable models that produce ultimately poor decisions. Trustworthy AI applications require high-quality training and test data along many quality dimensions, such as accuracy, completeness, and consistency.
We explore empirically the relationship between six data quality dimensions and the performance of 19 popular machine learning algorithms covering the tasks of classification, regression, and clustering, with the goal of explaining their performance in terms of data quality. Our experiments distinguish three scenarios based on the AI pipeline steps that were fed with polluted data: polluted training data, test data, or both. We conclude the paper with an extensive discussion of our observations.
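Deliberately polluting a dataset along one quality dimension can be sketched as follows, using completeness (missing values) as the example. The pollution procedure, function names, and row format are assumptions for illustration; the abstract does not describe how the authors pollute their data.

```python
import random


def pollute_missing(rows, column, fraction, seed=0):
    """Return a copy of rows where `fraction` of the values in `column` become None."""
    rng = random.Random(seed)  # fixed seed keeps the pollution reproducible
    n_polluted = round(len(rows) * fraction)
    targets = set(rng.sample(range(len(rows)), n_polluted))
    return [
        {**row, column: None} if i in targets else dict(row)
        for i, row in enumerate(rows)
    ]


def completeness(rows, column):
    """Share of non-missing values in `column` (1.0 means fully complete)."""
    if not rows:
        return 1.0
    return sum(row[column] is not None for row in rows) / len(rows)
```

The three scenarios from the abstract then correspond to applying such a function to the training split only, to the test split only, or to both before fitting and evaluating a model.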
现代人工智能(AI)应用需要大量的训练和测试数据。这一需求不仅对这种数据的可得性,而且对其质量都构成了严峻的挑战。例如,不完整、错误或不适当的训练数据可能导致不可靠的模型,最终产生糟糕的决策。值得信赖的人工智能应用程序需要高质量的训练和测试数据,以及许多质量维度,如准确性、完整性和一致性。我们从经验上探讨了六个数据质量维度与19种流行的机器学习算法的性能之间的关系,这些算法涵盖了分类、回归和聚类等任务,目的是解释它们在数据质量方面的性能。我们的实验根据人工智能管道步骤区分了三种场景,这些步骤被污染的数据提供:污染的训练数据,测试数据,或两者兼而有之。最后,我们对我们的观察结果进行了广泛的讨论。
Sedir Mohammed, Lukas Budach, Moritz Feuerpfeil, Nina Ihde, Andrea Nathansen, Nele Noack, Hendrik Patzlaff, Felix Naumann, Hazar Harmouch. The effects of data quality on machine learning performance on tabular data. Information Systems, vol. 132, Article 102549. Published 2025-03-14. DOI: 10.1016/j.is.2025.102549
Citations: 0
Process mining over sensor data: Goal recognition for powered transhumeral prostheses
IF 3 Zone 2 Computer Science Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-03-06 DOI: 10.1016/j.is.2025.102540
Zihang Su , Tianshi Yu , Artem Polyvyanyy , Ying Tan , Nir Lipovetzky , Sebastian Sardiña , Nick van Beest , Alireza Mohammadi , Denny Oetomo
Process mining (PM)-based goal recognition (GR) techniques, which infer goals or targets based on sequences of observed actions, have shown efficacy in real-world engineering applications. This study explores the applicability of PM-based GR in identifying target poses for users employing powered transhumeral prostheses. These prostheses are designed to restore missing anatomical segments below the shoulder, including the hand. In this article, we aim to apply the GR techniques to identify the intended movements of users, enabling the motors on the powered transhumeral prosthesis to execute the desired motions precisely. In this way, a powered transhumeral prosthesis can assist individuals with disabilities in completing movement tasks. PM-based GR techniques were initially designed to infer goals from sequences of observed actions, where discrete event names represent actions. However, the electromyography electrodes and kinematic sensors on powered transhumeral prosthetic devices register sequences of continuous, real-valued data measurements. Therefore, we rely on methods to transform sensor data into discrete events and integrate these methods with the PM-based GR system to develop target pose recognition approaches. Two data transformation approaches are introduced. The first approach relies on the clustering of data measurements collected before the target pose is reached (the clustering approach). The second approach uses the time series of measurements collected during dynamic user movement to perform linear discriminant analysis (LDA) classification and identify discrete events (the dynamic LDA approach). These methods are evaluated through offline and human-in-the-loop (online) experiments and compared with established techniques, such as static LDA, an LDA classification based on data collected at static target poses, and GR approaches based on neural networks. Real-time human-in-the-loop experiments further validate the effectiveness of the proposed methods, demonstrating that PM-based GR using the dynamic LDA classifier achieves superior F1 score and balanced accuracy compared to state-of-the-art techniques.
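The idea of turning continuous sensor measurements into discrete events can be sketched as follows. This is only an illustration in the spirit of the clustering approach, not the authors' implementation: the centroids are assumed to have been learned beforehand, and the function names, the event label prefix, and the merging of consecutive repeated labels are hypothetical choices.

```python
import math


def nearest_centroid(sample, centroids):
    """Return the index of the centroid closest to sample (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(sample, centroids[k]))


def to_event_trace(samples, centroids, prefix="c"):
    """Label each sample with its nearest cluster and merge consecutive repeats."""
    trace = []
    for sample in samples:
        label = f"{prefix}{nearest_centroid(sample, centroids)}"
        if not trace or trace[-1] != label:
            trace.append(label)
    return trace


# Assumed, pre-computed cluster centres in a toy two-dimensional sensor space.
centroids = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
# Invented stream of real-valued measurements.
samples = [(0.1, 0.0), (0.2, 0.1), (0.9, 0.8), (1.1, 1.0), (1.9, 0.2), (0.1, -0.1)]

trace = to_event_trace(samples, centroids)
print(trace)  # ['c0', 'c1', 'c2', 'c0']
```

A trace of discrete labels like this is the kind of input that a process mining technique expecting event names could consume.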
Zihang Su, Tianshi Yu, Artem Polyvyanyy, Ying Tan, Nir Lipovetzky, Sebastian Sardiña, Nick van Beest, Alireza Mohammadi, Denny Oetomo. Process mining over sensor data: Goal recognition for powered transhumeral prostheses. Information Systems, vol. 132, Article 102540. Published 2025-03-06. DOI: 10.1016/j.is.2025.102540
Citations: 0