Artificial Intelligence: Latest Articles, Page 9

A differentiable first-order rule learner for inductive logic programming

IF 14.4; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence

Pub Date : 2024-03-15 DOI: 10.1016/j.artint.2024.104108

Kun Gao , Katsumi Inoue , Yongzhi Cao , Hanpin Wang

Learning first-order logic programs from relational facts yields intuitive insights into the data. Inductive logic programming (ILP) models are effective in learning first-order logic programs from observed relational data. Symbolic ILP models support rule learning in a data-efficient manner. However, symbolic ILP models are not robust when learning from noisy data. Neuro-symbolic ILP models utilize neural networks to learn logic programs in a differentiable manner, which improves the robustness of ILP models. However, most neuro-symbolic methods need a strong language bias to learn logic programs, which reduces the usability and flexibility of ILP models and limits the logic program formats. In addition, most neuro-symbolic ILP methods cannot learn logic programs effectively from both small-size datasets and large-size datasets such as knowledge graphs. In this paper, we introduce a novel differentiable ILP model called differentiable first-order rule learner (DFORL), which scales to learning rules from both smaller and larger datasets. Besides, DFORL only needs the number of variables in the learned logic programs as input. Hence, DFORL is easy to use and does not need a strong language bias. We demonstrate that DFORL can perform well on several standard ILP datasets, knowledge graphs, and probabilistic relational facts and outperform several well-known differentiable ILP models. Experimental results indicate that DFORL is a precise, robust, scalable, and computationally cheap differentiable ILP model.

Volume 331, Article 104108

Citations: 0

Non-deterministic approximation fixpoint theory and its application in disjunctive logic programming

IF 14.4; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence

Pub Date : 2024-03-08 DOI: 10.1016/j.artint.2024.104110

Jesse Heyninck , Ofer Arieli , Bart Bogaerts

Approximation fixpoint theory (AFT) is an abstract and general algebraic framework for studying the semantics of nonmonotonic logics. It provides a unifying study of the semantics of different formalisms for nonmonotonic reasoning, such as logic programming, default logic and autoepistemic logic. In this paper, we extend AFT to deal with non-deterministic constructs that allow the handling of indefinite information, represented e.g. by disjunctive formulas. This is done by generalizing the main constructions and corresponding results of AFT to non-deterministic operators, whose ranges are sets of elements rather than single elements. The applicability and usefulness of this generalization are illustrated in the context of disjunctive logic programming.

Citations: 0

“Guess what I'm doing”: Extending legibility to sequential decision tasks

IF 14.4; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence

Pub Date : 2024-03-07 DOI: 10.1016/j.artint.2024.104107

Miguel Faria , Francisco S. Melo , Ana Paiva

In this paper we investigate the notion of legibility in sequential decision tasks under uncertainty. Previous works that extend legibility to scenarios beyond robot motion either focus on deterministic settings or are computationally too expensive. Our proposed approach, dubbed PoLMDP, is able to handle uncertainty while remaining computationally tractable. We establish the advantages of our approach against state-of-the-art approaches in several scenarios of varying complexity. We also showcase the use of our legible policies as demonstrations in machine teaching scenarios, establishing their superiority in teaching new behaviours against the commonly used demonstrations based on the optimal policy. Finally, we assess the legibility of our computed policies through a user study, where people are asked to infer the goal of a mobile robot following a legible policy by observing its actions.

Citations: 0

aspmc: New frontiers of algebraic answer set counting

IF 14.4; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence

Pub Date : 2024-03-06 DOI: 10.1016/j.artint.2024.104109

Thomas Eiter , Markus Hecher , Rafael Kiesel

In the last decade, there has been increasing interest in extensions of answer set programming (ASP) that cater for quantitative information such as weights or probabilities. A wide range of quantitative reasoning tasks for ASP and logic programming, among them probabilistic inference and parameter learning in the neuro-symbolic setting, can be expressed as algebraic answer set counting (AASC) tasks, i.e., weighted model counting for ASP with weights calculated over some semiring, which makes efficient solvers for AASC desirable. In this article, we present aspmc, a new solver for AASC that pushes the limits of efficient solvability. Notably, aspmc provides improved performance compared to the state of the art in probabilistic inference by exploiting three insights gained from thorough theoretical investigations in our work. Namely, we consider the knowledge compilation step in the AASC pipeline, where the underlying logical theory specified by the answer set program is converted into a tractable circuit representation, on which AASC is feasible in polynomial time. First, we provide a detailed comparison of different approaches to knowledge compilation for programs, revealing that translation to propositional formulas followed by compilation to sd-DNNF seems favorable. Second, we study how the translation to propositional formulas should proceed to result in efficient compilation. This leads to the second and third insight, namely a novel way of breaking the positive cyclic dependencies in a program, called $T_{P}$-Unfolding, and an improvement to the Clark Completion, the procedure used to transform programs without positive cyclic dependencies into propositional formulas. Both improvements are tailored towards efficient knowledge compilation. Our empirical evaluation reveals that while all three advancements contribute to the success of aspmc, $T_{P}$-Unfolding improves performance significantly by allowing us to handle cyclic instances better.
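
To make the phrase "weighted model counting with weights calculated over some semiring" concrete, the following is a minimal brute-force sketch in Python. It is not the aspmc solver and does not use knowledge compilation; it enumerates all assignments of a small propositional formula, which here merely stands in for an answer set program. The names `algebraic_model_count`, `PROBABILITY`, and `COUNTING` are hypothetical and chosen only for this illustration.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

# A semiring is represented as (plus, times, zero, one).
# This representation is an assumption made for the sketch.
Semiring = Tuple[Callable, Callable, object, object]

PROBABILITY: Semiring = (lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
COUNTING: Semiring = (lambda a, b: a + b, lambda a, b: a * b, 0, 1)


def algebraic_model_count(
    variables: Sequence[str],
    formula: Callable[[Dict[str, bool]], bool],
    weight: Callable[[str, bool], object],
    semiring: Semiring,
):
    """Sum, over all models of `formula`, the product of literal weights."""
    plus, times, zero, one = semiring
    total = zero
    # Exponential enumeration: fine for a toy example only.
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if not formula(assignment):
            continue  # not a model, contributes nothing
        term = one
        for var in variables:
            term = times(term, weight(var, assignment[var]))
        total = plus(total, term)
    return total


if __name__ == "__main__":
    variables = ["a", "b"]
    formula = lambda m: m["a"] or m["b"]
    probs = {"a": 0.3, "b": 0.5}
    prob_weight = lambda v, val: probs[v] if val else 1.0 - probs[v]
    # Probability that (a or b) holds: 1 - 0.7 * 0.5, i.e. about 0.65.
    print(algebraic_model_count(variables, formula, prob_weight, PROBABILITY))
    # With every literal weighted 1 in the counting semiring: 3 models.
    print(algebraic_model_count(variables, formula, lambda v, val: 1, COUNTING))
```

Swapping the semiring changes the task (probability versus plain model counting) without changing the enumeration, which is the point of the algebraic formulation. A compiled circuit such as sd-DNNF would replace the exponential loop in a practical solver.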

Volume 330, Article 104109

Citations: 0

Investigating the properties of neural network representations in reinforcement learning

IF 14.4; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence

Pub Date : 2024-03-01 DOI: 10.1016/j.artint.2024.104100

Han Wang , Erfan Miahi , Martha White , Marlos C. Machado , Zaheer Abbas , Raksha Kumaraswamy , Vincent Liu , Adam White

In this paper we investigate the properties of representations learned by deep reinforcement learning systems. Much of the early work on representations for reinforcement learning focused on designing fixed-basis architectures to achieve properties thought to be desirable, such as orthogonality and sparsity. In contrast, the idea behind deep reinforcement learning methods is that the agent designer should not encode representational properties, but rather that the data stream should determine the properties of the representation—good representations emerge under appropriate training schemes. In this paper we bring these two perspectives together, empirically investigating the properties of representations that support transfer in reinforcement learning. We introduce and measure six representational properties over more than 25,000 agent-task settings. We consider Deep Q-learning agents with different auxiliary losses in a pixel-based navigation environment, with source and transfer tasks corresponding to different goal locations. We develop a method to better understand why some representations work better for transfer, through a systematic approach varying task similarity and measuring and correlating representation properties with transfer performance. We demonstrate the generality of the methodology by investigating representations learned by a Rainbow agent that successfully transfers across Atari 2600 game modes.

Volume 330, Article 104100

Citations: 0

Crossover can guarantee exponential speed-ups in evolutionary multi-objective optimisation

IF 14.4; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence

Pub Date : 2024-02-27 DOI: 10.1016/j.artint.2024.104098

Duc-Cuong Dang, Andre Opris, Dirk Sudholt

Evolutionary algorithms are popular algorithms for multi-objective optimisation (also called Pareto optimisation) as they use a population to store trade-offs between different objectives. Despite their popularity, the theoretical foundation of multi-objective evolutionary optimisation (EMO) is still in its early development. Fundamental questions such as the benefits of the crossover operator are still not fully understood. We provide a theoretical analysis of the well-known EMO algorithms GSEMO and NSGA-II to showcase the possible advantages of crossover: we propose classes of “royal road” functions on which these algorithms cover the whole Pareto front in expected polynomial time if crossover is being used. But when disabling crossover, they require exponential time in expectation to cover the Pareto front. The latter even holds for a large class of black-box algorithms using any elitist selection and any unbiased mutation operator. Moreover, even the expected time to create a single Pareto-optimal search point is exponential. We provide two different function classes, one tailored for one-point crossover and another one tailored for uniform crossover, and we show that some immune-inspired hypermutations cannot avoid exponential optimisation times. Our work shows the first example of an exponential performance gap through the use of crossover for the widely used NSGA-II algorithm and contributes to a deeper understanding of its limitations and capabilities.
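
The two crossover operators named in the abstract can be sketched on bit strings as follows. This is a generic textbook illustration in Python, not the GSEMO or NSGA-II implementation analysed in the paper and not the "royal road" functions themselves; the function names and the list-of-ints encoding are assumptions made for the example.

```python
import random
from typing import List


def one_point_crossover(x: List[int], y: List[int], rng: random.Random) -> List[int]:
    """Take a prefix of x and the matching suffix of y, split at a random cut."""
    assert len(x) == len(y)
    cut = rng.randint(0, len(x))  # inclusive on both ends
    return x[:cut] + y[cut:]


def uniform_crossover(x: List[int], y: List[int], rng: random.Random) -> List[int]:
    """Choose each position independently from x or y with probability 1/2."""
    assert len(x) == len(y)
    return [xi if rng.random() < 0.5 else yi for xi, yi in zip(x, y)]


if __name__ == "__main__":
    rng = random.Random(42)
    parent_a = [1] * 8
    parent_b = [0] * 8
    print(one_point_crossover(parent_a, parent_b, rng))
    print(uniform_crossover(parent_a, parent_b, rng))
```

One-point crossover preserves contiguous blocks of each parent, while uniform crossover mixes positions independently, which is why the abstract tailors a separate function class to each operator.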

Volume 330, Article 104098

Citations: 0

Datalog rewritability and data complexity of ALCHOIQ with closed predicates

IF 14.4; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence

Pub Date : 2024-02-23 DOI: 10.1016/j.artint.2024.104099

Sanja Lukumbuzya , Magdalena Ortiz , Mantas Šimkus

We study the relative expressiveness of ontology-mediated queries (OMQs) formulated in the expressive Description Logic $ALCHOIQ$ extended with closed predicates. In particular, we present a polynomial time translation from OMQs into Datalog with negation under the stable model semantics, the formalism that underlies Answer Set Programming. This is a novel and non-trivial result: the considered OMQs are not only non-monotonic, but also feature a tricky combination of nominals, inverse roles, and counting. We start with atomic queries and then lift our approach to a large class of first-order queries where quantification is “guarded” by closed predicates. Our translation is based on a characterization of the query answering problem via integer programming, and a specially crafted program in Datalog with negation that finds solutions to dynamically generated systems of integer inequalities. As an important by-product of our translation we get that the query answering problem is co-NP-complete in data complexity for the considered class of OMQs. Thus, answering these OMQs in the presence of closed predicates is not harder than answering them in the standard setting. This is not obvious as closed predicates are known to increase data complexity for some existing ontology languages.

Volume 330, Article 104099

Citations: 0

Finding the optimal exploration-exploitation trade-off online through Bayesian risk estimation and minimization

IF 14.4; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence

Pub Date : 2024-02-21 DOI: 10.1016/j.artint.2024.104096

Stewart Jamieson , Jonathan P. How , Yogesh Girdhar

We propose endogenous Bayesian risk minimization (EBRM) over policy sets as an approach to online learning across a wide range of settings. Many real-world online learning problems have complexities such as action- and belief-dependent rewards, time-discounting of reward, and heterogeneous costs for actions and feedback; we find that existing online learning heuristics cannot leverage most problem-specific information, to the detriment of their performance. We introduce a belief-space Markov decision process (BMDP) model that can capture these complexities, and further apply the concepts of aleatoric, epistemic, and process risks to online learning. These risk functions describe the risk inherent to the learning problem, the risk due to the agent's lack of knowledge, and the relative quality of its policy, respectively. We demonstrate how computing and minimizing these risk functions guides the online learning agent towards the optimal exploration-exploitation trade-off in any stochastic online learning problem, constituting the basis of the EBRM approach. We also show how Bayes' risk, the minimization objective in stochastic online learning problems, can be decomposed into the aforementioned aleatoric, epistemic, and process risks.

In simulation experiments, EBRM algorithms achieve state-of-the-art performance across various classical online learning problems, including Gaussian and Bernoulli multi-armed bandits, best-arm identification, mixed objectives with action- and belief-dependent rewards, and dynamic pricing, a finite partial monitoring problem. To our knowledge, it is also the first computationally efficient online learning approach that can provide online bounds on an algorithm's Bayes' risk. Finally, because the EBRM approach is parameterized by a set of policy algorithms, it can be extended to incorporate new developments in online learning algorithms, and is thus well-suited as the foundation for developing real-world learning agents.
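
For readers unfamiliar with the Bernoulli multi-armed bandit setting mentioned above, the sketch below shows a standard Bayesian baseline, Thompson sampling with Beta posteriors, written in Python. It is explicitly not the EBRM approach and does not compute the aleatoric, epistemic, or process risks; it only illustrates how a belief over arm means drives the exploration-exploitation trade-off. The function name `thompson_bernoulli` and its parameters are hypothetical.

```python
import random
from typing import List, Tuple


def thompson_bernoulli(
    true_means: List[float], horizon: int, seed: int = 0
) -> Tuple[List[int], List[float], List[float]]:
    """Run Thompson sampling on a simulated Bernoulli bandit.

    Returns the pull counts and the Beta posterior parameters per arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k  # Beta(1, 1) is a uniform prior over each arm mean
    beta = [1.0] * k
    pulls = [0] * k
    for _ in range(horizon):
        # Sample one plausible mean per arm from the current belief.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        # Simulated environment: draw a 0/1 reward from the true mean.
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate posterior update for the pulled arm only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls, alpha, beta


if __name__ == "__main__":
    pulls, alpha, beta = thompson_bernoulli([0.2, 0.8], horizon=500)
    print("pulls per arm:", pulls)
    print("posterior means:", [a / (a + b) for a, b in zip(alpha, beta)])
```

Arms with uncertain posteriors still get sampled occasionally (exploration), while arms with high posterior mean are pulled most of the time (exploitation).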

Volume 330, Article 104096

Citations: 0

Generalized planning as heuristic search: A new planning search-space that leverages pointers over objects

IF 14.4 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence

Pub Date : 2024-02-15 DOI: 10.1016/j.artint.2024.104097

Javier Segovia-Aguas , Sergio Jiménez , Anders Jonsson

Planning as heuristic search is one of the most successful approaches to classical planning but unfortunately, it does not trivially extend to Generalized Planning (GP); GP aims to compute algorithmic solutions that are valid for a set of classical planning instances from a given domain, even if these instances differ in their number of objects, the initial and goal configuration of these objects and hence, in the number (and possible values) of the state variables. State-space search, as it is implemented by heuristic planners, becomes then impractical for GP. In this paper we adapt the planning as heuristic search paradigm to the generalization requirements of GP, and present the first native heuristic search approach to GP. First, the paper introduces a new pointer-based solution space for GP that is independent of the number of classical planning instances in a GP problem and the size of those instances (i.e. the number of objects, state variables and their domain sizes). Second, the paper defines an upgraded version of our GP algorithm, called Best-First Generalized Planning (BFGP), that implements a best-first search in our pointer-based solution space for GP. Lastly, the paper defines a set of evaluation and heuristic functions for BFGP that assess the structural complexity of the candidate GP solutions, as well as their fitness to a given input set of classical planning instances. The computation of these evaluation and heuristic functions does not require grounding states or actions in advance. Therefore our GP as heuristic search approach can handle large sets of state variables with large numerical domains, e.g. integers.
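
The idea of a best-first search over candidate programs, ranked by how well they fit a set of instances and by their size, can be illustrated with a toy sketch. This is not BFGP and does not use the paper's pointer-based solution space: the instruction set `OPS`, the integer instances, and the `evaluate` priority are all hypothetical choices made only for illustration.

```python
import heapq

# Hypothetical toy instruction set; NOT the pointer-based language from the paper.
OPS = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "double": lambda x: 2 * x,
}

def run(program, x):
    """Apply each instruction of the program to x, in order."""
    for name in program:
        x = OPS[name](x)
    return x

def evaluate(program, instances):
    """Priority tuple: (total distance to the goals, program length)."""
    error = sum(abs(run(program, start) - goal) for start, goal in instances)
    return (error, len(program))

def best_first_program_search(instances, max_len=6):
    """Best-first search over programs; returns one that solves every instance, or None."""
    frontier = [(*evaluate((), instances), ())]
    while frontier:
        error, length, program = heapq.heappop(frontier)
        if error == 0:          # the program reaches the goal on all instances
            return program
        if length == max_len:   # bound the program size so the search terminates
            continue
        for name in OPS:
            child = program + (name,)
            heapq.heappush(frontier, (*evaluate(child, instances), child))
    return None

# Each instance is a (start, goal) pair; one program must solve all of them.
instances = [(1, 4), (2, 6), (5, 12)]
plan = best_first_program_search(instances)
print(plan)
```

The search is greedy on the fitness term, so the returned program is valid for every instance but is not guaranteed to be the shortest one.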

Artificial Intelligence, Volume 330, Article 104097. PDF: https://www.sciencedirect.com/science/article/pii/S000437022400033X/pdfft?md5=155e9598fb32d57f910c200397c5f020&pid=1-s2.0-S000437022400033X-main.pdf

Citations: 0

Decentralized fused-learner architectures for Bayesian reinforcement learning

IF 14.4 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence

Pub Date : 2024-02-13 DOI: 10.1016/j.artint.2024.104094

Augustin A. Saucan , Subhro Das , Moe Z. Win

Decentralized training is a robust solution for learning over an extensive network of distributed agents. Many existing solutions involve the averaging of locally inferred parameters which constrain the architecture to independent agents with identical learning algorithms. Here, we propose decentralized fused-learner architectures for Bayesian reinforcement learning, named fused Bayesian-learner architectures (FBLAs), that are capable of learning an optimal policy by fusing potentially heterogeneous Bayesian policy gradient learners, i.e., agents that employ different learning architectures to estimate the gradient of a control policy. The novelty of FBLAs relies on fusing the full posterior distributions of the local policy gradients. The inclusion of higher-order information, i.e., probabilistic uncertainty, is employed to robustly fuse the locally-trained parameters. FBLAs find the barycenter of all local posterior densities by minimizing the total Kullback–Leibler divergence from the barycenter distribution to the local posterior densities. The proposed FBLAs are demonstrated on a sensor-selection problem for Bernoulli tracking, where multiple sensors observe a dynamic target and only a subset of sensors is allowed to be active at any time.
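
The fusion step can be sketched for the special case where every local posterior is Gaussian. That is an assumption made here for illustration; the abstract does not state the form of the posteriors. For densities p_i and weights w_i that sum to one, the density q minimizing the weighted sum of KL(q || p_i) is the normalized geometric mean of the p_i. For Gaussians this gives a fused precision equal to the weighted average of the local precisions. The function name `fuse_gaussians` and the equal default weights are hypothetical.

```python
import numpy as np

def fuse_gaussians(means, covs, weights=None):
    """Mean and covariance of the minimizer of sum_i w_i * KL(q || p_i) for Gaussian p_i."""
    means = [np.asarray(m, dtype=float) for m in means]
    precisions = [np.linalg.inv(np.asarray(c, dtype=float)) for c in covs]
    n = len(means)
    # Assumption: equal weights unless the caller supplies them.
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    # Fused precision is the weighted average of the local precisions.
    fused_precision = sum(wi * P for wi, P in zip(w, precisions))
    fused_cov = np.linalg.inv(fused_precision)
    # Fused mean is the precision-weighted combination of the local means.
    fused_mean = fused_cov @ sum(wi * P @ m for wi, P, m in zip(w, precisions, means))
    return fused_mean, fused_cov

# Two hypothetical learners: the second is more certain, so it pulls the fused mean harder.
fused_mean, fused_cov = fuse_gaussians(
    means=[[0.0, 0.0], [2.0, 2.0]],
    covs=[np.eye(2), 0.25 * np.eye(2)],
)
print(fused_mean)
print(fused_cov)
```

Unlike plain parameter averaging, a learner with a tighter posterior contributes more to the fused estimate, which is the role the abstract assigns to the uncertainty information.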

Artificial Intelligence, Volume 331, Article 104094.

Citations: 0