Transferable dynamics models for efficient object-oriented reinforcement learning

IF 5.1 | CAS Zone 2 (Computer Science) | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Artificial Intelligence | Pub Date: 2024-01-26 | DOI: 10.1016/j.artint.2024.104079
Ofir Marom, Benjamin Rosman
{"title":"Transferable dynamics models for efficient object-oriented reinforcement learning","authors":"Ofir Marom,&nbsp;Benjamin Rosman","doi":"10.1016/j.artint.2024.104079","DOIUrl":null,"url":null,"abstract":"<div><p>The Reinforcement Learning (RL) framework offers a general paradigm for constructing autonomous agents that can make effective decisions when solving tasks. An important area of study within the field of RL is transfer learning, where an agent utilizes knowledge gained from solving previous tasks to solve a new task more efficiently. While the notion of transfer learning is conceptually appealing, in practice, not all RL representations are amenable to transfer learning. Moreover, much of the research on transfer learning in RL is purely empirical. Previous research has shown that object-oriented representations are suitable for the purposes of transfer learning with theoretical efficiency guarantees. Such representations leverage the notion of object classes to learn lifted rules that apply to grounded object instantiations. In this paper, we extend previous research on object-oriented representations and introduce two formalisms: the first is based on deictic predicates, and is used to learn a transferable transition dynamics model; the second is based on propositions, and is used to learn a transferable reward dynamics model. In addition, we extend previously introduced efficient learning algorithms for object-oriented representations to our proposed formalisms. Our frameworks are then combined into a single efficient algorithm that learns transferable transition and reward dynamics models across a domain of related tasks. We illustrate our proposed algorithm empirically on an extended version of the Taxi domain, as well as the more difficult Sokoban domain, showing the benefits of our approach with regards to efficient learning and transfer.</p></div>","PeriodicalId":8434,"journal":{"name":"Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":5.1000,"publicationDate":"2024-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0004370224000158/pdfft?md5=f71414a73bce8f86455910c6a084672b&pid=1-s2.0-S0004370224000158-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0004370224000158","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

The Reinforcement Learning (RL) framework offers a general paradigm for constructing autonomous agents that can make effective decisions when solving tasks. An important area of study within the field of RL is transfer learning, where an agent utilizes knowledge gained from solving previous tasks to solve a new task more efficiently. While the notion of transfer learning is conceptually appealing, in practice, not all RL representations are amenable to transfer learning. Moreover, much of the research on transfer learning in RL is purely empirical. Previous research has shown that object-oriented representations are suitable for the purposes of transfer learning with theoretical efficiency guarantees. Such representations leverage the notion of object classes to learn lifted rules that apply to grounded object instantiations. In this paper, we extend previous research on object-oriented representations and introduce two formalisms: the first is based on deictic predicates, and is used to learn a transferable transition dynamics model; the second is based on propositions, and is used to learn a transferable reward dynamics model. In addition, we extend previously introduced efficient learning algorithms for object-oriented representations to our proposed formalisms. Our frameworks are then combined into a single efficient algorithm that learns transferable transition and reward dynamics models across a domain of related tasks. We illustrate our proposed algorithm empirically on an extended version of the Taxi domain, as well as the more difficult Sokoban domain, showing the benefits of our approach with regards to efficient learning and transfer.
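To make the idea of lifted, object-oriented dynamics rules concrete, the sketch below shows a toy Python representation in which a rule is stated over an object class and a condition, and is then applied to grounded object instances in a Taxi-like scene. This is an illustrative assumption only: the class names, attributes, and the move-north condition are invented for this example and do not reproduce the paper's deictic-predicate or propositional formalisms.

```python
# Illustrative sketch only: a toy "lifted rule over object classes" in the
# spirit of object-oriented RL dynamics models, NOT the paper's formalism.
# All names (Taxi, Passenger, the move-north condition) are assumptions.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Obj:
    """A grounded object: an instance of an object class with attribute values."""
    cls: str
    attrs: Dict[str, int]


@dataclass
class LiftedRule:
    """A rule stated over an object class; it applies to every grounded instance."""
    cls: str                                      # object class the rule is about
    condition: Callable[[Obj, List[Obj]], bool]   # predicate over the object and the scene
    effect: Callable[[Obj], None]                 # attribute change when the condition holds


def step(objects: List[Obj], rules: List[LiftedRule]) -> None:
    """Apply every matching lifted rule to every grounded object of its class."""
    for rule in rules:
        for obj in objects:
            if obj.cls == rule.cls and rule.condition(obj, objects):
                rule.effect(obj)


# Toy Taxi-like scene with two grounded objects.
taxi = Obj("Taxi", {"x": 0, "y": 0})
passenger = Obj("Passenger", {"x": 2, "y": 3, "in_taxi": 0})

rules = [
    LiftedRule(
        cls="Taxi",
        # Hypothetical condition standing in for "no wall north of this taxi".
        condition=lambda o, scene: o.attrs["y"] < 4,
        effect=lambda o: o.attrs.update(y=o.attrs["y"] + 1),
    )
]

step([taxi, passenger], rules)
print(taxi.attrs)  # {'x': 0, 'y': 1}
```

Because the rule refers only to the object class and a condition, the same rule transfers unchanged to any scene containing taxis, which is the intuition behind learning transferable dynamics models over object-oriented representations.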

Source journal: Artificial Intelligence (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 11.20
Self-citation rate: 1.40%
Articles published: 118
Review time: 8 months
Journal description: The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. Additionally, the journal accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ also accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.
Latest articles in this journal:
Integration of memory systems supporting non-symbolic representations in an architecture for lifelong development of artificial agents
Editorial Board
PathLAD+: Towards effective exact methods for subgraph isomorphism problem
Interval abstractions for robust counterfactual explanations
Approximating problems in abstract argumentation with graph convolutional networks