Feature Reinforcement Learning: Part II. Structured MDPs

Journal of Artificial General Intelligence, published 2021-01-01. DOI: 10.2478/jagi-2021-0003

Marcus Hutter

Abstract

The Feature Markov Decision Process (ΦMDP) model developed in Part I (Hutter, 2009b) is well-suited for learning agents in general environments. Nevertheless, unstructured (Φ)MDPs are limited to relatively simple environments. Structured MDPs such as Dynamic Bayesian Networks (DBNs) are used for large-scale real-world problems. In this article I extend ΦMDP to ΦDBN. The primary contribution is the derivation of a cost criterion that allows the most relevant features to be extracted automatically from the environment, leading to the "best" DBN representation. I discuss all the building blocks required for a complete general learning algorithm, and compare the novel ΦDBN model to the prevalent POMDP approach.

