Feature Reinforcement Learning: Part II. Structured MDPs

Journal of Artificial General Intelligence, Pub Date: 2021-01-01, DOI: 10.2478/jagi-2021-0003

Marcus Hutter


Citations: 0

Abstract

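The Feature Markov Decision Process (ΦMDP) model developed in Part I (Hutter, 2009b) is well-suited for learning agents in general environments. Nevertheless, unstructured (Φ)MDPs are limited to relatively simple environments. Structured MDPs like Dynamic Bayesian Networks (DBNs) are used for large-scale real-world problems. In this article I extend ΦMDP to ΦDBN. The primary contribution is to derive a cost criterion that allows the most relevant features to be extracted automatically from the environment, leading to the “best” DBN representation. I discuss all building blocks required for a complete general learning algorithm, and compare the novel ΦDBN model to the prevalent POMDP approach.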
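The abstract does not state the cost criterion itself. As a rough illustration of how a cost over candidate feature maps can rank them, here is a minimal sketch, assuming an MDL-style two-part code length over the state and reward sequences that a map induces. The function names, the toy history, and the candidate maps are all hypothetical, and this is not the ΦDBN criterion derived in the article.

```python
import math
from collections import Counter, defaultdict


def code_length(counts):
    """Two-part code length in bits for an i.i.d. sample with these symbol counts.

    Assumption: empirical-entropy data cost plus (1/2) log2(n) bits per free
    parameter. This is a generic MDL-style choice, not taken from the article.
    """
    n = sum(counts)
    if n == 0:
        return 0.0
    nonzero = [c for c in counts if c > 0]
    data_bits = -sum(c * math.log2(c / n) for c in nonzero)
    param_bits = 0.5 * (len(nonzero) - 1) * math.log2(n)
    return data_bits + param_bits


def feature_map_cost(history, phi):
    """Cost of a candidate feature map phi on a history of (obs, action, reward).

    States are phi(obs). The cost sums the code lengths of next-state counts
    per (state, action) and of reward counts per (state, action, next_state).
    """
    states = [phi(obs) for obs, _, _ in history]
    transitions = defaultdict(Counter)
    rewards = defaultdict(Counter)
    for t in range(len(history) - 1):
        action = history[t][1]
        next_reward = history[t + 1][2]
        s, s_next = states[t], states[t + 1]
        transitions[(s, action)][s_next] += 1
        rewards[(s, action, s_next)][next_reward] += 1
    tables = list(transitions.values()) + list(rewards.values())
    return sum(code_length(list(table.values())) for table in tables)


# Hypothetical toy history: each observation is (signal, distractor).
# The signal alternates 0, 1, 0, 1, ... and the reward equals the signal;
# the distractor follows an unrelated pattern.
history = [((t % 2, (t * t) % 3), 0, t % 2) for t in range(60)]

candidates = {
    "signal_only": lambda obs: obs[0],    # keeps just the relevant feature
    "full_observation": lambda obs: obs,  # also keeps the distractor
    "constant": lambda obs: 0,            # throws everything away
}

costs = {name: feature_map_cost(history, phi) for name, phi in candidates.items()}
best = min(costs, key=costs.get)
print(best, costs)
```

On this toy history the map that keeps only the signal gets the lowest cost: keeping the distractor makes transitions look random, while discarding everything makes rewards look random. A structured (DBN) variant would additionally have to score which features depend on which, and this sketch does not attempt that.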


