A Non-parametric Approach to Approximate Dynamic Programming

2011 10th International Conference on Machine Learning and Applications and Workshops Pub Date : 2011-12-18 DOI:10.1109/ICMLA.2011.19

Hadrien Glaude, Fadi Akrimi, M. Geist, O. Pietquin

引用次数: 2

Abstract

Approximate Dynamic Programming (ADP) is a machine learning method aiming at learning an optimal control policy for a dynamic and stochastic system from a logged set of observed interactions between the system and one or several non-optimal controlers. It defines a class of particular Reinforcement Learning (RL) algorithms which is a general paradigm for learning such a control policy from interactions. ADP addresses the problem of systems exhibiting a state space which is too large to be enumerated in the memory of a computer. Because of this, approximation schemes are used to generalize estimates over continuous state spaces. Nevertheless, RL still suffers from a lack of scalability to multidimensional continuous state spaces. In this paper, we propose the use of the Locally Weighted Projection Regression (LWPR) method to handle this scalability problem. We prove the efficacy of our approach on two standard benchmarks modified to exhibit larger state spaces.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

近似动态规划的一种非参数方法

近似动态规划(ADP)是一种机器学习方法，旨在从系统与一个或多个非最优控制器之间观察到的相互作用的日志集中学习动态随机系统的最优控制策略。它定义了一类特殊的强化学习(RL)算法，这是从交互中学习这种控制策略的一般范例。ADP解决了系统显示状态空间的问题，该状态空间太大而无法在计算机内存中枚举。正因为如此，近似格式被用来推广连续状态空间上的估计。然而，RL仍然缺乏对多维连续状态空间的可伸缩性。在本文中，我们提出使用局部加权投影回归(LWPR)方法来处理这种可扩展性问题。我们在两个经过修改以显示更大状态空间的标准基准上证明了我们的方法的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2011 10th International Conference on Machine Learning and Applications and Workshops

自引率

0.00%

发文量

期刊最新文献

A Data-Mining Approach to Travel Price Forecasting L1 vs. L2 Regularization in Text Classification when Learning from Labeled Features Nonlinear RANSAC Optimization for Parameter Estimation with Applications to Phagocyte Transmigration Speech Rating System through Space Mapping Kernel Methods for Minimum Entropy Encoding