Open Problem—Convergence and Asymptotic Optimality of the Relative Value Iteration in Ergodic Control

Q1 Mathematics | Stochastic Systems | Pub Date: 2019-09-17 | DOI: 10.1287/stsy.2019.0040

A. Arapostathis

Citations: 2

Abstract

The relative value iteration scheme (RVI) for Markov decision processes (MDP) dates back to White (1963), a seminal work that introduced an algorithm for solving the ergodic dynamic programming e...
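For readers unfamiliar with the scheme named in the abstract, the following is a minimal sketch of relative value iteration in its classical finite-state, finite-action, average-cost form. It is not taken from the paper: the function name `relative_value_iteration`, the reference-state normalization, the stopping rule, and the two-state toy MDP are all illustrative assumptions, and the sketch presumes every stationary policy induces an aperiodic chain with a single recurrent class.

```python
def relative_value_iteration(P, c, ref_state=0, tol=1e-10, max_iter=10_000):
    """Hypothetical helper (not from the paper).

    P[a][x][y] : probability of moving from state x to y under action a
    c[x][a]    : one-step cost of taking action a in state x
    Returns (gain, h): the estimated optimal average cost and a relative
    value function normalized so that h[ref_state] == 0.
    """
    n_states = len(c)
    n_actions = len(c[0])
    h = [0.0] * n_states
    for _ in range(max_iter):
        # Apply the dynamic programming (Bellman) operator to h.
        Th = [
            min(
                c[x][a] + sum(P[a][x][y] * h[y] for y in range(n_states))
                for a in range(n_actions)
            )
            for x in range(n_states)
        ]
        # Subtract the value at the reference state to keep iterates bounded.
        gain = Th[ref_state]
        h_new = [v - gain for v in Th]
        if max(abs(u - v) for u, v in zip(h_new, h)) < tol:
            return gain, h_new
        h = h_new
    raise RuntimeError("relative value iteration did not converge")


# Toy two-state, two-action MDP (made-up numbers for illustration only).
P = [
    [[0.9, 0.1], [0.1, 0.9]],  # action 0
    [[0.5, 0.5], [0.5, 0.5]],  # action 1
]
c = [
    [1.0, 2.0],  # costs in state 0 for actions 0 and 1
    [3.0, 2.5],  # costs in state 1 for actions 0 and 1
]
gain, h = relative_value_iteration(P, c)
print(gain, h)
```

For this toy model the optimal average cost works out by hand to 1.25, with relative values h = (0, 2.5), so the printed numbers should agree with those up to the tolerance.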


Source journal

Stochastic Systems (Decision Sciences: Statistics, Probability and Uncertainty)

CiteScore

3.70

Self-citation rate

0.00%

Number of articles published