DYNAMIC PROGRAMMING ON RECURSIVE REWARD SYSTEMS

N. Furukawa, Seiichi Iwamoto
Journal: Bulletin of Mathematical Statistics
Publication date: 1976-03-01
DOI: 10.5109/13108 (https://doi.org/10.5109/13108)
Citations: 21

Abstract

Dynamic programming (DP) was introduced by R. Bellman [2] as an important technique for solving non-linear programming problems in which a sequence of decisions has to be chosen in an optimal manner. Bellman, in his book, proposed the "Principle of Optimality" to show that the determination of an optimal policy can be reduced to the solution of an optimality equation, i.e., a functional equation that should be satisfied by an optimal return. Although the Principle of Optimality is a proposition that needs mathematical reasoning, his justification for the principle was not given in a precise mathematical form. For this reason, the scope of cost structures to which the principle is applicable has been left unexplained. Afterward, G. L. Nemhauser [9] gave a sufficient condition on the cost structure for an optimality equation to hold. His condition is that the cost function should have both a separability property and a monotonicity property. Nemhauser did not make explicit the relation between the effectiveness of Bellman's principle and the justification for an optimality equation; under his condition this relation is no longer trivial. In this paper we shall be concerned with the optimization of finite-stage sequential decision processes. We shall give rigorous proofs for the justification of optimality equations and for the effectiveness of optimality principles in two senses, without assuming the existence of maximum values of returns. Our condition is that the cost function should have a recursiveness property, a monotonicity property and a Lipschitz condition. Our recursiveness is essentially the same as separability in Nemhauser's sense. Our monotonicity has two senses: one is a wide sense, and the other a strict sense. The monotonicity properties in the wide and the strict senses, together with the recursiveness and the Lipschitz condition, induce optimality principles in a weak and a strong sense, respectively. Bellman's Principle of Optimality may be identified, in our terms, as a principle in the strong sense. Our principle in the weak sense has not been introduced elsewhere in the literature as far as the authors know. If we assume the existence of maximum values of returns, as Nemhauser did, then the Lipschitz condition can be dropped from the hypotheses in our arguments. In this paper we treat both deterministic and stochastic cases. Section 2 is
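
As a rough illustration of the kind of optimality equation the abstract refers to, the following Python sketch compares backward recursion with exhaustive enumeration on a small deterministic finite-stage process. It is not taken from the paper: the transition `step`, the stage reward `reward`, the terminal value `terminal`, and the three composition rules in `COMPOSERS` are hypothetical choices made only for this example. Each rule builds the total return recursively as g(r_1, g(r_2, ... g(r_N, k))) and is nondecreasing in its second argument, which is the sense of recursiveness and wide-sense monotonicity assumed here. Because the action set is finite, maxima always exist, so the Lipschitz condition mentioned in the abstract plays no role in this toy setting.

```python
from itertools import product

ACTIONS = (0, 1)
N_STAGES = 4
STATES = range(5)


def step(state, action):
    # Hypothetical deterministic transition on states 0..4.
    return (2 * state + action + 1) % 5


def reward(state, action):
    # Hypothetical stage reward; always in {1, 2, 3}, hence positive.
    return (state + 2 * action) % 3 + 1


def terminal(state):
    # Hypothetical terminal value, the same for every final state.
    return 1


# Composition rules g(r, v): each is nondecreasing in v for the rewards above.
COMPOSERS = {
    "additive": lambda r, v: r + v,
    "multiplicative": lambda r, v: r * v,  # monotone because r > 0
    "bottleneck": lambda r, v: min(r, v),  # monotone, but not strictly
}


def backward_value(state, stage, compose):
    # Optimality equation: V_n(s) = max_a g(r(s, a), V_{n+1}(T(s, a))).
    if stage == N_STAGES:
        return terminal(state)
    return max(
        compose(reward(state, a), backward_value(step(state, a), stage + 1, compose))
        for a in ACTIONS
    )


def plan_return(state, plan, compose):
    # Return of one fixed action sequence, built recursively from the front.
    if not plan:
        return terminal(state)
    first = plan[0]
    rest = plan_return(step(state, first), plan[1:], compose)
    return compose(reward(state, first), rest)


def brute_force_value(state, compose):
    # Best return over all action sequences of length N_STAGES.
    return max(
        plan_return(state, plan, compose)
        for plan in product(ACTIONS, repeat=N_STAGES)
    )


def check_all():
    # For each rule, does backward recursion match enumeration in every state?
    return {
        name: all(
            backward_value(s, 0, g) == brute_force_value(s, g) for s in STATES
        )
        for name, g in COMPOSERS.items()
    }


if __name__ == "__main__":
    print(check_all())
```

The agreement follows from the identity max_x g(r, x) = g(r, max x), which holds for any g that is nondecreasing in its second argument. A rule that decreases in that argument, such as g(r, v) = r - v, would give no such guarantee.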